BookmarkSubscribeRSS Feed
raivester
Quartz | Level 8

I have a data set that has a variable that lists a set of states (that are separated by a space). The number of states listed varies by observation. Below is a simplified example:

             state_list

obs1    MA NE AK 

obs2    NE AL ME NJ

obs3    RI

obs4    NV CA UT OR

obs5    MA NV WA CO MN MI

 

What I would like to do is separate these out so that each state listed for an observation is under a separate variable (e.g., state1, state2, state3, etc), BUT I do not want to manually enter or count the maximum number of states listed within a given instance of the state_list variable. I don't know if the CALL SYMPUT routine can somehow be used? The actual data has many thousands of observations and we receive a new data set every year, so it really is not practical to manually go through and determine the length of the longest list in the state_list variable.

 

Any advice?

 

My resulting data set would look something like:

             st1  st2  st3   st4   st5  st6

obs1    MA  NE  AK 

obs2    NE  AL  ME   NJ

obs3    RI

obs4    NV  CA  UT   OR

obs5    MA  NV  WA  CO  MN  MI

3 REPLIES 3
Reeza
Super User

Split each value out to it's own row and then use PROC TRANSPOSE.

Untested demo:

data long;
set have;

nWords = countw(state_list);

do i=1 to nWords;
state = scan(state_list, i);
output;
end;
run;

proc transpose data=long out=wide prefix = State;
by OBS;
var State;
ID i;
run;

Transposing data tutorials:
Long to Wide:
https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-long-to-wide-using-proc-transpose/

https://stats.idre.ucla.edu/sas/modules/reshaping-data-long-to-wide-using-the-data-step/


@raivester wrote:

I have a data set that has a variable that lists a set of states (that are separated by a space). The number of states listed varies by observation. Below is a simplified example:

             state_list

obs1    MA NE AK 

obs2    NE AL ME NJ

obs3    RI

obs4    NV CA UT OR

obs5    MA NV WA CO MN MI

 

What I would like to do is separate these out so that each state listed for an observation is under a separate variable (e.g., state1, state2, state3, etc), BUT I do not want to manually enter or count the maximum number of states listed within a given instance of the state_list variable. I don't know if the CALL SYMPUT routine can somehow be used? The actual data has many thousands of observations and we receive a new data set every year, so it really is not practical to manually go through and determine the length of the longest list in the state_list variable.

 

Any advice?

 

My resulting data set would look something like:

             st1  st2  st3   st4   st5  st6

obs1    MA  NE  AK 

obs2    NE  AL  ME   NJ

obs3    RI

obs4    NV  CA  UT   OR

obs5    MA  NV  WA  CO  MN  MI


 

Tom
Super User Tom
Super User

Use a DO loop with SCAN() and OUTPUT to convert to a tall format dataset.

Here is a method that works even when some observations have an empty list.

data have;
  input id $ state_list $50. ;
cards;
obs1 MA NE AK
obs2 NE AL ME NJ
obs3 RI
obs4 NV CA UT OR
obs5 MA NV WA CO MN MI
obs6
;

data want ;
  set have;
  do state_no=1 by 1 until (state_no >= countw(state_list,' '));
    state=scan(state_list,state_no,' ');
    output;
  end;
run;

Now you can use that dataset for most things. You can even use PROC REPORT to print it in the wide style.

proc report data=want missing;
  column id state,state_no;
  define id / group;
  define state_no / across 'STATE';
  define state / group ' ';
run;

Results:

image.png

Or if you really want that wide report as a dataset use PROC TRANSPOSE

proc transpose data=want out=wide(drop=_name_)  prefix=st ;
  by id;
  var state;
  id state_no;
run;

Result:

image.png

 

Ksharp
Super User
data have;
  input id $ state_list $50. ;
cards;
obs1 MA NE AK
obs2 NE AL ME NJ
obs3 RI
obs4 NV CA UT OR
obs5 MA NV WA CO MN MI
obs6
;
proc sql;
select max(countw(state_list)) into : n
 from have;
 quit;
data want;
 set have;
 array st{&n} $ 40;
 do i=1 to countw(state_list);
   st{i}=scan(state_list,i);
 end;
drop i;
run;

Or if you have a BIG table.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 1667 views
  • 0 likes
  • 4 in conversation