I have a data set that has a variable that lists a set of states (that are separated by a space). The number of states listed varies by observation. Below is a simplified example:
state_list
obs1 MA NE AK
obs2 NE AL ME NJ
obs3 RI
obs4 NV CA UT OR
obs5 MA NV WA CO MN MI
What I would like to do is separate these out so that each state listed for an observation is under a separate variable (e.g., state1, state2, state3, etc), BUT I do not want to manually enter or count the maximum number of states listed within a given instance of the state_list variable. I don't know if the CALL SYMPUT routine can somehow be used? The actual data has many thousands of observations and we receive a new data set every year, so it really is not practical to manually go through and determine the length of the longest list in the state_list variable.
Any advice?
My resulting data set would look something like:
st1 st2 st3 st4 st5 st6
obs1 MA NE AK
obs2 NE AL ME NJ
obs3 RI
obs4 NV CA UT OR
obs5 MA NV WA CO MN MI
Split each value out to it's own row and then use PROC TRANSPOSE.
Untested demo:
data long;
set have;
nWords = countw(state_list);
do i=1 to nWords;
state = scan(state_list, i);
output;
end;
run;
proc transpose data=long out=wide prefix = State;
by OBS;
var State;
ID i;
run;
Transposing data tutorials:
Long to Wide:
https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-long-to-wide-using-proc-transpose/
https://stats.idre.ucla.edu/sas/modules/reshaping-data-long-to-wide-using-the-data-step/
@raivester wrote:
I have a data set that has a variable that lists a set of states (that are separated by a space). The number of states listed varies by observation. Below is a simplified example:
state_list
obs1 MA NE AK
obs2 NE AL ME NJ
obs3 RI
obs4 NV CA UT OR
obs5 MA NV WA CO MN MI
What I would like to do is separate these out so that each state listed for an observation is under a separate variable (e.g., state1, state2, state3, etc), BUT I do not want to manually enter or count the maximum number of states listed within a given instance of the state_list variable. I don't know if the CALL SYMPUT routine can somehow be used? The actual data has many thousands of observations and we receive a new data set every year, so it really is not practical to manually go through and determine the length of the longest list in the state_list variable.
Any advice?
My resulting data set would look something like:
st1 st2 st3 st4 st5 st6
obs1 MA NE AK
obs2 NE AL ME NJ
obs3 RI
obs4 NV CA UT OR
obs5 MA NV WA CO MN MI
Use a DO loop with SCAN() and OUTPUT to convert to a tall format dataset.
Here is a method that works even when some observations have an empty list.
data have;
input id $ state_list $50. ;
cards;
obs1 MA NE AK
obs2 NE AL ME NJ
obs3 RI
obs4 NV CA UT OR
obs5 MA NV WA CO MN MI
obs6
;
data want ;
set have;
do state_no=1 by 1 until (state_no >= countw(state_list,' '));
state=scan(state_list,state_no,' ');
output;
end;
run;
Now you can use that dataset for most things. You can even use PROC REPORT to print it in the wide style.
proc report data=want missing;
column id state,state_no;
define id / group;
define state_no / across 'STATE';
define state / group ' ';
run;
Results:
Or if you really want that wide report as a dataset use PROC TRANSPOSE
proc transpose data=want out=wide(drop=_name_) prefix=st ;
by id;
var state;
id state_no;
run;
Result:
data have;
input id $ state_list $50. ;
cards;
obs1 MA NE AK
obs2 NE AL ME NJ
obs3 RI
obs4 NV CA UT OR
obs5 MA NV WA CO MN MI
obs6
;
proc sql;
select max(countw(state_list)) into : n
from have;
quit;
data want;
set have;
array st{&n} $ 40;
do i=1 to countw(state_list);
st{i}=scan(state_list,i);
end;
drop i;
run;
Or if you have a BIG table.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.