I have longitudinal data that looks like this:
ID Name
1 A
1 B
1 C
2 C
2 D
2 E
3 A
3 B
And I want it to look like this:
ID A B C D E
1 1 1 1 0 0
2 0 0 1 1 1
3 1 1 0 0 0
I have a lot more names and IDs than what is sampled above. Can anyone advise how I can do this? I want all the possible names as a variable basically, and the response to be 0 or 1 for each ID, if it exists or not.
one way. Assumes the starting data is sorted by your ID variable. This also assumes than any values of your Name variable are also valid as variable names. If that is not the actual case with your data then you will need to provide details and whether you are going to use name literal variables.
data have; input ID $ Name $; v=1; datalines; 1 A 1 B 1 C 2 C 2 D 2 E 3 A 3 B ; proc transpose data=have out=trans (drop=_name_); by id; var v; id name; run; data want; set trans; array __a _numeric_; do i=1 to dim(__a); __a[i] = not missing(__a[i]); end; drop i; run;
The odd __a array name is because I have no idea what actual values you have for your Name variable and am trying to avoid likely values as you can't have an array with the same name as a variable in the data set.
Thanks @ballardw ! This works with one problem - my ID is set to 1 for all observations in the final dataset. Any idea why?
@Melk wrote:
Thanks @ballardw ! This works with one problem - my ID is set to 1 for all observations in the final dataset. Any idea why?
All I can say is data related (provide actual data) or a coding issue. Without seeing your code or data it is hard to diagnose.
Of course if you simplified your starting data too much then there may be additional sorting required and more by variables for the sort and transpose.
One simple way is to use some "wallpaper" code with PROC SQL. Like this:
194 +create table want as select id 195 +,max(name="A" ) as A 196 +,max(name="B" ) as B 197 +,max(name="C" ) as C 198 +,max(name="D" ) as D 199 +,max(name="E" ) as E 200 +from have group by id order by id;
So that is easy to generate if you have a list of the possible values of NAME.
filename code temp;
data _null_;
set names end=eof;
file code;
if _n_=1 then put 'create table want as select id';
put ',max(name=' name :$quote. ') as ' name ;
if eof then put 'from have group by id order by id;' ;
run;
proc sql noprint;
%include code / source2;
quit;
If you don't have a list of the possible values you can easily get a list of the values present in the current dataset.
proc sort data=have(keep=name) out=names nodupkey;
by name;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.