BookmarkSubscribeRSS Feed
Melk
Lapis Lazuli | Level 10

I have longitudinal data that looks like this:

 

ID     Name

1        A

1        B

1        C

2        C

2        D

2        E

3        A

3        B

 

And I want it to look like this:

 

ID       A     B     C     D     E

1         1     1     1      0      0

2         0     0     1      1      1

3         1     1     0      0      0

 

 

I have a lot more names and IDs than what is sampled above. Can anyone advise how I can do this? I want all the possible names as a variable basically, and the response to be 0 or 1 for each ID, if it exists or not. 

5 REPLIES 5
ballardw
Super User

one way. Assumes the starting data is sorted by your ID variable. This also assumes than any values of your Name variable are also valid as variable names. If that is not the actual case with your data then you will need to provide details and whether you are going to use name literal variables.

 

data have;
 input ID $    Name $;
 v=1;
datalines;
1        A
1        B
1        C
2        C
2        D
2        E
3        A
3        B
;

proc transpose data=have out=trans (drop=_name_);
 by id;
 var v;
 id name;
run;

data want;
   set trans;
   array __a _numeric_;
   do i=1 to dim(__a);
      __a[i] = not missing(__a[i]);
   end;
   drop i;
run;

The odd __a array name is because I have no idea what actual values you have for your Name variable and am trying to avoid likely values as you can't have an array with the same name as a variable in the data set.

Melk
Lapis Lazuli | Level 10

Thanks @ballardw ! This works with one problem - my ID is set to 1 for all observations in the final dataset. Any idea why?

ballardw
Super User

@Melk wrote:

Thanks @ballardw ! This works with one problem - my ID is set to 1 for all observations in the final dataset. Any idea why?


All I can say is data related (provide actual data) or a coding issue. Without seeing your code or data it is hard to diagnose.

 

Of course if you simplified your starting data too much then there may be additional sorting required and more by variables for the sort and transpose.

Reeza
Super User
If you search "creating dummy variables" on here, in the Library (not Community) you'll find a few really good resources to automate this. Do note, that most require you to have all levels in your data. If you don't, you'll need a different way to account for that, using CLASSDATA or a format are some options.
Tom
Super User Tom
Super User

One simple way is to use some "wallpaper" code with PROC SQL.  Like this:

194  +create table want as select id
195  +,max(name="A" ) as A
196  +,max(name="B" ) as B
197  +,max(name="C" ) as C
198  +,max(name="D" ) as D
199  +,max(name="E" ) as E
200  +from have group by id order by id;

So that is easy to generate if you have a list of the possible values of NAME.

filename code temp;
data _null_;
  set names end=eof;
  file code;
  if _n_=1 then put 'create table want as select id';
  put ',max(name=' name :$quote. ') as ' name ;
  if eof then put 'from have group by id order by id;' ;
run;

proc sql noprint;
%include code / source2;
quit;

If you don't have a list of the possible values you can easily get a list of the values present in the current dataset.

proc sort data=have(keep=name) out=names nodupkey;
  by name;
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1083 views
  • 2 likes
  • 4 in conversation