Re: Transposing many responses to columns with binary response

Melk · Posted 06-25-2019 03:03 PM

I have longitudinal data that looks like this:

ID Name

1 A

1 B

1 C

2 C

2 D

2 E

3 A

3 B

And I want it to look like this:

ID A B C D E

1 1 1 1 0 0

2 0 0 1 1 1

3 1 1 0 0 0

I have a lot more names and IDs than what is sampled above. Can anyone advise how I can do this? I want all the possible names as a variable basically, and the response to be 0 or 1 for each ID, if it exists or not.

ballardw · Posted 06-25-2019 03:52 PM

one way. Assumes the starting data is sorted by your ID variable. This also assumes than any values of your Name variable are also valid as variable names. If that is not the actual case with your data then you will need to provide details and whether you are going to use name literal variables.

data have;
 input ID $    Name $;
 v=1;
datalines;
1        A
1        B
1        C
2        C
2        D
2        E
3        A
3        B
;

proc transpose data=have out=trans (drop=_name_);
 by id;
 var v;
 id name;
run;

data want;
   set trans;
   array __a _numeric_;
   do i=1 to dim(__a);
      __a[i] = not missing(__a[i]);
   end;
   drop i;
run;

The odd __a array name is because I have no idea what actual values you have for your Name variable and am trying to avoid likely values as you can't have an array with the same name as a variable in the data set.

Melk · Posted 06-25-2019 05:35 PM

Thanks @ballardw ! This works with one problem - my ID is set to 1 for all observations in the final dataset. Any idea why?

ballardw · Posted 06-25-2019 07:25 PM

@Melk wrote:

Thanks @ballardw ! This works with one problem - my ID is set to 1 for all observations in the final dataset. Any idea why?

All I can say is data related (provide actual data) or a coding issue. Without seeing your code or data it is hard to diagnose.

Of course if you simplified your starting data too much then there may be additional sorting required and more by variables for the sort and transpose.

Reeza · Posted 06-25-2019 04:11 PM

If you search "creating dummy variables" on here, in the Library (not Community) you'll find a few really good resources to automate this. Do note, that most require you to have all levels in your data. If you don't, you'll need a different way to account for that, using CLASSDATA or a format are some options.

Tom · Posted 06-25-2019 07:36 PM

One simple way is to use some "wallpaper" code with PROC SQL. Like this:

194  +create table want as select id
195  +,max(name="A" ) as A
196  +,max(name="B" ) as B
197  +,max(name="C" ) as C
198  +,max(name="D" ) as D
199  +,max(name="E" ) as E
200  +from have group by id order by id;

So that is easy to generate if you have a list of the possible values of NAME.

filename code temp;
data _null_;
  set names end=eof;
  file code;
  if _n_=1 then put 'create table want as select id';
  put ',max(name=' name :$quote. ') as ' name ;
  if eof then put 'from have group by id order by id;' ;
run;

proc sql noprint;
%include code / source2;
quit;

If you don't have a list of the possible values you can easily get a list of the values present in the current dataset.

proc sort data=have(keep=name) out=names nodupkey;
  by name;
run;

Transposing many responses to columns with binary response