Solved: Re: Flattening/Transposing a dataset

ALEXIA1 · Posted 12-02-2021 10:48 PM

I'm am trying to take the following dataset (for example):

data test;
input ID Name $ Topic A1 A2;
datalines;
1 Alex 1 1 5
1 Alex 2 5 1
1 Bob 2 5 1
1 Smith 4 1 5
2 Greg 1 1 1
2 Greg 2 1 1
3 Joe 3 1 5
3 Joe 4 1 5
run;

And collapse it down by ID and Name so that it looks like this (the table has been cut off but you get the idea hopefully).

I am relatively new to SAS so I'm not sure what the best approach would be. I've tried transposing with no luck, as well as creating blank columns and moving the data (however this is very inefficient if you have a big dataset). Any recommendations are welcome!

Tom · Posted 12-02-2021 11:12 PM

PROC SUMMARY will do a good job.

For your example the code could look like:

proc summary data=test;
  by id name;
  output out=want(drop=_type_) idgroup( out[2] (topic a1 a2)=);
run;

Results:

Obs    ID    Name     _FREQ_    Topic_1    Topic_2    A1_1    A1_2    A2_1    A2_2

 1      1    Alex        2         1          2         1       5       5       1
 2      1    Bob         1         2          .         5       .       1       .
 3      1    Smith       1         4          .         1       .       5       .
 4      2    Greg        2         1          2         1       1       1       1
 5      3    Joe         2         3          4         1       1       5       5

I told it to create only [2] variables because I could see that the max number of rows was only two.

But you could use code to count that maximum number needed and put that into a macro variable instead.

proc sql noprint;
select max(count) into :n_reps trimmed
from (select id,name,count(*) as count
      from test
      group by id,name )
;
quit;

proc summary data=test;
  by id name;
  output out=want(drop=_type_) idgroup( out[&n_reps] (topic a1 a2)=);
run;

If the data is not sorted then use a CLASS statement instead of the BY statement and add the NWAY option.

proc summary data=test nway;
  class id name;
  output out=want(drop=_type_) idgroup( out[&n_reps] (topic a1 a2)=);
run;

View solution in original post

Tom · Posted 12-02-2021 11:12 PM

PROC SUMMARY will do a good job.

For your example the code could look like:

proc summary data=test;
  by id name;
  output out=want(drop=_type_) idgroup( out[2] (topic a1 a2)=);
run;

Results:

Obs    ID    Name     _FREQ_    Topic_1    Topic_2    A1_1    A1_2    A2_1    A2_2

 1      1    Alex        2         1          2         1       5       5       1
 2      1    Bob         1         2          .         5       .       1       .
 3      1    Smith       1         4          .         1       .       5       .
 4      2    Greg        2         1          2         1       1       1       1
 5      3    Joe         2         3          4         1       1       5       5

I told it to create only [2] variables because I could see that the max number of rows was only two.

But you could use code to count that maximum number needed and put that into a macro variable instead.

proc sql noprint;
select max(count) into :n_reps trimmed
from (select id,name,count(*) as count
      from test
      group by id,name )
;
quit;

proc summary data=test;
  by id name;
  output out=want(drop=_type_) idgroup( out[&n_reps] (topic a1 a2)=);
run;

If the data is not sorted then use a CLASS statement instead of the BY statement and add the NWAY option.

proc summary data=test nway;
  class id name;
  output out=want(drop=_type_) idgroup( out[&n_reps] (topic a1 a2)=);
run;

Reeza · Posted 12-02-2021 11:16 PM

I do not recommend doing this. This structure is typically not useful and your current form is usually the preferred.

What are you doing that you think you need that form?

Cynthia_sas · Posted 12-02-2021 11:17 PM

Hi:

Without doing any transposing, PROC REPORT can give you a report using TOPIC as an ACROSS item on the report, as shown below:

I know the structure of the report isn't exactly what you showed as the desired output. I wasn't really clear on what you needed as output -- a new data file or a report. But this is what you can get by default with your existing structure and without renaming variables. If you need to rename variables and change the structure, then I would probably choose a DATA step program. I have to admit, I don't quite understand how one ID can have 3 different name values, as shown for ID 1 -- so I suspect this is fake data and that your real data has some other relationship between ID and Name, where the multiple rows for a single ID makes sense.

Cynthia

ballardw · Posted 12-02-2021 11:18 PM

Before going on with this, what will the resulting data set be used for?

What is the maximum number times any of the Id Name combinations ever appear? By ever I mean not just in this data set but in any likely to occur in the future data sets.

Does the value of Topic ever duplicate within an Id Name combination?

Also, is the actual order of all the variable columns important? I have to ask because some possible solutions make that somewhat difficult to do and require some code that is a bit ugly.

Warning: the form that you propose is almost impossible to do most types of analysis with, graph, or probably even make nice appearing tables from.

ALEXIA1 · Posted 12-03-2021 04:28 PM

Hi ballardw, the dataset will later be merged to a larger dataset whereby one row exists for each ID/name.
The max number of times a given ID/name combination can appear in the dataset above and future datasets is 4. The value of Topic never duplicates for a ID/Name combination. And, the order of the columns is not important.

I'm expecting the code to be a big ugly but hopefully this information helps! Thanks

Ksharp · Posted 12-03-2021 07:05 AM

data test;
input ID Name $ Topic A1 A2;
datalines;
1 Alex 1 1 5
1 Alex 2 5 1
1 Bob 2 5 1
1 Smith 4 1 5
2 Greg 1 1 1
2 Greg 2 1 1
3 Joe 3 1 5
3 Joe 4 1 5
;
run;



proc sql noprint;
select distinct catt('test(where=(topic_',topic,'=',topic,') rename=(topic=topic_',topic,' a1=a1_',topic,' a2=a2_',topic,'))')
       into :merge separated by ' '
 from test;
quit;

data want;
 merge &merge.;
 by id name;
run;

proc stdize data=want out=final_want missing=0 reponly;
run;

Registration is open

SAS Training: Just a Click Away