I'm am trying to take the following dataset (for example):
data test;
input ID Name $ Topic A1 A2;
datalines;
1 Alex 1 1 5
1 Alex 2 5 1
1 Bob 2 5 1
1 Smith 4 1 5
2 Greg 1 1 1
2 Greg 2 1 1
3 Joe 3 1 5
3 Joe 4 1 5
run;
And collapse it down by ID and Name so that it looks like this (the table has been cut off but you get the idea hopefully).
I am relatively new to SAS so I'm not sure what the best approach would be. I've tried transposing with no luck, as well as creating blank columns and moving the data (however this is very inefficient if you have a big dataset). Any recommendations are welcome!
PROC SUMMARY will do a good job.
For your example the code could look like:
proc summary data=test;
by id name;
output out=want(drop=_type_) idgroup( out[2] (topic a1 a2)=);
run;
Results:
Obs ID Name _FREQ_ Topic_1 Topic_2 A1_1 A1_2 A2_1 A2_2 1 1 Alex 2 1 2 1 5 5 1 2 1 Bob 1 2 . 5 . 1 . 3 1 Smith 1 4 . 1 . 5 . 4 2 Greg 2 1 2 1 1 1 1 5 3 Joe 2 3 4 1 1 5 5
I told it to create only [2] variables because I could see that the max number of rows was only two.
But you could use code to count that maximum number needed and put that into a macro variable instead.
proc sql noprint;
select max(count) into :n_reps trimmed
from (select id,name,count(*) as count
from test
group by id,name )
;
quit;
proc summary data=test;
by id name;
output out=want(drop=_type_) idgroup( out[&n_reps] (topic a1 a2)=);
run;
If the data is not sorted then use a CLASS statement instead of the BY statement and add the NWAY option.
proc summary data=test nway;
class id name;
output out=want(drop=_type_) idgroup( out[&n_reps] (topic a1 a2)=);
run;
PROC SUMMARY will do a good job.
For your example the code could look like:
proc summary data=test;
by id name;
output out=want(drop=_type_) idgroup( out[2] (topic a1 a2)=);
run;
Results:
Obs ID Name _FREQ_ Topic_1 Topic_2 A1_1 A1_2 A2_1 A2_2 1 1 Alex 2 1 2 1 5 5 1 2 1 Bob 1 2 . 5 . 1 . 3 1 Smith 1 4 . 1 . 5 . 4 2 Greg 2 1 2 1 1 1 1 5 3 Joe 2 3 4 1 1 5 5
I told it to create only [2] variables because I could see that the max number of rows was only two.
But you could use code to count that maximum number needed and put that into a macro variable instead.
proc sql noprint;
select max(count) into :n_reps trimmed
from (select id,name,count(*) as count
from test
group by id,name )
;
quit;
proc summary data=test;
by id name;
output out=want(drop=_type_) idgroup( out[&n_reps] (topic a1 a2)=);
run;
If the data is not sorted then use a CLASS statement instead of the BY statement and add the NWAY option.
proc summary data=test nway;
class id name;
output out=want(drop=_type_) idgroup( out[&n_reps] (topic a1 a2)=);
run;
I do not recommend doing this. This structure is typically not useful and your current form is usually the preferred.
What are you doing that you think you need that form?
Hi:
Without doing any transposing, PROC REPORT can give you a report using TOPIC as an ACROSS item on the report, as shown below:
I know the structure of the report isn't exactly what you showed as the desired output. I wasn't really clear on what you needed as output -- a new data file or a report. But this is what you can get by default with your existing structure and without renaming variables. If you need to rename variables and change the structure, then I would probably choose a DATA step program. I have to admit, I don't quite understand how one ID can have 3 different name values, as shown for ID 1 -- so I suspect this is fake data and that your real data has some other relationship between ID and Name, where the multiple rows for a single ID makes sense.
Cynthia
Before going on with this, what will the resulting data set be used for?
What is the maximum number times any of the Id Name combinations ever appear? By ever I mean not just in this data set but in any likely to occur in the future data sets.
Does the value of Topic ever duplicate within an Id Name combination?
Also, is the actual order of all the variable columns important? I have to ask because some possible solutions make that somewhat difficult to do and require some code that is a bit ugly.
Warning: the form that you propose is almost impossible to do most types of analysis with, graph, or probably even make nice appearing tables from.
data test;
input ID Name $ Topic A1 A2;
datalines;
1 Alex 1 1 5
1 Alex 2 5 1
1 Bob 2 5 1
1 Smith 4 1 5
2 Greg 1 1 1
2 Greg 2 1 1
3 Joe 3 1 5
3 Joe 4 1 5
;
run;
proc sql noprint;
select distinct catt('test(where=(topic_',topic,'=',topic,') rename=(topic=topic_',topic,' a1=a1_',topic,' a2=a2_',topic,'))')
into :merge separated by ' '
from test;
quit;
data want;
merge &merge.;
by id name;
run;
proc stdize data=want out=final_want missing=0 reponly;
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.