BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ALEXIA1
Calcite | Level 5

I'm am trying to take the following dataset (for example):

data test;
input ID Name $ Topic A1 A2;
datalines;
1 Alex 1 1 5
1 Alex 2 5 1
1 Bob 2 5 1
1 Smith 4 1 5
2 Greg 1 1 1
2 Greg 2 1 1
3 Joe 3 1 5
3 Joe 4 1 5
run;

And collapse it  down by ID and Name so that it looks like this (the table has been cut off but you get the idea hopefully).

ALEXIA1_1-1638502246664.png

I am relatively new to SAS so I'm not sure what the best approach would be. I've tried transposing with no luck, as well as creating blank columns and moving the data (however this is very inefficient if you have a big dataset). Any recommendations are welcome!

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

PROC SUMMARY will do a good job.

 

For your example the code could look like:

 

proc summary data=test;
  by id name;
  output out=want(drop=_type_) idgroup( out[2] (topic a1 a2)=);
run;

Results:

 

 

Obs    ID    Name     _FREQ_    Topic_1    Topic_2    A1_1    A1_2    A2_1    A2_2

 1      1    Alex        2         1          2         1       5       5       1
 2      1    Bob         1         2          .         5       .       1       .
 3      1    Smith       1         4          .         1       .       5       .
 4      2    Greg        2         1          2         1       1       1       1
 5      3    Joe         2         3          4         1       1       5       5

I told it to create only [2] variables because I could see that the max number of rows was only two.

 

But you could use code to count that maximum number needed and put that into a macro variable instead.

 

proc sql noprint;
select max(count) into :n_reps trimmed
from (select id,name,count(*) as count
      from test
      group by id,name )
;
quit;

proc summary data=test;
  by id name;
  output out=want(drop=_type_) idgroup( out[&n_reps] (topic a1 a2)=);
run;

If the data is not sorted then use a CLASS statement instead of the BY statement and add the NWAY option.

 

proc summary data=test nway;
  class id name;
  output out=want(drop=_type_) idgroup( out[&n_reps] (topic a1 a2)=);
run;

View solution in original post

6 REPLIES 6
Tom
Super User Tom
Super User

PROC SUMMARY will do a good job.

 

For your example the code could look like:

 

proc summary data=test;
  by id name;
  output out=want(drop=_type_) idgroup( out[2] (topic a1 a2)=);
run;

Results:

 

 

Obs    ID    Name     _FREQ_    Topic_1    Topic_2    A1_1    A1_2    A2_1    A2_2

 1      1    Alex        2         1          2         1       5       5       1
 2      1    Bob         1         2          .         5       .       1       .
 3      1    Smith       1         4          .         1       .       5       .
 4      2    Greg        2         1          2         1       1       1       1
 5      3    Joe         2         3          4         1       1       5       5

I told it to create only [2] variables because I could see that the max number of rows was only two.

 

But you could use code to count that maximum number needed and put that into a macro variable instead.

 

proc sql noprint;
select max(count) into :n_reps trimmed
from (select id,name,count(*) as count
      from test
      group by id,name )
;
quit;

proc summary data=test;
  by id name;
  output out=want(drop=_type_) idgroup( out[&n_reps] (topic a1 a2)=);
run;

If the data is not sorted then use a CLASS statement instead of the BY statement and add the NWAY option.

 

proc summary data=test nway;
  class id name;
  output out=want(drop=_type_) idgroup( out[&n_reps] (topic a1 a2)=);
run;
Reeza
Super User

 I do not recommend doing this. This structure is typically not useful and your current form is usually the preferred. 

What are you doing that you think you need that form?

 

 

Cynthia_sas
Diamond | Level 26

Hi:

  Without doing any transposing, PROC REPORT can give you a report using TOPIC as an ACROSS item on the report, as shown below:

Cynthia_sas_0-1638504809970.png

 

  I know the structure of the report isn't exactly what you showed as the desired output. I wasn't really clear on what you needed as output -- a new data file or a report. But this is what you can get by default with your existing structure and without renaming variables. If you need to rename variables and change the structure, then I would probably choose a DATA step program. I have to admit, I don't quite understand how one ID can have 3 different name values, as shown for ID 1 -- so I suspect this is fake data and that your real data has some other relationship between ID and Name, where the multiple rows for a single ID makes sense.

Cynthia

 

ballardw
Super User

Before going on with this, what will the resulting data set be used for?

What is the maximum number times any of the Id Name combinations ever appear? By ever I mean not just in this data set but in any likely to occur in the future data sets.

Does the value of Topic ever duplicate within an Id Name combination?

Also, is the actual order of all the variable columns important? I have to ask because some possible solutions make that somewhat difficult to do and require some code that is a bit ugly.

 

 

Warning: the form that you propose is almost impossible to do most types of analysis with, graph, or probably even make nice appearing tables from.

ALEXIA1
Calcite | Level 5
Hi ballardw, the dataset will later be merged to a larger dataset whereby one row exists for each ID/name.
The max number of times a given ID/name combination can appear in the dataset above and future datasets is 4. The value of Topic never duplicates for a ID/Name combination. And, the order of the columns is not important.

I'm expecting the code to be a big ugly but hopefully this information helps! Thanks
Ksharp
Super User
data test;
input ID Name $ Topic A1 A2;
datalines;
1 Alex 1 1 5
1 Alex 2 5 1
1 Bob 2 5 1
1 Smith 4 1 5
2 Greg 1 1 1
2 Greg 2 1 1
3 Joe 3 1 5
3 Joe 4 1 5
;
run;



proc sql noprint;
select distinct catt('test(where=(topic_',topic,'=',topic,') rename=(topic=topic_',topic,' a1=a1_',topic,' a2=a2_',topic,'))')
       into :merge separated by ' '
 from test;
quit;

data want;
 merge &merge.;
 by id name;
run;

proc stdize data=want out=final_want missing=0 reponly;
run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 2493 views
  • 1 like
  • 6 in conversation