BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
monsterpie
Obsidian | Level 7

Hello all,

 

I have a long dataset and am using proc summary and sql to transform it to a wide dataset. I'm having trouble adapting the following code to retain all of my variables in my long dataset in the new wide dataset. Example of my code is below. For my age and count variables, I would like to retain them but if I include them in the IDGROUP statement, then SAS creates multiple age and count columns in my wide dataset (I would just like one, see desired output below).

 


data have;
input id visit_num diag$ age treatment count;
cards;
01 1 diabetes 42 1 1
02 1 stroke 78 1 2
02 2 cancer . 2 2
02 3 stroke . . 2
03 1 stroke 66 2 2
03 2 copd . 1 2
03 3 . . . 2
;
run;

proc sql noprint;
select max(n) into :n
 from (select count(*) as n from have group by id);
quit;
proc summary data=have;
by id;
output out=want idgroup(out[&n] (visit_num diag treatment)=);
run;

Desired output:(Note, in my actual dataset I have ~40 variables similar to count and age, where I would like just one column for each of these variables in my wide dataset).

ID Visit_Num1 Visit_Num2 Visit_Num3 Age Treatment1 Treatment2 Treatment3 Count
01 diabetes     42 1     1
02 stroke cancer stroke 78 1 2   2
03 stroke copd   66 2 1   2
1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

This gets you pretty close, except it's not coded as VISIT_NUM but DIAG1 - DIAG3

 

 

data have;
input id visit_num diag$ age treatment count;
cards;
01 1 diabetes 42 1 1
02 1 stroke 78 1 2
02 2 cancer . 2 2
02 3 stroke . . 2
03 1 stroke 66 2 2
03 2 copd . 1 2
03 3 . . . 2
;
run;

proc sql noprint;
select max(n) into :n
 from (select count(*) as n from have group by id);
quit;
proc summary data=have;
by id;
ID Age COUNT;
output out=want (drop=_:) idgroup(out[&n] (diag treatment)=);
run;

Results:

 

Reeza_0-1642440129462.png


@monsterpie wrote:

Hello all,

 

I have a long dataset and am using proc summary and sql to transform it to a wide dataset. I'm having trouble adapting the following code to retain all of my variables in my long dataset in the new wide dataset. Example of my code is below. For my age and count variables, I would like to retain them but if I include them in the IDGROUP statement, then SAS creates multiple age and count columns in my wide dataset (I would just like one, see desired output below).

 


data have;
input id visit_num diag$ age treatment count;
cards;
01 1 diabetes 42 1 1
02 1 stroke 78 1 2
02 2 cancer . 2 2
02 3 stroke . . 2
03 1 stroke 66 2 2
03 2 copd . 1 2
03 3 . . . 2
;
run;

proc sql noprint;
select max(n) into :n
 from (select count(*) as n from have group by id);
quit;
proc summary data=have;
by id;
output out=want idgroup(out[&n] (visit_num diag treatment)=);
run;

Desired output:(Note, in my actual dataset I have ~40 variables similar to count and age, where I would like just one column for each of these variables in my wide dataset).

ID Visit_Num1 Visit_Num2 Visit_Num3 Age Treatment1 Treatment2 Treatment3 Count
01 diabetes     42 1     1
02 stroke cancer stroke 78 1 2   2
03 stroke copd   66 2 1   2

 

View solution in original post

3 REPLIES 3
r_behata
Barite | Level 11

Without re-writing the existing code  ( which works just fine for most part). I would suggest adding a step to re-merging the age and count variables.

 


proc sql noprint;
	select max(n) into :n from (select count(*) as n from have group by id);
quit;

proc summary data=have;
	by id;
	output out=want idgroup(out[&n] (visit_num diag treatment)=);
run;


data final;
	merge want have(where=(age ne .) keep=id age  count );
	by id;
run;
Tom
Super User Tom
Super User

If AGE and/or COUNT is constant for an ID then you could include it in the BY statement. But that does not seem to be the case. Why?

 

To drop variables use the DROP= dataset option.

proc summary data=have;
  by id;
  output
    out=want(drop=age2-age&n count2-count&n
             rename=(age1=age count1=count))
    idgroup(out[&n] (visit_num age diag treatment count)=)
  ;
run;

If AGE and COUNT a numeric you could calculate a statistic for them instead.  Perhaps the minimum age and the sum of the counts?

proc summary data=have;
  by id;
  var age count;
  output
    out=want
   min(age)= sum(count)=
    idgroup(out[&n] (visit_num diag treatment )=)
  ;
run;
Reeza
Super User

This gets you pretty close, except it's not coded as VISIT_NUM but DIAG1 - DIAG3

 

 

data have;
input id visit_num diag$ age treatment count;
cards;
01 1 diabetes 42 1 1
02 1 stroke 78 1 2
02 2 cancer . 2 2
02 3 stroke . . 2
03 1 stroke 66 2 2
03 2 copd . 1 2
03 3 . . . 2
;
run;

proc sql noprint;
select max(n) into :n
 from (select count(*) as n from have group by id);
quit;
proc summary data=have;
by id;
ID Age COUNT;
output out=want (drop=_:) idgroup(out[&n] (diag treatment)=);
run;

Results:

 

Reeza_0-1642440129462.png


@monsterpie wrote:

Hello all,

 

I have a long dataset and am using proc summary and sql to transform it to a wide dataset. I'm having trouble adapting the following code to retain all of my variables in my long dataset in the new wide dataset. Example of my code is below. For my age and count variables, I would like to retain them but if I include them in the IDGROUP statement, then SAS creates multiple age and count columns in my wide dataset (I would just like one, see desired output below).

 


data have;
input id visit_num diag$ age treatment count;
cards;
01 1 diabetes 42 1 1
02 1 stroke 78 1 2
02 2 cancer . 2 2
02 3 stroke . . 2
03 1 stroke 66 2 2
03 2 copd . 1 2
03 3 . . . 2
;
run;

proc sql noprint;
select max(n) into :n
 from (select count(*) as n from have group by id);
quit;
proc summary data=have;
by id;
output out=want idgroup(out[&n] (visit_num diag treatment)=);
run;

Desired output:(Note, in my actual dataset I have ~40 variables similar to count and age, where I would like just one column for each of these variables in my wide dataset).

ID Visit_Num1 Visit_Num2 Visit_Num3 Age Treatment1 Treatment2 Treatment3 Count
01 diabetes     42 1     1
02 stroke cancer stroke 78 1 2   2
03 stroke copd   66 2 1   2

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 2802 views
  • 2 likes
  • 4 in conversation