Hello all,
I have a long dataset and am using proc summary and sql to transform it to a wide dataset. I'm having trouble adapting the following code to retain all of my variables in my long dataset in the new wide dataset. Example of my code is below. For my age and count variables, I would like to retain them but if I include them in the IDGROUP statement, then SAS creates multiple age and count columns in my wide dataset (I would just like one, see desired output below).
data have;
input id visit_num diag$ age treatment count;
cards;
01 1 diabetes 42 1 1
02 1 stroke 78 1 2
02 2 cancer . 2 2
02 3 stroke . . 2
03 1 stroke 66 2 2
03 2 copd . 1 2
03 3 . . . 2
;
run;
proc sql noprint;
select max(n) into :n
from (select count(*) as n from have group by id);
quit;
proc summary data=have;
by id;
output out=want idgroup(out[&n] (visit_num diag treatment)=);
run;
Desired output:(Note, in my actual dataset I have ~40 variables similar to count and age, where I would like just one column for each of these variables in my wide dataset).
ID | Visit_Num1 | Visit_Num2 | Visit_Num3 | Age | Treatment1 | Treatment2 | Treatment3 | Count |
01 | diabetes | 42 | 1 | 1 | ||||
02 | stroke | cancer | stroke | 78 | 1 | 2 | 2 | |
03 | stroke | copd | 66 | 2 | 1 | 2 |
This gets you pretty close, except it's not coded as VISIT_NUM but DIAG1 - DIAG3
data have;
input id visit_num diag$ age treatment count;
cards;
01 1 diabetes 42 1 1
02 1 stroke 78 1 2
02 2 cancer . 2 2
02 3 stroke . . 2
03 1 stroke 66 2 2
03 2 copd . 1 2
03 3 . . . 2
;
run;
proc sql noprint;
select max(n) into :n
from (select count(*) as n from have group by id);
quit;
proc summary data=have;
by id;
ID Age COUNT;
output out=want (drop=_:) idgroup(out[&n] (diag treatment)=);
run;
Results:
@monsterpie wrote:
Hello all,
I have a long dataset and am using proc summary and sql to transform it to a wide dataset. I'm having trouble adapting the following code to retain all of my variables in my long dataset in the new wide dataset. Example of my code is below. For my age and count variables, I would like to retain them but if I include them in the IDGROUP statement, then SAS creates multiple age and count columns in my wide dataset (I would just like one, see desired output below).
data have; input id visit_num diag$ age treatment count; cards; 01 1 diabetes 42 1 1 02 1 stroke 78 1 2 02 2 cancer . 2 2 02 3 stroke . . 2 03 1 stroke 66 2 2 03 2 copd . 1 2 03 3 . . . 2 ; run; proc sql noprint; select max(n) into :n from (select count(*) as n from have group by id); quit; proc summary data=have; by id; output out=want idgroup(out[&n] (visit_num diag treatment)=); run;
Desired output:(Note, in my actual dataset I have ~40 variables similar to count and age, where I would like just one column for each of these variables in my wide dataset).
ID Visit_Num1 Visit_Num2 Visit_Num3 Age Treatment1 Treatment2 Treatment3 Count 01 diabetes 42 1 1 02 stroke cancer stroke 78 1 2 2 03 stroke copd 66 2 1 2
Without re-writing the existing code ( which works just fine for most part). I would suggest adding a step to re-merging the age and count variables.
proc sql noprint;
select max(n) into :n from (select count(*) as n from have group by id);
quit;
proc summary data=have;
by id;
output out=want idgroup(out[&n] (visit_num diag treatment)=);
run;
data final;
merge want have(where=(age ne .) keep=id age count );
by id;
run;
If AGE and/or COUNT is constant for an ID then you could include it in the BY statement. But that does not seem to be the case. Why?
To drop variables use the DROP= dataset option.
proc summary data=have;
by id;
output
out=want(drop=age2-age&n count2-count&n
rename=(age1=age count1=count))
idgroup(out[&n] (visit_num age diag treatment count)=)
;
run;
If AGE and COUNT a numeric you could calculate a statistic for them instead. Perhaps the minimum age and the sum of the counts?
proc summary data=have;
by id;
var age count;
output
out=want
min(age)= sum(count)=
idgroup(out[&n] (visit_num diag treatment )=)
;
run;
This gets you pretty close, except it's not coded as VISIT_NUM but DIAG1 - DIAG3
data have;
input id visit_num diag$ age treatment count;
cards;
01 1 diabetes 42 1 1
02 1 stroke 78 1 2
02 2 cancer . 2 2
02 3 stroke . . 2
03 1 stroke 66 2 2
03 2 copd . 1 2
03 3 . . . 2
;
run;
proc sql noprint;
select max(n) into :n
from (select count(*) as n from have group by id);
quit;
proc summary data=have;
by id;
ID Age COUNT;
output out=want (drop=_:) idgroup(out[&n] (diag treatment)=);
run;
Results:
@monsterpie wrote:
Hello all,
I have a long dataset and am using proc summary and sql to transform it to a wide dataset. I'm having trouble adapting the following code to retain all of my variables in my long dataset in the new wide dataset. Example of my code is below. For my age and count variables, I would like to retain them but if I include them in the IDGROUP statement, then SAS creates multiple age and count columns in my wide dataset (I would just like one, see desired output below).
data have; input id visit_num diag$ age treatment count; cards; 01 1 diabetes 42 1 1 02 1 stroke 78 1 2 02 2 cancer . 2 2 02 3 stroke . . 2 03 1 stroke 66 2 2 03 2 copd . 1 2 03 3 . . . 2 ; run; proc sql noprint; select max(n) into :n from (select count(*) as n from have group by id); quit; proc summary data=have; by id; output out=want idgroup(out[&n] (visit_num diag treatment)=); run;
Desired output:(Note, in my actual dataset I have ~40 variables similar to count and age, where I would like just one column for each of these variables in my wide dataset).
ID Visit_Num1 Visit_Num2 Visit_Num3 Age Treatment1 Treatment2 Treatment3 Count 01 diabetes 42 1 1 02 stroke cancer stroke 78 1 2 2 03 stroke copd 66 2 1 2
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.