Hello!
I have a dataset that contains diagnosis for individuals. Each individual can appear on multiple rows depending on the number of diagnosis they have.
Data Have;
Ind_ID$ DiagCode$ Gender$ Age$
1 I11 1 55
1 C12 1 55
1 C01 1 55
2 Y11 2 67
2 F99 2 67
3 X12 1 92
4 C05 2 81
I want to transpose the dataset and count the number of diagnosis ( var. num_diag) each individual have. I also want to place each diagnosis in separate variable.:
Data want;
Ind_ID$ Num_diag DCode1$ DCode2$ DCode2$ Gender$ Age$
1 3 I11 C12 C01 1 55
2 2 Y11 F99 2 67
3 1 X12 1 92
4 1 C01 2 81
I would do it like this:
data have;
input Ind_ID$ DiagCode$ Gender$ Age$;
cards;
1 I11 1 55
1 C12 1 55
1 C01 1 55
2 Y11 2 67
2 F99 2 67
3 X12 1 92
4 C05 2 81
;run;
proc sql noprint; /* get max no of Diags for an ID */
select max(n) into :nCodes trimmed
from (select count(diagcode) as n from have group by Ind_ID);
quit;
data want;
do num_diag=1 to &nCodes until(last.Ind_ID);
set have;
by Ind_ID;
array DCodes (*) $8 DCode1-DCode&nCodes;
DCodes(num_diag)=DiagCode;
end;
drop DiagCode;
run;
proc sql;
select count(distinct DiagCode) into: ndiagcode from Data_Have;
quit;
proc sort data_have;
by ind_ID;
run;
data DataWant(Drop=DiagCode);
length DCode1 -- Dcode&ndiagcode. $8.;
Set Data_Have;
by ind_ID;
array s{} DCode1 -- Dcode&ndiagcode.;
retain DCode1 -- Dcode&ndiagcode. '' Num_diag 0;
if first.ind_ID then do;
Num_diag=0;
end;
Num_diag=Num_diag+1;
s{Num_diag} = DiagCode;
if last.ind_ID then output;
run;
I would do it like this:
data have;
input Ind_ID$ DiagCode$ Gender$ Age$;
cards;
1 I11 1 55
1 C12 1 55
1 C01 1 55
2 Y11 2 67
2 F99 2 67
3 X12 1 92
4 C05 2 81
;run;
proc sql noprint; /* get max no of Diags for an ID */
select max(n) into :nCodes trimmed
from (select count(diagcode) as n from have group by Ind_ID);
quit;
data want;
do num_diag=1 to &nCodes until(last.Ind_ID);
set have;
by Ind_ID;
array DCodes (*) $8 DCode1-DCode&nCodes;
DCodes(num_diag)=DiagCode;
end;
drop DiagCode;
run;
Keep it simple:
proc transpose data=have prefix=DCode out=temp(drop=_name_);
by ind_id gender age notsorted;
var diagCode;
run;
data want;
set temp;
array dc_ DCode:;
numDiag = dim(dc_) - cmiss(of DCode:);
run;
Thank you but I am biased against proc transpose. However I agree that is a simple solution.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.