Hello!
I have a dataset that contains diagnosis for individuals. Each individual can appear on multiple rows depending on the number of diagnosis they have.
Data Have;
Ind_ID$ DiagCode$ Gender$ Age$
1 I11 1 55
1 C12 1 55
1 C01 1 55
2 Y11 2 67
2 F99 2 67
3 X12 1 92
4 C05 2 81
I want to transpose the dataset and count the number of diagnosis ( var. num_diag) each individual have. I also want to place each diagnosis in separate variable.:
Data want;
Ind_ID$ Num_diag DCode1$ DCode2$ DCode2$ Gender$ Age$
1 3 I11 C12 C01 1 55
2 2 Y11 F99 2 67
3 1 X12 1 92
4 1 C01 2 81
Hi @Chris_LK_87
Please try this, using a PROC TRANSPOSE first, and then a data step to compute Num_Diag.
Best,
data have;
input Ind_ID DiagCode $ Gender Age ;
cards;
1 I11 1 55
1 C12 1 55
1 C01 1 55
2 Y11 2 67
2 F99 2 67
3 X12 1 92
4 C05 2 81
;
run;
proc sort data = have;
BY Ind_ID;
run;
proc transpose data = have out = have_tr (drop = _:) prefix=DiagCode;
by Ind_ID Gender Age;
var DiagCode;
run;
data want;
set have_tr;
Num_diag = countw(catx(" ", of DiagCode:));
run;
proc print;
run;
Hi,
Read about proc transpose
and try this:
data have;
input Ind_ID DiagCode $ Gender Age ;
cards;
1 I11 1 55
1 C12 1 55
1 C01 1 55
2 Y11 2 67
2 F99 2 67
3 X12 1 92
4 C05 2 81
;
run;
proc sort data = have;
BY Ind_ID;
run;
proc transpose data = have out = want(drop = _name_) prefix=DiagCode;
by Ind_ID Gender Age;
var DiagCode;
run;
proc print;
run;
Bart
Hi @Chris_LK_87
Please try this, using a PROC TRANSPOSE first, and then a data step to compute Num_Diag.
Best,
data have;
input Ind_ID DiagCode $ Gender Age ;
cards;
1 I11 1 55
1 C12 1 55
1 C01 1 55
2 Y11 2 67
2 F99 2 67
3 X12 1 92
4 C05 2 81
;
run;
proc sort data = have;
BY Ind_ID;
run;
proc transpose data = have out = have_tr (drop = _:) prefix=DiagCode;
by Ind_ID Gender Age;
var DiagCode;
run;
data want;
set have_tr;
Num_diag = countw(catx(" ", of DiagCode:));
run;
proc print;
run;
Thanks!
It seems to work!
But I have a question regarding selecting diagnosis (DiagCode). Is it possible to select distinct diagnosis?
@Chris_LK_87 wrote:
Thanks!
It seems to work!
But I have a question regarding selecting diagnosis (DiagCode). Is it possible to select distinct diagnosis?
Any place you have a Data=somedataset you can use data set options to keep or drop variable or select values for a given variable (or variables).
proc transpose data = have (where =(diagcode in (<firstvalue> <secondvalue> <...> ))
out = have_tr (drop = _:) prefix=DiagCode;
by Ind_ID Gender Age;
var DiagCode;
run;
The example of the Where=() dataset option above is very generic. You would place the values you want where I have "firstvalue" and such. If the values are character then they should be in quotes. The values are compared exactly to your list so do not expect "Abc" to match "ABC".
The Where=() needs the parentheses to let SAS know where the comparisons end and separate from other options like Keep, drop or rename lists.
The count is achieved easily:
proc sql;
create table want as
select
ind_id,
count(ind_id) as num_diag
from have
group by ind_id
;
quit;
Why do you want to transpose your dataset of diagnoses? It will only make your future work harder.
Instead I suppose to normalize your data, reducing redundancy and waste of storage:
data
patients (keep=ind_id gender age num_diag)
diagnoses (keep=ind_id diagcode)
;
set have;
by ind_id;
if first.ind_id
then num_diag = 1;
else num_diag + 1;
output diagnoses;
if last.ind_id then output patients;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.