Solved: Re: Transpose_diag

Chris_LK_87 · Posted 06-03-2020 06:54 AM

Hello!

I have a dataset that contains diagnosis for individuals. Each individual can appear on multiple rows depending on the number of diagnosis they have.

Data Have;

Ind_ID$ DiagCode$ Gender$ Age$
1 I11 1 55

1 C12 1 55

1 C01 1 55

2 Y11 2 67

2 F99 2 67

3 X12 1 92

4 C05 2 81

I want to transpose the dataset and count the number of diagnosis ( var. num_diag) each individual have. I also want to place each diagnosis in separate variable.:

Data want;

Ind_ID$ Num_diag DCode1$ DCode2$ DCode2$ Gender$ Age$
1 3 I11 C12 C01 1 55

2 2 Y11 F99 2 67

3 1 X12 1 92

4 1 C01 2 81

ed_sas_member · Posted 06-03-2020 07:51 AM

Hi @Chris_LK_87

Please try this, using a PROC TRANSPOSE first, and then a data step to compute Num_Diag.

Best,

data have;
input Ind_ID DiagCode $ Gender Age ;
cards;
1 I11 1 55
1 C12 1 55
1 C01 1 55
2 Y11 2 67
2 F99 2 67
3 X12 1 92
4 C05 2 81
;
run;

proc sort data = have;
  BY Ind_ID;
run;

proc transpose data = have out = have_tr (drop = _:) prefix=DiagCode;
  by Ind_ID Gender Age;
  var DiagCode; 
run;

data want;
	set have_tr;
	Num_diag = countw(catx(" ", of DiagCode:));
run;

proc print;
run;

View solution in original post

yabwon · Posted 06-03-2020 07:47 AM

Hi,

Read about proc transpose

and try this:

data have;
input Ind_ID DiagCode $      Gender    Age ;
cards;
1 I11 1 55
1 C12 1 55
1 C01 1 55
2 Y11 2 67
2 F99 2 67
3 X12 1 92
4 C05 2 81
;
run;

proc sort data = have;
  BY Ind_ID;
run;

proc transpose data = have out = want(drop = _name_) prefix=DiagCode;
  by Ind_ID Gender    Age;
  var DiagCode; 
run;
proc print;
run;

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

ed_sas_member · Posted 06-03-2020 07:51 AM

Hi @Chris_LK_87

Please try this, using a PROC TRANSPOSE first, and then a data step to compute Num_Diag.

Best,

data have;
input Ind_ID DiagCode $ Gender Age ;
cards;
1 I11 1 55
1 C12 1 55
1 C01 1 55
2 Y11 2 67
2 F99 2 67
3 X12 1 92
4 C05 2 81
;
run;

proc sort data = have;
  BY Ind_ID;
run;

proc transpose data = have out = have_tr (drop = _:) prefix=DiagCode;
  by Ind_ID Gender Age;
  var DiagCode; 
run;

data want;
	set have_tr;
	Num_diag = countw(catx(" ", of DiagCode:));
run;

proc print;
run;

Chris_LK_87 · Posted 06-03-2020 09:30 AM

Thanks!

It seems to work!

But I have a question regarding selecting diagnosis (DiagCode). Is it possible to select distinct diagnosis?

ballardw · Posted 06-03-2020 11:08 AM

@Chris_LK_87 wrote:

Thanks!

It seems to work!

But I have a question regarding selecting diagnosis (DiagCode). Is it possible to select distinct diagnosis?

Any place you have a Data=somedataset you can use data set options to keep or drop variable or select values for a given variable (or variables).

proc transpose data = have (where =(diagcode in (<firstvalue> <secondvalue> <...> ))
    out = have_tr (drop = _:) prefix=DiagCode;
  by Ind_ID Gender Age;
  var DiagCode; 
run;

The example of the Where=() dataset option above is very generic. You would place the values you want where I have "firstvalue" and such. If the values are character then they should be in quotes. The values are compared exactly to your list so do not expect "Abc" to match "ABC".

The Where=() needs the parentheses to let SAS know where the comparisons end and separate from other options like Keep, drop or rename lists.

Chris_LK_87 · Posted 06-04-2020 02:11 AM

Thank you!

Kurt_Bremser · Posted 06-03-2020 07:54 AM

The count is achieved easily:

proc sql;
create table want as
  select
    ind_id,
    count(ind_id) as num_diag
  from have
  group by ind_id
;
quit;

Why do you want to transpose your dataset of diagnoses? It will only make your future work harder.

Instead I suppose to normalize your data, reducing redundancy and waste of storage:

data
  patients (keep=ind_id gender age num_diag)
  diagnoses (keep=ind_id diagcode)
;
set have;
by ind_id;
if first.ind_id
then num_diag = 1;
else num_diag + 1;
output diagnoses;
if last.ind_id then output patients;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away