Hello everyone, i am new in sas and i maybe have a dumb question, but i could not find any answer online.
The following table contains the diseases diagnosed in each person (ID). Some have only one disease and others have multiples.
I would like to reorganize the date putting each disease in one column as the second table shows.
Can you guys help me?
ID | 1 disease | 2 diseases | 3 disease | 4 diseases |
1 | HIV | Meningitis | ||
2 | Ebola | pneumonia | ||
3 | Flu | HIV | Ebola | |
4 | Flu | Ebola | Meningitis | |
5 | Flu | |||
6 | Meningitis | Ebola | Flu | HIV |
7 | pneumonia | Flu | HIV | |
8 | Ebola | Meningitis | ||
9 | Flu | Meningitis | ||
10 | Ebola | |||
11 | Ebola | |||
12 | Meningitis | Flu | ||
13 | Meningitis | Ebola | HIV | Flu |
14 | Meningitis | HIV |
ID | HIV | Flu | Meningitis | Ebola | Pneumonia |
1 | HIV | Meningitis | |||
2 | Ebola | Pneumonia | |||
3 | HIV | Flu | Ebola | ||
4 | Flu | Meningitis | Ebola | ||
5 | Flu | ||||
6 | Meningitis | ||||
7 | HIV | Flu | Pneumonia | ||
8 | Meningitis | Ebola | |||
9 | Flu | Meningitis | |||
10 | Ebola | ||||
11 | Ebola | ||||
12 | Flu | Meningitis | |||
13 | HIV | Flu | Meningitis | Ebola | |
14 | HIV |
Hi @edison83. Welcome to SAS communities. Not at all a dumb question. Take a look at my code below!
What you want to do here is convert to long form and then transpose back:
data have;
infile datalines dlm=',' missover;
input ID (disease_1-disease_4) (:$15.);
datalines;
1,HIV,Meningitis
2,Ebola,pneumonia,
3,Flu,HIV,Ebola,
4,Flu,Ebola,Meningitis,
5,Flu,
6,Meningitis,Ebola,Flu,HIV
7,pneumonia,Flu,HIV,
8,Ebola,Meningitis,
9,Flu,Meningitis,
10,Ebola,
11,Ebola,
12,Meningitis,Flu,
13,Meningitis,Ebola,HIV,Flu
14,Meningitis,HIV,
;
run;
data have_long;
set have;
array arr_dis disease:;
do over arr_dis;
if arr_dis ne '' then
do;
disease=arr_dis;
output;
end;
end;
run;
proc sort data=have_long;
by id disease;
run;
proc transpose data=have_long(keep=id disease) out=want(drop=_name_);
by id;
id disease;
var disease;
run;
Hi @edison83. Welcome to SAS communities. Not at all a dumb question. Take a look at my code below!
What you want to do here is convert to long form and then transpose back:
data have;
infile datalines dlm=',' missover;
input ID (disease_1-disease_4) (:$15.);
datalines;
1,HIV,Meningitis
2,Ebola,pneumonia,
3,Flu,HIV,Ebola,
4,Flu,Ebola,Meningitis,
5,Flu,
6,Meningitis,Ebola,Flu,HIV
7,pneumonia,Flu,HIV,
8,Ebola,Meningitis,
9,Flu,Meningitis,
10,Ebola,
11,Ebola,
12,Meningitis,Flu,
13,Meningitis,Ebola,HIV,Flu
14,Meningitis,HIV,
;
run;
data have_long;
set have;
array arr_dis disease:;
do over arr_dis;
if arr_dis ne '' then
do;
disease=arr_dis;
output;
end;
end;
run;
proc sort data=have_long;
by id disease;
run;
proc transpose data=have_long(keep=id disease) out=want(drop=_name_);
by id;
id disease;
var disease;
run;
Same idea, just a double PROC transpose as this syntactically seems easy, at least to me
proc transpose data=have out=temp;
by id;
var disease_1--disease_4;
run;
proc transpose data=temp out=want(drop=_name_);
where not missing(col1);
by id;
var col1;
id col1;
run;
There is a solution that can be implemented in one pass through the data. It's admittedly more programming, but possibly more efficient with large data sets. It requires use of a hash table to store the wide-form data, and a small hash table to track the variety of diseases:
data have;
infile datalines dlm=',' missover;
input ID (disease_1-disease_4) (:$15.);
datalines;
1,HIV,Meningitis
2,Ebola,pneumonia,
3,Flu,HIV,Ebola,
4,Flu,Ebola,Meningitis,
5,Flu,
6,Meningitis,Ebola,Flu,HIV
7,pneumonia,Flu,HIV,
8,Ebola,Meningitis,
9,Flu,Meningitis,
10,Ebola,
11,Ebola,
12,Meningitis,Flu,
13,Meningitis,Ebola,HIV,Flu
14,Meningitis,HIV,
;
run;
data empty;
length id 8 name1-name40 $32;
call missing (of _all_);
stop;
run;
data _null_;
merge have empty end=end_of_have;
length var_renames $5000;
retain var_renames;
if _n_=1 then do;
declare hash wide (dataset:'empty',ordered:'a');
wide.definekey('id');
wide.definedata(all:'Y');
wide.definedone();
declare hash dlookup();
dlookup.definekey('disease');
dlookup.definedata('disease','d');
dlookup.definedone();
end;
array nam {*} $32 name: ;
do disease=disease_1,disease_2,disease_3,disease_4;
if disease=' ' then leave;
if dlookup.find()^=0 then do;
d=dlookup.num_items+1;;
dlookup.add();
var_renames=catx(' ',var_renames,cats('name',d,'=',disease));
end;
nam{d}=disease;
end;
wide.add();
if end_of_have;
nd=dlookup.num_items;
wide.output (dataset:catx(' ',"want (keep=id",cats('name1-name',ND),'rename=(',var_renames,')'));
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.