Hello everyone, i am new in sas and i maybe have a dumb question, but i could not find any answer online.
The following table contains the diseases diagnosed in each person (ID). Some have only one disease and others have multiples.
I would like to reorganize the date putting each disease in one column as the second table shows.
Can you guys help me?
ID | 1 disease | 2 diseases | 3 disease | 4 diseases |
1 | HIV | Meningitis | ||
2 | Ebola | pneumonia | ||
3 | Flu | HIV | Ebola | |
4 | Flu | Ebola | Meningitis | |
5 | Flu | |||
6 | Meningitis | Ebola | Flu | HIV |
7 | pneumonia | Flu | HIV | |
8 | Ebola | Meningitis | ||
9 | Flu | Meningitis | ||
10 | Ebola | |||
11 | Ebola | |||
12 | Meningitis | Flu | ||
13 | Meningitis | Ebola | HIV | Flu |
14 | Meningitis | HIV |
ID | HIV | Flu | Meningitis | Ebola | Pneumonia |
1 | HIV | Meningitis | |||
2 | Ebola | Pneumonia | |||
3 | HIV | Flu | Ebola | ||
4 | Flu | Meningitis | Ebola | ||
5 | Flu | ||||
6 | Meningitis | ||||
7 | HIV | Flu | Pneumonia | ||
8 | Meningitis | Ebola | |||
9 | Flu | Meningitis | |||
10 | Ebola | ||||
11 | Ebola | ||||
12 | Flu | Meningitis | |||
13 | HIV | Flu | Meningitis | Ebola | |
14 | HIV |
Hi @edison83. Welcome to SAS communities. Not at all a dumb question. Take a look at my code below!
What you want to do here is convert to long form and then transpose back:
data have;
infile datalines dlm=',' missover;
input ID (disease_1-disease_4) (:$15.);
datalines;
1,HIV,Meningitis
2,Ebola,pneumonia,
3,Flu,HIV,Ebola,
4,Flu,Ebola,Meningitis,
5,Flu,
6,Meningitis,Ebola,Flu,HIV
7,pneumonia,Flu,HIV,
8,Ebola,Meningitis,
9,Flu,Meningitis,
10,Ebola,
11,Ebola,
12,Meningitis,Flu,
13,Meningitis,Ebola,HIV,Flu
14,Meningitis,HIV,
;
run;
data have_long;
set have;
array arr_dis disease:;
do over arr_dis;
if arr_dis ne '' then
do;
disease=arr_dis;
output;
end;
end;
run;
proc sort data=have_long;
by id disease;
run;
proc transpose data=have_long(keep=id disease) out=want(drop=_name_);
by id;
id disease;
var disease;
run;
Hi @edison83. Welcome to SAS communities. Not at all a dumb question. Take a look at my code below!
What you want to do here is convert to long form and then transpose back:
data have;
infile datalines dlm=',' missover;
input ID (disease_1-disease_4) (:$15.);
datalines;
1,HIV,Meningitis
2,Ebola,pneumonia,
3,Flu,HIV,Ebola,
4,Flu,Ebola,Meningitis,
5,Flu,
6,Meningitis,Ebola,Flu,HIV
7,pneumonia,Flu,HIV,
8,Ebola,Meningitis,
9,Flu,Meningitis,
10,Ebola,
11,Ebola,
12,Meningitis,Flu,
13,Meningitis,Ebola,HIV,Flu
14,Meningitis,HIV,
;
run;
data have_long;
set have;
array arr_dis disease:;
do over arr_dis;
if arr_dis ne '' then
do;
disease=arr_dis;
output;
end;
end;
run;
proc sort data=have_long;
by id disease;
run;
proc transpose data=have_long(keep=id disease) out=want(drop=_name_);
by id;
id disease;
var disease;
run;
Same idea, just a double PROC transpose as this syntactically seems easy, at least to me
proc transpose data=have out=temp;
by id;
var disease_1--disease_4;
run;
proc transpose data=temp out=want(drop=_name_);
where not missing(col1);
by id;
var col1;
id col1;
run;
There is a solution that can be implemented in one pass through the data. It's admittedly more programming, but possibly more efficient with large data sets. It requires use of a hash table to store the wide-form data, and a small hash table to track the variety of diseases:
data have;
infile datalines dlm=',' missover;
input ID (disease_1-disease_4) (:$15.);
datalines;
1,HIV,Meningitis
2,Ebola,pneumonia,
3,Flu,HIV,Ebola,
4,Flu,Ebola,Meningitis,
5,Flu,
6,Meningitis,Ebola,Flu,HIV
7,pneumonia,Flu,HIV,
8,Ebola,Meningitis,
9,Flu,Meningitis,
10,Ebola,
11,Ebola,
12,Meningitis,Flu,
13,Meningitis,Ebola,HIV,Flu
14,Meningitis,HIV,
;
run;
data empty;
length id 8 name1-name40 $32;
call missing (of _all_);
stop;
run;
data _null_;
merge have empty end=end_of_have;
length var_renames $5000;
retain var_renames;
if _n_=1 then do;
declare hash wide (dataset:'empty',ordered:'a');
wide.definekey('id');
wide.definedata(all:'Y');
wide.definedone();
declare hash dlookup();
dlookup.definekey('disease');
dlookup.definedata('disease','d');
dlookup.definedone();
end;
array nam {*} $32 name: ;
do disease=disease_1,disease_2,disease_3,disease_4;
if disease=' ' then leave;
if dlookup.find()^=0 then do;
d=dlookup.num_items+1;;
dlookup.add();
var_renames=catx(' ',var_renames,cats('name',d,'=',disease));
end;
nam{d}=disease;
end;
wide.add();
if end_of_have;
nd=dlookup.num_items;
wide.output (dataset:catx(' ',"want (keep=id",cats('name1-name',ND),'rename=(',var_renames,')'));
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.