Hello everyone, i am new in sas and i maybe have a dumb question, but i could not find any answer online.
The following table contains the diseases diagnosed in each person (ID). Some have only one disease and others have multiples.
I would like to reorganize the date putting each disease in one column as the second table shows.
Can you guys help me?
| ID | 1 disease | 2 diseases | 3 disease | 4 diseases |
| 1 | HIV | Meningitis | ||
| 2 | Ebola | pneumonia | ||
| 3 | Flu | HIV | Ebola | |
| 4 | Flu | Ebola | Meningitis | |
| 5 | Flu | |||
| 6 | Meningitis | Ebola | Flu | HIV |
| 7 | pneumonia | Flu | HIV | |
| 8 | Ebola | Meningitis | ||
| 9 | Flu | Meningitis | ||
| 10 | Ebola | |||
| 11 | Ebola | |||
| 12 | Meningitis | Flu | ||
| 13 | Meningitis | Ebola | HIV | Flu |
| 14 | Meningitis | HIV |
| ID | HIV | Flu | Meningitis | Ebola | Pneumonia |
| 1 | HIV | Meningitis | |||
| 2 | Ebola | Pneumonia | |||
| 3 | HIV | Flu | Ebola | ||
| 4 | Flu | Meningitis | Ebola | ||
| 5 | Flu | ||||
| 6 | Meningitis | ||||
| 7 | HIV | Flu | Pneumonia | ||
| 8 | Meningitis | Ebola | |||
| 9 | Flu | Meningitis | |||
| 10 | Ebola | ||||
| 11 | Ebola | ||||
| 12 | Flu | Meningitis | |||
| 13 | HIV | Flu | Meningitis | Ebola | |
| 14 | HIV |
Hi @edison83. Welcome to SAS communities. Not at all a dumb question. Take a look at my code below!
What you want to do here is convert to long form and then transpose back:
data have;
infile datalines dlm=',' missover;
input ID (disease_1-disease_4) (:$15.);
datalines;
1,HIV,Meningitis
2,Ebola,pneumonia,
3,Flu,HIV,Ebola,
4,Flu,Ebola,Meningitis,
5,Flu,
6,Meningitis,Ebola,Flu,HIV
7,pneumonia,Flu,HIV,
8,Ebola,Meningitis,
9,Flu,Meningitis,
10,Ebola,
11,Ebola,
12,Meningitis,Flu,
13,Meningitis,Ebola,HIV,Flu
14,Meningitis,HIV,
;
run;
data have_long;
set have;
array arr_dis disease:;
do over arr_dis;
if arr_dis ne '' then
do;
disease=arr_dis;
output;
end;
end;
run;
proc sort data=have_long;
by id disease;
run;
proc transpose data=have_long(keep=id disease) out=want(drop=_name_);
by id;
id disease;
var disease;
run;
Hi @edison83. Welcome to SAS communities. Not at all a dumb question. Take a look at my code below!
What you want to do here is convert to long form and then transpose back:
data have;
infile datalines dlm=',' missover;
input ID (disease_1-disease_4) (:$15.);
datalines;
1,HIV,Meningitis
2,Ebola,pneumonia,
3,Flu,HIV,Ebola,
4,Flu,Ebola,Meningitis,
5,Flu,
6,Meningitis,Ebola,Flu,HIV
7,pneumonia,Flu,HIV,
8,Ebola,Meningitis,
9,Flu,Meningitis,
10,Ebola,
11,Ebola,
12,Meningitis,Flu,
13,Meningitis,Ebola,HIV,Flu
14,Meningitis,HIV,
;
run;
data have_long;
set have;
array arr_dis disease:;
do over arr_dis;
if arr_dis ne '' then
do;
disease=arr_dis;
output;
end;
end;
run;
proc sort data=have_long;
by id disease;
run;
proc transpose data=have_long(keep=id disease) out=want(drop=_name_);
by id;
id disease;
var disease;
run;
Same idea, just a double PROC transpose as this syntactically seems easy, at least to me
proc transpose data=have out=temp;
by id;
var disease_1--disease_4;
run;
proc transpose data=temp out=want(drop=_name_);
where not missing(col1);
by id;
var col1;
id col1;
run;
There is a solution that can be implemented in one pass through the data. It's admittedly more programming, but possibly more efficient with large data sets. It requires use of a hash table to store the wide-form data, and a small hash table to track the variety of diseases:
data have;
infile datalines dlm=',' missover;
input ID (disease_1-disease_4) (:$15.);
datalines;
1,HIV,Meningitis
2,Ebola,pneumonia,
3,Flu,HIV,Ebola,
4,Flu,Ebola,Meningitis,
5,Flu,
6,Meningitis,Ebola,Flu,HIV
7,pneumonia,Flu,HIV,
8,Ebola,Meningitis,
9,Flu,Meningitis,
10,Ebola,
11,Ebola,
12,Meningitis,Flu,
13,Meningitis,Ebola,HIV,Flu
14,Meningitis,HIV,
;
run;
data empty;
length id 8 name1-name40 $32;
call missing (of _all_);
stop;
run;
data _null_;
merge have empty end=end_of_have;
length var_renames $5000;
retain var_renames;
if _n_=1 then do;
declare hash wide (dataset:'empty',ordered:'a');
wide.definekey('id');
wide.definedata(all:'Y');
wide.definedone();
declare hash dlookup();
dlookup.definekey('disease');
dlookup.definedata('disease','d');
dlookup.definedone();
end;
array nam {*} $32 name: ;
do disease=disease_1,disease_2,disease_3,disease_4;
if disease=' ' then leave;
if dlookup.find()^=0 then do;
d=dlookup.num_items+1;;
dlookup.add();
var_renames=catx(' ',var_renames,cats('name',d,'=',disease));
end;
nam{d}=disease;
end;
wide.add();
if end_of_have;
nd=dlookup.num_items;
wide.output (dataset:catx(' ',"want (keep=id",cats('name1-name',ND),'rename=(',var_renames,')'));
run;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.