I have a few questions about comma delimited data, do-loops and dummy variables.
(SAS 9.4)
I have a data set that includes multiple diagnoses in one column separated by commas:
| Obs | CMRDiagnosis | 
| 1 | Apical HCM,Fibrosis | 
| 2 | Sarcoidosis,DCM | 
| 3 | Amyloidosis | 
| 4 | MI,Fibrosis | 
| 5 | Myocarditis,DCM | 
| 6 | Myocarditis | 
| 7 | Valvular dz | 
| 8 | |
| 9 | MI,LVH | 
| 10 | DCM | 
| 11 | HCM,Fibrosis,Valvular dz | 
Since each person can have more than one diagnosis (max 6 diagnoses per person, 13 diagnoses possible) I would like to create a dummy variable for each of the 13 diagnosis, (labeled as "_MR" aka the source of the diagnosis, to distinguish from other diagnosis variables).
Not knowing how to separate the comma delimited variables, I separated them in Excel as DX1-DX6. After importing the new data,
I did the following:
select (DX1);
	when ('Amyloidosis') Amyloid_MR =1;
	when ('Apical HCM') Ap_HCM_MR =1;
	when ('CM') CM_MR =1;
	when ('DCM') DCM_MR =1;
	when ('Fibrosis') Fibrosis_MR =1;
	when ('HCM') HCM_MR =1;
	when ('LVH') LVH_MR =1;
	when ('MI') MI_MR =1;
	when ('Myocarditis') Myocarditis_MR =1;
	when ('Noncompaction') Noncompaction_MR =1;
	when ('Pericardial dz') Pericardial_MR =1;
	when ('Sarcoidosis') Sarcoidosis_MR =1;
	when ('Valvular dz') Valvular_MR =1;
	otherwise; end;
1. Is there a better way to accomplish this for all DX vars with a do-loop (or macro) to execute the above code without copy-pasting 5 additional times for each DX?
2. How would I separate the comma delimited data?
3. The above code leaves you with variables like "Valvular_MR" where observations are either "1" or " ". How can I execute this code while also replacing the missing rows with "0" as appropriate?
Thank you!
1. Use arrays instead of select
https://stats.idre.ucla.edu/sas/seminars/sas-arrays/
2. to separate diagnosis use SCAN + COUNTW()
Import data as is. Separate in SAS and assign to variables.
Untested but this should work - once you fix the coding bugs I likely have in there.
data want;
set have;
array diag(*) diag1-diag13;
array dName(*) _temporary_ $ ("Myocarditis", "Valvular dz");
nwords=countw(diagnosis);
do i=1 to nwords;
word = scan(diagnosis, i);
x=whichn(word, of dname(*));
diag(x) = 1;
end;
run;
@webster1002 wrote:
I have a few questions about comma delimited data, do-loops and dummy variables.
(SAS 9.4)
I have a data set that includes multiple diagnoses in one column separated by commas:
Obs CMRDiagnosis 1 Apical HCM,Fibrosis 2 Sarcoidosis,DCM 3 Amyloidosis 4 MI,Fibrosis 5 Myocarditis,DCM 6 Myocarditis 7 Valvular dz 8 9 MI,LVH 10 DCM 11 HCM,Fibrosis,Valvular dz
Since each person can have more than one diagnosis (max 6 diagnoses per person, 13 diagnoses possible) I would like to create a dummy variable for each of the 13 diagnosis, (labeled as "_MR" aka the source of the diagnosis, to distinguish from other diagnosis variables).
Not knowing how to separate the comma delimited variables, I separated them in Excel as DX1-DX6. After importing the new data,
I did the following:
select (DX1); when ('Amyloidosis') Amyloid_MR =1; when ('Apical HCM') Ap_HCM_MR =1; when ('CM') CM_MR =1; when ('DCM') DCM_MR =1; when ('Fibrosis') Fibrosis_MR =1; when ('HCM') HCM_MR =1; when ('LVH') LVH_MR =1; when ('MI') MI_MR =1; when ('Myocarditis') Myocarditis_MR =1; when ('Noncompaction') Noncompaction_MR =1; when ('Pericardial dz') Pericardial_MR =1; when ('Sarcoidosis') Sarcoidosis_MR =1; when ('Valvular dz') Valvular_MR =1; otherwise; end;1. Is there a better way to accomplish this for all DX vars with a do-loop (or macro) to execute the above code without copy-pasting 5 additional times for each DX?
2. How would I separate the comma delimited data?
3. The above code leaves you with variables like "Valvular_MR" where observations are either "1" or " ". How can I execute this code while also replacing the missing rows with "0" as appropriate?
Thank you!
data have;
infile cards truncover;
input Obs	CMRDiagnosis $30.;
cards;
1	Apical HCM,Fibrosis
2	Sarcoidosis,DCM
3	Amyloidosis
4	MI,Fibrosis
5	Myocarditis,DCM
6	Myocarditis
7	Valvular dz
8	 
9	MI,LVH
10	DCM
11	HCM,Fibrosis,Valvular dz
;
data want;
set have;
array t(999)$30 _temporary_;
length want $20;
dummy=1;
do i=1 to countw(CMRDiagnosis,',');
if CMRDiagnosis>' ' then want=scan(CMRDiagnosis,i,',');
if want not in t then do;
output;
n+1;
t(n)=want;
end;
end;
drop CMRDiagnosis i n;;
rename want=CMRDiagnosis;
run;Generally, the process of writing code to create dummy variables is not necessary in SAS, as most SAS procedures have built-in to them the ability to create dummy variables. In your case, you would have to separate the values on either side of the comma to be separate variables. Or, since you said you already have them separated in Excel, just read those into SAS and let the SAS PROC do its internal calculation of dummy variables.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
