BookmarkSubscribeRSS Feed
webster1002
Calcite | Level 5

I have a few questions about comma delimited data, do-loops and dummy variables.

(SAS 9.4)

I have a data set that includes multiple diagnoses in one column separated by commas:

 

ObsCMRDiagnosis
1Apical HCM,Fibrosis
2Sarcoidosis,DCM
3Amyloidosis
4MI,Fibrosis
5Myocarditis,DCM
6Myocarditis
7Valvular dz
8 
9MI,LVH
10DCM
11

HCM,Fibrosis,Valvular dz

 

Since each person can have more than one diagnosis (max 6 diagnoses per person, 13 diagnoses possible) I would like to create a dummy variable for each of the 13 diagnosis, (labeled as "_MR" aka the source of the diagnosis, to distinguish from other diagnosis variables).

 

Not knowing how to separate the comma delimited variables, I separated them in Excel as DX1-DX6. After importing the new data,

I did the following:

 

 




select (DX1); when ('Amyloidosis') Amyloid_MR =1; when ('Apical HCM') Ap_HCM_MR =1; when ('CM') CM_MR =1; when ('DCM') DCM_MR =1; when ('Fibrosis') Fibrosis_MR =1; when ('HCM') HCM_MR =1; when ('LVH') LVH_MR =1; when ('MI') MI_MR =1; when ('Myocarditis') Myocarditis_MR =1; when ('Noncompaction') Noncompaction_MR =1; when ('Pericardial dz') Pericardial_MR =1; when ('Sarcoidosis') Sarcoidosis_MR =1; when ('Valvular dz') Valvular_MR =1; otherwise; end;

1. Is there a better way to accomplish this for all DX vars with a do-loop (or macro) to execute the above code without copy-pasting 5 additional times for each DX?

 

2. How would I separate the comma delimited data?

3. The above code leaves you with variables like "Valvular_MR" where observations are either "1" or " ". How can I execute this code while also replacing the missing rows with "0" as appropriate?

 

Thank you!

3 REPLIES 3
Reeza
Super User

1. Use arrays instead of select

https://stats.idre.ucla.edu/sas/seminars/sas-arrays/

 

2. to separate diagnosis use SCAN + COUNTW()

 

Import data as is. Separate in SAS and assign to variables. 

 

Untested but this should work - once you fix the coding bugs I likely have in there.

 

data want;
set have;

array diag(*) diag1-diag13;
array dName(*) _temporary_ $ ("Myocarditis", "Valvular dz");

nwords=countw(diagnosis);

do i=1 to nwords;
word = scan(diagnosis, i);
x=whichn(word, of dname(*));
diag(x) = 1;
end;

run;

@webster1002 wrote:

I have a few questions about comma delimited data, do-loops and dummy variables.

(SAS 9.4)

I have a data set that includes multiple diagnoses in one column separated by commas:

 

Obs CMRDiagnosis
1 Apical HCM,Fibrosis
2 Sarcoidosis,DCM
3 Amyloidosis
4 MI,Fibrosis
5 Myocarditis,DCM
6 Myocarditis
7 Valvular dz
8  
9 MI,LVH
10 DCM
11

HCM,Fibrosis,Valvular dz

 

Since each person can have more than one diagnosis (max 6 diagnoses per person, 13 diagnoses possible) I would like to create a dummy variable for each of the 13 diagnosis, (labeled as "_MR" aka the source of the diagnosis, to distinguish from other diagnosis variables).

 

Not knowing how to separate the comma delimited variables, I separated them in Excel as DX1-DX6. After importing the new data,

I did the following:

 

 




select (DX1); when ('Amyloidosis') Amyloid_MR =1; when ('Apical HCM') Ap_HCM_MR =1; when ('CM') CM_MR =1; when ('DCM') DCM_MR =1; when ('Fibrosis') Fibrosis_MR =1; when ('HCM') HCM_MR =1; when ('LVH') LVH_MR =1; when ('MI') MI_MR =1; when ('Myocarditis') Myocarditis_MR =1; when ('Noncompaction') Noncompaction_MR =1; when ('Pericardial dz') Pericardial_MR =1; when ('Sarcoidosis') Sarcoidosis_MR =1; when ('Valvular dz') Valvular_MR =1; otherwise; end;

1. Is there a better way to accomplish this for all DX vars with a do-loop (or macro) to execute the above code without copy-pasting 5 additional times for each DX?

 

2. How would I separate the comma delimited data?

3. The above code leaves you with variables like "Valvular_MR" where observations are either "1" or " ". How can I execute this code while also replacing the missing rows with "0" as appropriate?

 

Thank you!


 

novinosrin
Tourmaline | Level 20


data have;
infile cards truncover;
input Obs	CMRDiagnosis $30.;
cards;
1	Apical HCM,Fibrosis
2	Sarcoidosis,DCM
3	Amyloidosis
4	MI,Fibrosis
5	Myocarditis,DCM
6	Myocarditis
7	Valvular dz
8	 
9	MI,LVH
10	DCM
11	HCM,Fibrosis,Valvular dz
;
data want;
set have;
array t(999)$30 _temporary_;
length want $20;
dummy=1;
do i=1 to countw(CMRDiagnosis,',');
if CMRDiagnosis>' ' then want=scan(CMRDiagnosis,i,',');
if want not in t then do;
output;
n+1;
t(n)=want;
end;
end;
drop CMRDiagnosis i n;;
rename want=CMRDiagnosis;
run;
PaigeMiller
Diamond | Level 26

Generally, the process of writing code to create dummy variables is not necessary in SAS, as most SAS procedures have built-in to them the ability to create dummy variables. In your case, you would have to separate the values on either side of the comma to be separate variables. Or, since you said you already have them separated in Excel, just read those into SAS and let the SAS PROC do its internal calculation of dummy variables.

--
Paige Miller

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 601 views
  • 4 likes
  • 4 in conversation