Hello,
I have a string variable called "illness" which contains a varying number of conditions separated by commas:
EX what I have:
Illness: "Heart Attack, Dementia, Cancer"
I need to create a new variable for each condition with each condition remapped to a new classification.
EX what I want:
Ilness1: "I20-I25 Ischaemic heart diseases" (i.e. Heart Attack)
Illness2: "F00-F09 Organic disorders" (i.e. Dementia)
Illness3: "C00-D48 Neoplasms" (i.e. Cancer)
I was thinking I could do a simple if then statement like the one below:
data x;
set x;
If illness contains "Heart Attack" then illness1="I20-I25 Ischaemic heart diseases";
If illness contains "Dementia" then illness2="F00-F09 Organic disorders";
If illness contains "Cancer" then illness3="C00-D48 Neoplasms";
.....
run;
However, this would be very time consuming seeing that the variable contains a varying number of conditions... Any suggestions for a more efficient way to achieve this?
Thanks for your insight!
And to give you some possible sample code for what @Reeza describes.
data have;
id=1;
have_str='Heart Attack, Dementia, Cancer, something else';
run;
proc format;
value $StandIll
HEART ATTACK ='I20-I25 Ischaemic heart diseases" (i.e. Heart Attack)'
DEMENTIA ='F00-F09 Organic disorders" (i.e. Dementia)'
CANCER ='C00-D48 Neoplasms" (i.e. Cancer)'
;
quit;
data want(drop=_:);
set have;
_cnt=countc(have_str,',');
do _i=0 to _cnt;
term=scan(have_str,_i+1,',');
term=upcase(compbl(strip(term)));
want_str=put(term,$StandIll.);
found_flg = (want_str ne term);
output;
end;
run;
N.B: If you're after actually working sample code then the best way to motivate forum members to provide you with such code is to provide sample data via a working SAS data step creating such data. It's also highly appreciated if you describe what you've already tried and got stuck with. And of course as you've done it: Describe the desired result.
You can use the SCAN() function to separate words if they're comma delimited as in your example.
You can also output them out to a 'long format' data set and then do a merge or lookup to get the full codes.
Use COUNTC to count the number of commas and see how many words you need to parse out.
And to give you some possible sample code for what @Reeza describes.
data have;
id=1;
have_str='Heart Attack, Dementia, Cancer, something else';
run;
proc format;
value $StandIll
HEART ATTACK ='I20-I25 Ischaemic heart diseases" (i.e. Heart Attack)'
DEMENTIA ='F00-F09 Organic disorders" (i.e. Dementia)'
CANCER ='C00-D48 Neoplasms" (i.e. Cancer)'
;
quit;
data want(drop=_:);
set have;
_cnt=countc(have_str,',');
do _i=0 to _cnt;
term=scan(have_str,_i+1,',');
term=upcase(compbl(strip(term)));
want_str=put(term,$StandIll.);
found_flg = (want_str ne term);
output;
end;
run;
N.B: If you're after actually working sample code then the best way to motivate forum members to provide you with such code is to provide sample data via a working SAS data step creating such data. It's also highly appreciated if you describe what you've already tried and got stuck with. And of course as you've done it: Describe the desired result.
Thank you so much Patrick! This is exactly what I needed 🙂
You could use an indexed dataset with key= access
data have;
patient = 1;
Illnesses = "Heart Attack, Dementia, Cancer";
output;
patient = 2;
Illnesses = "Heart Attack, Tumor, Booboo";
output;
run;
data d2(index = (illness/unique));
infile datalines dsd;
length label illness $32;
input label & Illness &;
illness = upcase(illness);
datalines;
I20-I25 Ischaemic heart diseases, Heart Attack
F00-F09 Organic disorders, Dementia
C00-D48 Neoplasms, Cancer
C00-D48 Neoplasms, Tumor
;
data want;
set have;
length illness $32;
array a $32 illness1 - illness10;
do i = 1 to countw(Illnesses, ",");
illness = upcase(scan(Illnesses, i, ",", "R"));
set d2 key=illness;
if _iorc_ = 0 then a{i} = label;
else a{i} = "*** Unknown ***";
end;
keep patient illness1-illness10;
run;
%let illness1=%str('HEART ATTACK','CARDIAC');
%let illness2=%str('DEMENTIA','AMNESIA');
%let illness3=%str('CANCER','CIRRHOSIS');
data WANT;
set HAVE;
length ILLNESS1 ILLNESS2 ILLNESS3 $30;
ILLNESS1=ifc(upcase(ILLNESS) in (&illness1),"I20-I25 Ischaemic heart diseases","");
ILLNESS2=ifc(upcase(ILLNESS) in (&illness2),"F00-F09 Organic disorders","");
ILLNESS3=ifc(upcase(ILLNESS) in (&illness3),"C00-D48 Neoplasms","");
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.