Remapping a string variable into an unknown number of new variables

Accepted Solution Solved
Reply
New Contributor
Posts: 4
Accepted Solution

Remapping a string variable into an unknown number of new variables

Hello,

 

I have a string variable called "illness" which contains a varying number of conditions separated by commas:

 

EX what I have:

Illness: "Heart Attack, Dementia, Cancer"

 

I need to create a new variable for each condition with each condition remapped to a new classification.

 

EX what I want:

Ilness1: "I20-I25 Ischaemic heart diseases" (i.e. Heart Attack)

Illness2: "F00-F09 Organic disorders"  (i.e. Dementia)

Illness3: "C00-D48 Neoplasms"  (i.e. Cancer)

 

I was thinking I could do a simple if then statement like the one below:


data x;
set x;
If illness contains "Heart Attack" then illness1="I20-I25 Ischaemic heart diseases";

If illness contains "Dementia" then illness2="F00-F09 Organic disorders";

If illness contains "Cancer" then illness3="C00-D48 Neoplasms";
.....

run;

 

However, this would be very time consuming seeing that the variable contains a varying number of conditions... Any suggestions for a more efficient way to achieve this?

 

Thanks for your insight!

 

 

 

 


Accepted Solutions
Solution
‎09-11-2017 09:18 AM
Respected Advisor
Posts: 4,687

Re: Remapping a string variable into an unknown number of new variables

Posted in reply to vgrywacheski

@vgrywacheski

And to give you some possible sample code for what @Reeza describes.

data have;
  id=1;
  have_str='Heart  Attack, Dementia, Cancer, something else'; 
run;

proc format;
  value $StandIll
    HEART ATTACK    ='I20-I25 Ischaemic heart diseases" (i.e. Heart Attack)'
    DEMENTIA        ='F00-F09 Organic disorders"  (i.e. Dementia)'
    CANCER          ='C00-D48 Neoplasms"  (i.e. Cancer)'
    ;
quit;

data want(drop=_:);
  set have;
  _cnt=countc(have_str,',');
  do _i=0 to _cnt;
    term=scan(have_str,_i+1,',');
    term=upcase(compbl(strip(term)));
    want_str=put(term,$StandIll.);
    found_flg = (want_str ne term);
    output;
  end;
run;

N.B: If you're after actually working sample code then the best way to motivate forum members to provide you with such code is to provide sample data via a working SAS data step creating such data. It's also highly appreciated if you describe what you've already tried and got stuck with. And of course as you've done it: Describe the desired result.

View solution in original post


All Replies
Super User
Posts: 23,323

Re: Remapping a string variable into an unknown number of new variables

Posted in reply to vgrywacheski

You can use the SCAN() function to separate words if they're comma delimited as in your example.

 

You can also output them out to a 'long format' data set and then do a merge or lookup to get the full codes. 

Use COUNTC to count the number of commas and see how many words you need to parse out.

Solution
‎09-11-2017 09:18 AM
Respected Advisor
Posts: 4,687

Re: Remapping a string variable into an unknown number of new variables

Posted in reply to vgrywacheski

@vgrywacheski

And to give you some possible sample code for what @Reeza describes.

data have;
  id=1;
  have_str='Heart  Attack, Dementia, Cancer, something else'; 
run;

proc format;
  value $StandIll
    HEART ATTACK    ='I20-I25 Ischaemic heart diseases" (i.e. Heart Attack)'
    DEMENTIA        ='F00-F09 Organic disorders"  (i.e. Dementia)'
    CANCER          ='C00-D48 Neoplasms"  (i.e. Cancer)'
    ;
quit;

data want(drop=_:);
  set have;
  _cnt=countc(have_str,',');
  do _i=0 to _cnt;
    term=scan(have_str,_i+1,',');
    term=upcase(compbl(strip(term)));
    want_str=put(term,$StandIll.);
    found_flg = (want_str ne term);
    output;
  end;
run;

N.B: If you're after actually working sample code then the best way to motivate forum members to provide you with such code is to provide sample data via a working SAS data step creating such data. It's also highly appreciated if you describe what you've already tried and got stuck with. And of course as you've done it: Describe the desired result.

New Contributor
Posts: 4

Re: Remapping a string variable into an unknown number of new variables

Thank you so much Patrick! This is exactly what I needed Smiley Happy

Esteemed Advisor
Posts: 5,482

Re: Remapping a string variable into an unknown number of new variables

[ Edited ]
Posted in reply to vgrywacheski

You could use an indexed dataset with key= access

 

 

 

 

data have;
patient = 1;
Illnesses = "Heart Attack, Dementia, Cancer";
output;
patient = 2;
Illnesses = "Heart Attack, Tumor, Booboo";
output;
run;

data d2(index = (illness/unique));
infile datalines dsd;
length label illness $32;
input label &  Illness &;
illness = upcase(illness);
datalines;
I20-I25 Ischaemic heart diseases, Heart Attack
F00-F09 Organic disorders, Dementia
C00-D48 Neoplasms, Cancer
C00-D48 Neoplasms, Tumor
;

data want;
set have;
length illness $32;
array a $32 illness1 - illness10;
do i = 1 to countw(Illnesses, ",");
    illness = upcase(scan(Illnesses, i, ",", "R"));
    set d2 key=illness;
    if _iorc_ = 0 then a{i} = label;
    else a{i} = "*** Unknown ***";
    end;
keep patient illness1-illness10;
run;

 

 

PG
Frequent Contributor
Posts: 113

Re: Remapping a string variable into an unknown number of new variables

Posted in reply to vgrywacheski

%let illness1=%str('HEART ATTACK','CARDIAC');

%let illness2=%str('DEMENTIA','AMNESIA');

%let illness3=%str('CANCER','CIRRHOSIS');

 

data  WANT;

  set HAVE;

  length ILLNESS1 ILLNESS2 ILLNESS3 $30;

  ILLNESS1=ifc(upcase(ILLNESS) in (&illness1),"I20-I25 Ischaemic heart diseases","");

  ILLNESS2=ifc(upcase(ILLNESS) in (&illness2),"F00-F09 Organic disorders","");

  ILLNESS3=ifc(upcase(ILLNESS) in (&illness3),"C00-D48 Neoplasms","");

run;

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 188 views
  • 2 likes
  • 5 in conversation