BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
chapidi99
Fluorite | Level 6

Hello, I have below (SpecialServices) column in SAS dataset, I would like to extract string next to "name" particularly and create column for each values: Example, SpecialService1, SpecialService2 etc,. 

 

SAS Program:

 

data Cln_SpecialService (drop= _i);
 set Sub_Str_Qualification;
   length SpecialServices1-SpecialServices20 $306;
     array SpecialServices(20) $;
   do _i = 1 to dim(SpecialServices);
   SpecialServices[_i] = scan(SpecialService,_i,',');
  end;
run;

 

Raw Data:

 

- 'id' 'CXX2', 'name' 'Multimedikation' , 'id' 'CXXB', 'name' 'Asthma / COPD' , 'id' 'CXX9', 'name' 'Diabetes Typ II' , 'id' 'CXX8', 'name' 'Hautsscreening' , 'id' 'CXX7', 'name' 'KHK'
- 'id' 'CXX9', 'name' 'Diabetes Typ II' , 'id' 'CXX8', 'name' 'Hautkrebsscreening' , 'id' 'CXX7', 'name' 'KHK'
- 'id' 'CXX2', 'name' 'Multimedikation' , 'id' 'CXXB', 'name' 'Asthma / COPD' , 'id' 'CXX9', 'name' 'Diabetes Typ II' , 'id' 'CXX8', 'name' 'Hautsscreening' , 'id' 'CXX7', 'name' 'KHK'
- 'id' 'CXXB', 'name' 'Asthma / COPD' , 'id' 'CXX9', 'name' 'Diabetes Typ II' , 'id' 'CXX7', 'name' 'KHK'
- 'id' 'CXXB', 'name' 'Asthma / COPD' , 'id' 'CXX9', 'name' 'Diabetes Typ II' , 'id' 'CXX7', 'name' 'KHK'
- 'id' 'CA2B', 'name' 'Asthma / COPD' , 'id' 'CXX9', 'name' 'Diabetes Typ II' , 'id' 'CXX8', 'name' 'Hautsscreening' , 'id' 'CXX7', 'name' 'KHK'
'id' 'CX39', 'name' 'Diabetes Typ II' , 'id' 'CXX3', 'name' 'Psychotherapie, tiefenpsychologisch - Einzelth. - Erw.' , 'id' 'CXXE', 'name' 'Verhaltenstherapie - Einzelth. - Erw.'

 

Desired Result:

 

SpecialService1, SpecialService2, SpecialService3, SpecialService4, SpecialService5

'Multimedikation' | 'Asthma / COPD' |  'Diabetes Typ II' | 'Hautsscreening' | 'KHK'

'Diabetes Typ II' |  'Hautkrebsscreening'  |  'KHK'

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

What is the actual name of your variable? You cannot have an Array name the same as a Variable. So just how many errors did the code you show throw when attempted?

 

 

This appears to work with your example "data". First data step makes a data set with a variable holding that long string.

data have;
   length var $ 306;
   input ;
   var=_infile_;
datalines;
- 'id' 'CXX2', 'name' 'Multimedikation' , 'id' 'CXXB', 'name' 'Asthma / COPD' , 'id' 'CXX9', 'name' 'Diabetes Typ II' , 'id' 'CXX8', 'name' 'Hautsscreening' , 'id' 'CXX7', 'name' 'KHK'
- 'id' 'CXX9', 'name' 'Diabetes Typ II' , 'id' 'CXX8', 'name' 'Hautkrebsscreening' , 'id' 'CXX7', 'name' 'KHK'
- 'id' 'CXX2', 'name' 'Multimedikation' , 'id' 'CXXB', 'name' 'Asthma / COPD' , 'id' 'CXX9', 'name' 'Diabetes Typ II' , 'id' 'CXX8', 'name' 'Hautsscreening' , 'id' 'CXX7', 'name' 'KHK'
- 'id' 'CXXB', 'name' 'Asthma / COPD' , 'id' 'CXX9', 'name' 'Diabetes Typ II' , 'id' 'CXX7', 'name' 'KHK'
- 'id' 'CXXB', 'name' 'Asthma / COPD' , 'id' 'CXX9', 'name' 'Diabetes Typ II' , 'id' 'CXX7', 'name' 'KHK'
- 'id' 'CA2B', 'name' 'Asthma / COPD' , 'id' 'CXX9', 'name' 'Diabetes Typ II' , 'id' 'CXX8', 'name' 'Hautsscreening' , 'id' 'CXX7', 'name' 'KHK'
- 'id' 'CX39', 'name' 'Diabetes Typ II' , 'id' 'CXX3', 'name' 'Psychotherapie, tiefenpsychologisch - Einzelth. - Erw.' , 'id' 'CXXE', 'name' 'Verhaltenstherapie - Einzelth. - Erw.'
;

data want;
   set have;
   length temp $ 100;
   array SpecialServices(20) $ 100;
   tcount=1;
   do i=1 to countw(var,',');
      temp=scan(var,i,',');
      if indexw(temp,"'name'")>0 then do;
         SpecialServices[tcount]= substr(temp,8);
         tcount=tcount+1;
      end;
   end;
   drop i tcount temp;
run;

You Scan wasn't working because there were too many things delimited by commas. So this pulls out chunks and sees if 'name' is part of chunk, then gets the value when present. Assumes 'name' will always have a preceding space and only one before the value. 

Note the assignment of the length of the variables in the Array statement. Keeping as much stuff on a single statement makes the code a tad easier to modify latter if needed. You could add an element to the array but forget to update a separate Length statement possibly resulting in truncated data.

A simple modification would also work to extract the ID

 

View solution in original post

3 REPLIES 3
ballardw
Super User

What is the actual name of your variable? You cannot have an Array name the same as a Variable. So just how many errors did the code you show throw when attempted?

 

 

This appears to work with your example "data". First data step makes a data set with a variable holding that long string.

data have;
   length var $ 306;
   input ;
   var=_infile_;
datalines;
- 'id' 'CXX2', 'name' 'Multimedikation' , 'id' 'CXXB', 'name' 'Asthma / COPD' , 'id' 'CXX9', 'name' 'Diabetes Typ II' , 'id' 'CXX8', 'name' 'Hautsscreening' , 'id' 'CXX7', 'name' 'KHK'
- 'id' 'CXX9', 'name' 'Diabetes Typ II' , 'id' 'CXX8', 'name' 'Hautkrebsscreening' , 'id' 'CXX7', 'name' 'KHK'
- 'id' 'CXX2', 'name' 'Multimedikation' , 'id' 'CXXB', 'name' 'Asthma / COPD' , 'id' 'CXX9', 'name' 'Diabetes Typ II' , 'id' 'CXX8', 'name' 'Hautsscreening' , 'id' 'CXX7', 'name' 'KHK'
- 'id' 'CXXB', 'name' 'Asthma / COPD' , 'id' 'CXX9', 'name' 'Diabetes Typ II' , 'id' 'CXX7', 'name' 'KHK'
- 'id' 'CXXB', 'name' 'Asthma / COPD' , 'id' 'CXX9', 'name' 'Diabetes Typ II' , 'id' 'CXX7', 'name' 'KHK'
- 'id' 'CA2B', 'name' 'Asthma / COPD' , 'id' 'CXX9', 'name' 'Diabetes Typ II' , 'id' 'CXX8', 'name' 'Hautsscreening' , 'id' 'CXX7', 'name' 'KHK'
- 'id' 'CX39', 'name' 'Diabetes Typ II' , 'id' 'CXX3', 'name' 'Psychotherapie, tiefenpsychologisch - Einzelth. - Erw.' , 'id' 'CXXE', 'name' 'Verhaltenstherapie - Einzelth. - Erw.'
;

data want;
   set have;
   length temp $ 100;
   array SpecialServices(20) $ 100;
   tcount=1;
   do i=1 to countw(var,',');
      temp=scan(var,i,',');
      if indexw(temp,"'name'")>0 then do;
         SpecialServices[tcount]= substr(temp,8);
         tcount=tcount+1;
      end;
   end;
   drop i tcount temp;
run;

You Scan wasn't working because there were too many things delimited by commas. So this pulls out chunks and sees if 'name' is part of chunk, then gets the value when present. Assumes 'name' will always have a preceding space and only one before the value. 

Note the assignment of the length of the variables in the Array statement. Keeping as much stuff on a single statement makes the code a tad easier to modify latter if needed. You could add an element to the array but forget to update a separate Length statement possibly resulting in truncated data.

A simple modification would also work to extract the ID

 

chapidi99
Fluorite | Level 6
Thank you for the quick solution!! Could you please explain the parts of your program , I didn't understand better at if condition especially. Your code works perfect!!
ballardw
Super User

@chapidi99 wrote:
Thank you for the quick solution!! Could you please explain the parts of your program , I didn't understand better at if condition especially. Your code works perfect!!

The Indexw function returns the position of the second parameter, in this case 'name' in the first parameter, the variable (other options as well). So if the value is found it returns a number and 0 if not found. So that tells us if 'name' is in the current piece of the long string.

Then we extract the portion of the value after name as requested with Substr as your shown output included the quotes as part of the values. If you do not want or need the quotes as part of the value use Scan(temp,4,"'") [a single quote between two double quotes as that is hard to read here].

Since we have now extracted a value we set the counter variable to +1 so that the next value, if found, will place the result into the following member of the array.

 

If there are other parts you don't understand please ask specific questions. You can copy lines of the code and ask about them.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 802 views
  • 0 likes
  • 2 in conversation