SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Extracting substring from middle of string that's always before a certain expression (e.g., 'vs')

Accepted Solution Solved
Reply
Contributor
Posts: 67
Accepted Solution

Extracting substring from middle of string that's always before a certain expression (e.g., 'vs')

[ Edited ]

Hello,

 

 

I have used proc surveylogistic and ods output to extract information from my analysis so that I may manipulate my output into a nice, clean dataset to print out. However, I'm having trouble extracting string.

 

I have something like this *see the picture attached called "effect.JPG"*

 

 

I want to split that column up into 2, where I have it look like the 2 columns in "effect WANT.JPG"*


How can I do this?  I need a combination of substr, index, COMPBL, scan etc. functions and I'm not sure. I was able to get the first column, call it "variable," by doing this:

 

 

spaceplace = index(effect, ' ');
variable= substr(effect,1,spaceplace);

 

However, I need help getting the class level name (labeled as "Level of CLASS Variable for 1 Variable" in the effect2 WANT.JPG file). Basically, what I need to do is to extract the string that comes after the first word, first spaces, but is before the "[trailing blanks] vs [string]"

 

How can I tell sas to give me the word that comes right before the "vs" string? I tried to write out a code for you guys to play around with, but it is not working 100% for me Smiley Sad But maybe you can modify this?

 

data one;
input name $1-100;
cards;
'race6            1 Am. Indian   vs 7 White'
'race6            2 Asian        vs 7 White'
'marital2         1 Single                     vs 4 Married'
'lengthdeployment 1 Less than 1 month vs 6 I did not deploy in the past 12 months'
;

 

data two;
set one;
spaceplace = index(effect, ' ');
variable = substr(effect,1,spaceplace);

*The above two lines gave me my variable column and it worked in my main code. Not sure if it'll work here though;

 

 

*This is my closest attempt to getting it. but it is still wrong and gives me the vs [text] for some of them:;

vsplace = index(effect, ' vs ');

CLASSNAME3 = COMPBL(strip(substr(effect, spaceplace, vsplace-5)));

run;

proc print;
run;

 

So in my case, I'd want to have classname3 look like this

 

1 Am. Indian
2 Asian
1 Single
1 Less than 1 month 

 

I think it'd just be easier to look at my "effect2 want.jpg" to see what I am trying to achieve.

 

Many thanks for your help. Yes, I have looked through the message boards for similar searches but am still having trouble. I hope you're having a nice 4th of July!

Best,
Gina

 


effect.JPGeffect2 WANT.JPG

Accepted Solutions
Solution
‎07-05-2017 01:09 AM
PROC Star
Posts: 1,167

Re: Extracting substring from middle of string that's always before a certain expression (e.g., 'vs'

I love playing with character functions!

 

See if this moves the yardsticks for you:

 

Tom

 

data one;

infile cards dlm='09'x;

length name $100;

input name;

cards;

race6 1 Am. Indian vs 7 White

race6 2 Asian vs 7 White

marital2 1 Single vs 4 Married

lengthdeployment 1 Less than 1 month vs 6 I did not deploy in the past 12 months

run;

data want(drop=_Smiley Happy;

length variable classname3 _RemainderOfString $100;

set one;

variable = scan(name, 1, " "); /* Get the first word, blank delimited */

_FirstBlank = find(name, " "); /* Find the first blank */

_RemainderOfString = strip(substr(name, _FirstBlank)); /* Get everything after the first word, and strip out leading blanks */

_RemainderOfString = tranwrd(_RemainderOfString, ' vs ', '09'x); /* Turn all instances of blank followed by "vs" followed by blank into a tab character */

classname3 = scan(_RemainderOfString, 1, '09'x); /* Get the first work, tab delimited */

run;

View solution in original post


All Replies
PROC Star
Posts: 7,474

Re: Extracting substring from middle of string that's always before a certain expression (e.g., 'vs'

Here is one way:

data one; 
  infile cards truncover;
input name $1-100; 
cards;
race6            1 Am. Indian   vs 7 White
race6            2 Asian        vs 7 White
marital2         1 Single                     vs 4 Married
;

data want (drop=name x);;
  set one;
  variable=scan(name,1);
  name=left(substr(name,length(variable)+1));
  x=index(name,'vs');
  level=strip(substr(name,1,x-1));
  output;
  level=substr(name,x+2);
  output;
run;

Art, CEO, AnalystFinder.com

Solution
‎07-05-2017 01:09 AM
PROC Star
Posts: 1,167

Re: Extracting substring from middle of string that's always before a certain expression (e.g., 'vs'

I love playing with character functions!

 

See if this moves the yardsticks for you:

 

Tom

 

data one;

infile cards dlm='09'x;

length name $100;

input name;

cards;

race6 1 Am. Indian vs 7 White

race6 2 Asian vs 7 White

marital2 1 Single vs 4 Married

lengthdeployment 1 Less than 1 month vs 6 I did not deploy in the past 12 months

run;

data want(drop=_Smiley Happy;

length variable classname3 _RemainderOfString $100;

set one;

variable = scan(name, 1, " "); /* Get the first word, blank delimited */

_FirstBlank = find(name, " "); /* Find the first blank */

_RemainderOfString = strip(substr(name, _FirstBlank)); /* Get everything after the first word, and strip out leading blanks */

_RemainderOfString = tranwrd(_RemainderOfString, ' vs ', '09'x); /* Turn all instances of blank followed by "vs" followed by blank into a tab character */

classname3 = scan(_RemainderOfString, 1, '09'x); /* Get the first work, tab delimited */

run;

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 2 replies
  • 663 views
  • 2 likes
  • 3 in conversation