Need to extract data from string with conditions

Regular Contributor
Posts: 208

I have a variable with data and want to extract a particular piece of text that starts with "The rate" and ends with ".". Some values have no ".", and in those cases the extraction should run to the end of the string. I am not sure how to express that condition. Any help is appreciated.

The data looks like this:

infix="If IEE 22 then... set datapoint value for IETEST to The rate has or has had one or more of the following conditions that:- neurological disorder";
infix2="If IEE 12 then... set datapoint value for LIME to The rate has or has had one or more of the following condition has not been seen. this is what is not expected";

 

The first case has no full stop. The second one has one, and there I only want the text up to "not been seen.".

I am using the step below:

IETEST=substr(infix,index(infix,"The rate"),index(substr(infix,index(infix,"The rate")),'.'));

How can I add a condition so that the extraction ends at "." when one is present, and otherwise takes the rest of the string starting from "The rate"?


Accepted Solutions
Solution
3 weeks ago
PROC Star
Posts: 2,311

Re: Need to extract data from string with conditions

Like this?

 

data HAVE;
  PRX=prxparse('s/.*(The rate .*?(\.|$)).*/$1/o');
  IN1="If IEE 12 then... set datapoint value for LIME to The rate has or has had one or more of the following condition has not been seen. this is what is not expected"; 
  IN2="If IEE 22 then... set datapoint value for IETEST to The rate has or has had one or more of the following conditions that:- neurological disorder";
  OUT1=prxchange(PRX,1,IN1); put OUT1=;
  OUT2=prxchange(PRX,1,IN2); put OUT2=;
run; 

OUT1=The rate has or has had one or more of the following condition has not been seen.
OUT2=The rate has or has had one or more of the following conditions that:- neurological disorder

 

The regular expression reads as:

s                       substitution requested
/                       start of match pattern
.*                      any number of characters
(The rate .*?(\.|$))    then a group made of:
    The rate                the text "The rate"
    (space)                 then a space
    .*?                     then any number of characters (but don't try to match all of them)
    (\.|$)                  then either a dot or the end of the string
.*                      any number of characters
/                       end of match pattern, start of replace pattern
$1                      replace match with first group
/                       end of replace pattern
o                       only compile once

 

Regular expressions look complex but are worth learning when handling strings, as they are so powerful.
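
For anyone who prefers to pull the group out directly rather than substitute the whole string, the same capture group can probably be used with PRXMATCH and PRXPOSN. This is only a sketch under assumptions: the data set name EXTRACTED, the variables STR and OUT, and the shortened sample strings are made up for illustration and are not from the original data. One difference from the substitution above: without the leading .* this pattern picks the first "The rate" in the string, not the last. As far as I know, PRXCHANGE hands back the input unchanged when nothing matches, which is why the sketch sets OUT to blank in that case.

```sas
/* Sketch only: EXTRACTED, STR, OUT and the sample strings are hypothetical */
data EXTRACTED;
  length STR OUT $400;
  /* Match pattern only (no substitution): the capture group from the post above */
  PRX = prxparse('/(The rate .*?(\.|$))/o');
  do STR = 'If IEE 12 then... set value to The rate has not been seen. more text',
           'If IEE 22 then... set value to The rate has conditions that:- disorder',
           'No matching phrase here';
    /* PRXMATCH returns the position of the match, or 0 when there is none */
    if prxmatch(PRX, STR) > 0 then OUT = prxposn(PRX, 1, STR);
    else OUT = ' ';
    put OUT=;
    output;
  end;
  keep STR OUT;
run;
```

The first string has a full stop after "The rate", so the group should end there. The second has none, so the group runs to the end of the value. The third contains no "The rate" at all and is left blank instead of being copied through.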

 


All Replies

Regular Contributor
Posts: 208

Re: Need to extract data from string with conditions

The data for both cases is in one variable, so I need a condition that checks whether the value contains ".": if it does, stop there, otherwise continue to the end.

Respected Advisor
Posts: 4,668

Re: Need to extract data from string with conditions


@vraj1 wrote:

The data for both cases is in one variable, so I need a condition that checks whether the value contains ".": if it does, stop there, otherwise continue to the end.


Isn't this exactly what @ChrisNZ's code does using your sample data? If not, then please show us what the desired result should look like for your sample data.

Regular Contributor
Posts: 208

Re: Need to extract data from string with conditions

It doesn't work if the string contains special characters, as in the example below:

If  IsEqualTo Exclusion 15 then... set datapoint value for IETEST in Exclusion is at significant risk of harming himself/herself or others according to the investigator’s judgement, or who answers on the C-S:
- ”Yes” to questions 4 or 5 on the  section within the last 3 months, OR
- “Yes” to any question on the  section within the last 3 months , OR
- “Yes” to questions 4 or 5 on the on the  section at Base
, and execute the "Return True" custom function

Respected Advisor
Posts: 4,668

Re: Need to extract data from string with conditions

@vraj1

To avoid a lot of back and forth what would really help is if you would provide us with some sample data (in the form of a SAS data step).

Try to create some data which contains all the challenges/variations to be solved but keep the data also to the minimum required.

Something like below as a starting point - just expand on it.

 

data HAVE;
  length string $256;
  string="AA BB... word1 word2. word3"; output;
  string="AA BB... word1 word2 word3"; output;
run; 

 

Then explain to us the logic you need to create the desired result AND show us the desired result based on the sample data posted.

 

Regular Contributor
Posts: 208

Re: Need to extract data from string with conditions

With some of the data fields, the code gets the first string right but fails on the second one, i.e. it returns the text from the start of the string rather than from "The rate".

data HAVE;
  length string $800;
  string='If IE in Criteria IsEqualTo 1  then... set datapoint value for The rate in Criteria to has previously been enrolled in this., and execute the "Return True" custom function'; output;
  string='If IE in Criteria IsEqualTo Exclusion 15 then... set datapoint value for in Criteria to The rate is at significant risk of harming himself/herself or others according to the investigator’s judgement, or who answers on the C-SSRS:
- ”Yes” to questions 4 or 5 on the section within the last 3 months , OR
- “Yes” to any question on the section within the last 3 months , OR
- “Yes” to questions 4 or 5 on the on the section
, and execute the "Return True" custom function'
; output; run;

So in the second string, where there is no "." after "The rate", I need everything up to the end.

PROC Star
Posts: 2,311

Re: Need to extract data from string with conditions

Not too sure why you say it's failing. It's working as expected here.

data HAVE;
  length OUT1 OUT2 $800;
  PRX=prxparse('s/.*(The rate .*?(\.|$)).*/$1/o');
  IN1='If IE in Criteria IsEqualTo Exclusion 15 then... set datapoint value for in Criteria to The rate is at significant risk of harming himself/herself or others according to the investigator’s judgement, or who answers on the C-SSRS:- ”Yes” to questions 4 or 5 on the section within the last 3 months , OR- “Yes” to any question on the section within the last 3 months , OR- “Yes” to questions 4 or 5 on the on the section, and execute the "Return True" custom function'; 
  IN2='If IE in Criteria IsEqualTo 1  then... set datapoint value for The rate in Criteria to has previously been enrolled in this., and execute the "Return True" custom function'; 
  OUT1=prxchange(PRX,1,IN1); put OUT1=;
  OUT2=prxchange(PRX,1,IN2); put OUT2=;
run;

OUT1=The rate is at significant risk of harming himself/herself or others according to the investigator’s judgement,
or who answers on the C-SSRS:- ”Yes” to questions 4 or 5 on the section within the last 3 months , OR- “Yes” to any
question on the section within the last 3 months , OR- “Yes” to questions 4 or 5 on the on the section, and execute
the "Return True" custom function
OUT2=The rate in Criteria to has previously been enrolled in this.

 

Respected Advisor
Posts: 4,668

Re: Need to extract data from string with conditions


@vraj1

Then explain to us the logic you need to create the desired result AND show us the desired result based on the sample data posted.

Regular Contributor
Posts: 208

Re: Need to extract data from string with conditions

Sorry for not being clear.

For this data, the desired results are shown below the step:

data HAVE;
  length string $800;
  string='If IE in Criteria IsEqualTo 1  then... set datapoint value for The rate in Criteria to has previously been enrolled in this., and execute the "Return True" custom function'; output;
  string='If IE in Criteria IsEqualTo Exclusion 15 then... set datapoint value for in Criteria to The rate is at significant risk of harming himself/herself or others according to the investigator’s judgement, or who answers on the C-SSRS:- ”Yes” to questions 4 or 5 on the section within the last 3 months , OR- “Yes” to any question on the section within the last 3 months , OR- “Yes” to questions 4 or 5 on the on the section, and execute the "Return True" custom function'; output;
run;

string1=”The rate in Criteria to has previously been enrolled in this.”

String2=”The rate is at significant risk of harming himself/herself or others according to the investigator’s judgement, or who answers on the C-SSRS:- ”Yes” to questions 4 or 5 on the section within the last 3 months , OR- “Yes” to any question on the section within the last 3 months , OR- “Yes” to questions 4 or 5 on the on the section”

 

So in string2 there is no ".", and the extraction needs to end just before ", and execute ".

PROC Star
Posts: 2,311

Re: Need to extract data from string with conditions

If you start adding such variations, you are better off with something like:

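  STARTPOS = find(STR, 'The rate');
  ENDPOS1  = find(STR, '.', STARTPOS);                   /* 0 when no full stop follows     */
  ENDPOS2  = find(STR, ', and execute ', STARTPOS) - 1;  /* -1 when the phrase is not found */
  if ENDPOS1 <= 0 then ENDPOS1 = .;                      /* MIN() ignores missing values    */
  if ENDPOS2 <= 0 then ENDPOS2 = .;
  if STARTPOS > 0 then
    NEWSTR = substr(STR, STARTPOS, min(ENDPOS1, ENDPOS2, length(STR)) - STARTPOS + 1);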

Please alter to suit. 
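
To show how the FIND approach might be wired into a full step, here is a minimal sketch. It rests on a few assumptions: the data set name WANT, the helper variable ENDPOS and the shortened sample strings are invented for illustration, and the result is taken to include the full stop when there is one. FIND returns 0 when nothing is found, so the sketch turns a "not found" position into a missing value, which MIN() skips.

```sas
/* Sketch only: WANT, ENDPOS and the sample strings are hypothetical */
data WANT;
  length STR NEWSTR $400;
  do STR = 'If IE IsEqualTo 1 then... set value for The rate in Criteria to has been enrolled in this., and execute the "Return True" custom function',
           'If IE IsEqualTo 15 then... set value to The rate is at risk:- "Yes" to questions 4 or 5, and execute the "Return True" custom function',
           'If IE IsEqualTo 2 then... set value to The rate has no terminator';
    NEWSTR   = ' ';
    STARTPOS = find(STR, 'The rate');
    if STARTPOS > 0 then do;
      ENDPOS1 = find(STR, '.', STARTPOS);                   /* position of the full stop, 0 if none  */
      ENDPOS2 = find(STR, ', and execute ', STARTPOS) - 1;  /* last character before the phrase      */
      if ENDPOS1 <= 0 then ENDPOS1 = .;                     /* not found: let MIN() ignore it        */
      if ENDPOS2 <= 0 then ENDPOS2 = .;
      ENDPOS = min(ENDPOS1, ENDPOS2, length(STR));          /* earliest terminator, else end of text */
      NEWSTR = substr(STR, STARTPOS, ENDPOS - STARTPOS + 1);
    end;
    put NEWSTR=;
    output;
  end;
  keep STR NEWSTR;
run;
```

The search for '.' starts at STARTPOS on purpose, so the dots in "then..." earlier in the string are not picked up as terminators.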

☑ This topic is solved.
