vraj1
Quartz | Level 8

I have a variable and want to extract the particular text that starts with "The rate" and ends with ".". Some values don't have a ".", and in those cases the extraction should run to the end of the string. I am not sure how to code that condition. Any help is appreciated.

 

The data looks like this:

inflix="If IEE 22 then... set datapoint value for IETEST to The rate has or has had one or more of the following conditions that:- neurological disorder";
inflix2="If IEE 12 then... set datapoint value for LIME to The rate has or has had one or more of the following condition has not been seen. this is what is not expected";

 

The first case has no full stop. The second one does, and there I only want the text up to "not been seen.".

 

I am using the step below:

IETEST=substr(inflix,index(inflix,"The rate"),index(substr(inflix,index(inflix,"The rate")),'.'));

 

How can I write a condition so that the extraction stops at "." when there is one, and otherwise takes everything from "The rate" to the end?

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

Like this?

 

data HAVE;
  PRX=prxparse('s/.*(The rate .*?(\.|$)).*/$1/o');
  IN1="If IEE 12 then... set datapoint value for LIME to The rate has or has had one or more of the following condition has not been seen. this is what is not expected"; 
  IN2="If IEE 22 then... set datapoint value for IETEST to The rate has or has had one or more of the following conditions that:- neurological disorder";
  OUT1=prxchange(PRX,1,IN1); put OUT1=;
  OUT2=prxchange(PRX,1,IN2); put OUT2=;
run; 

OUT1=The rate has or has had one or more of the following condition has not been seen.
OUT2=The rate has or has had one or more of the following conditions that:- neurological disorder

 

The regular expression reads as:

s            substitution requested
/            start of match pattern
.*           any number of characters
(            then a group made of:
  The rate     the text "The rate"
  (space)      then a space
  .*?          then any number of characters (but don't try to match all of them)
  (\.|$)       then either a dot or the end of the string
)            end of the group
.*           any number of characters
/            end of match pattern, start of replace pattern
$1           replace match with first group
/            end of replace pattern
o            only compile once

 

Regular expressions look complex but are worth learning when handling strings, as they are so powerful.
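
When all values sit in one variable, the same expression can be applied once per observation. Here is a minimal sketch. The names HAVE, WANT, STRING and EXTRACT, and the shortened sample values, are made up for illustration and are not from the original post.

```sas
/* Hypothetical input: one variable holding both kinds of value */
data HAVE;
  length STRING $200;
  STRING = 'If IEE 12 then... set value to The rate has not been seen. this is not expected'; output;
  STRING = 'If IEE 22 then... set value to The rate has conditions that:- neurological disorder'; output;
run;

data WANT;
  set HAVE;
  length EXTRACT $200;
  /* Same pattern as above; the o option asks for a single compilation */
  PRX = prxparse('s/.*(The rate .*?(\.|$)).*/$1/o');
  /* strip() removes leading and trailing blanks before the match */
  EXTRACT = prxchange(PRX, 1, strip(STRING));
  drop PRX;
run;
```

No separate IF condition is needed: the alternation (\.|$) already means "stop at the first dot if there is one, otherwise at the end of the string".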

 

vraj1
Quartz | Level 8

Both kinds of value are in one variable, so I need a condition that checks whether the value has a "." and ends there, or otherwise continues to the end.

Patrick
Opal | Level 21

@vraj1 wrote:

Both kinds of value are in one variable, so I need a condition that checks whether the value has a "." and ends there, or otherwise continues to the end.


Isn't this exactly what @ChrisNZ's code does using your sample data? If not, then please show us what the desired result should look like for your sample data.

vraj1
Quartz | Level 8

It doesn't work if I have special characters in the string, as below:

If  IsEqualTo Exclusion 15 then... set datapoint value for IETEST in Exclusion is at significant risk of harming himself/herself or others according to the investigator’s judgement, or who answers on the C-S:
- ”Yes” to questions 4 or 5 on the  section within the last 3 months, OR
- “Yes” to any question on the  section within the last 3 months , OR
- “Yes” to questions 4 or 5 on the on the  section at Base
, and execute the "Return True" custom function

Patrick
Opal | Level 21

@vraj1

To avoid a lot of back and forth what would really help is if you would provide us with some sample data (in the form of a SAS data step).

Try to create some data which contains all the challenges/variations to be solved, but also keep the data to the minimum required.

Something like below as a starting point - just expand on it.

 

data HAVE;
  length string $256;
  string="AA BB... word1 word2. word3"; output;
  string="AA BB... word1 word2 word3"; output;
run; 

 

Then explain the logic you need to create the desired result AND show us the desired result based on the sample data posted.

 

vraj1
Quartz | Level 8

Some of the data fields look like the ones below. With the code I get the first string correctly, but the second one fails, i.e. it returns the text from the start of the string and not from "The rate".

data HAVE;
  length string $800;
  string='If IE in Criteria IsEqualTo 1  then... set datapoint value for The rate in Criteria to has previously been enrolled in this., and execute the "Return True" custom function'; output;
  string='If IE in Criteria IsEqualTo Exclusion 15 then... set datapoint value for in Criteria to The rate is at significant risk of harming himself/herself or others according to the investigator’s judgement, or who answers on the C-SSRS:
- ”Yes” to questions 4 or 5 on the section within the last 3 months , OR
- “Yes” to any question on the section within the last 3 months , OR
- “Yes” to questions 4 or 5 on the on the section
, and execute the "Return True" custom function'
; output; run;

So in the second string, where there is no ".", I need the text through to the end.

ChrisNZ
Tourmaline | Level 20

Not too sure why you say it's failing. It's working as expected here.

data HAVE;
  length OUT1 OUT2 $800;
  PRX=prxparse('s/.*(The rate .*?(\.|$)).*/$1/o');
  IN1='If IE in Criteria IsEqualTo Exclusion 15 then... set datapoint value for in Criteria to The rate is at significant risk of harming himself/herself or others according to the investigator’s judgement, or who answers on the C-SSRS:- ”Yes” to questions 4 or 5 on the section within the last 3 months , OR- “Yes” to any question on the section within the last 3 months , OR- “Yes” to questions 4 or 5 on the on the section, and execute the "Return True" custom function'; 
  IN2='If IE in Criteria IsEqualTo 1  then... set datapoint value for The rate in Criteria to has previously been enrolled in this., and execute the "Return True" custom function'; 
  OUT1=prxchange(PRX,1,IN1); put OUT1=;
  OUT2=prxchange(PRX,1,IN2); put OUT2=;
run;

OUT1=The rate is at significant risk of harming himself/herself or others according to the investigator’s judgement,
or who answers on the C-SSRS:- ”Yes” to questions 4 or 5 on the section within the last 3 months , OR- “Yes” to any
question on the section within the last 3 months , OR- “Yes” to questions 4 or 5 on the on the section, and execute
the "Return True" custom function
OUT2=The rate in Criteria to has previously been enrolled in this.
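
A possible reason for the different results, offered as a guess rather than a diagnosis: the string as first posted spans several lines, while the test above has it on one line. In a Perl regular expression, `.` does not match a line feed unless the s modifier is set, and without the m modifier `$` only matches at the very end of the string (or just before a final line feed). If the stored values really contain line feeds, the pattern can fail to match at all, and PRXCHANGE then returns the value unchanged, that is, from the start. The sketch below compares the pattern with and without s. The value and variable names are made up, and it assumes your SAS release accepts the s modifier the way Perl does.

```sas
data _null_;
  length OUT_DEFAULT OUT_S $400;
  /* Hypothetical value: a line feed ('0A'x) and no dot after "The rate" */
  IN = 'set datapoint value to The rate is at risk:' || '0A'x || '- Yes to question 4';
  /* Original pattern: "." stops at the line feed */
  PRX_DEFAULT = prxparse('s/.*(The rate .*?(\.|$)).*/$1/');
  /* Same pattern with the s modifier: "." also matches the line feed */
  PRX_S       = prxparse('s/.*(The rate .*?(\.|$)).*/$1/s');
  OUT_DEFAULT = prxchange(PRX_DEFAULT, 1, IN);
  OUT_S       = prxchange(PRX_S, 1, IN);
  /* $hex. makes the embedded line feed (0A) visible in the log */
  put OUT_DEFAULT= $hex120. / OUT_S= $hex120.;
run;
```

If the first result still begins with the "set datapoint" text and the second begins with "The rate", embedded line breaks are the likely cause. Adding s to the pattern, or removing the line feeds first with COMPRESS or TRANSLATE, would be options to try.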

 

Patrick
Opal | Level 21

@vraj1

Then explain the logic you need to create the desired result AND show us the desired result based on the sample data posted.

vraj1
Quartz | Level 8

Sorry for not being clear.

With this data, the desired results are listed after the step:

data HAVE;
  length string $800;
  string='If IE in Criteria IsEqualTo 1  then... set datapoint value for The rate in Criteria to has previously been enrolled in this., and execute the "Return True" custom function'; output;
  string='If IE in Criteria IsEqualTo Exclusion 15 then... set datapoint value for in Criteria to The rate is at significant risk of harming himself/herself or others according to the investigator’s judgement, or who answers on the C-SSRS:- ”Yes” to questions 4 or 5 on the section within the last 3 months , OR- “Yes” to any question on the section within the last 3 months , OR- “Yes” to questions 4 or 5 on the on the section, and execute the "Return True" custom function'; output;
run;

string1=”The rate in Criteria to has previously been enrolled in this.”

String2=”The rate is at significant risk of harming himself/herself or others according to the investigator’s judgement, or who answers on the C-SSRS:- ”Yes” to questions 4 or 5 on the section within the last 3 months , OR- “Yes” to any question on the section within the last 3 months , OR- “Yes” to questions 4 or 5 on the on the section”

 

String2 contains no ".", so there I need the text to end just before ", and execute ".

ChrisNZ
Tourmaline | Level 20

If you start adding such variations, you are better off with something like:

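  STARTPOS = find(STR, 'The rate');
  ENDPOS1  = find(STR, '.', STARTPOS);
  ENDPOS2  = find(STR, ', and execute ', STARTPOS) - 1;
  /* FIND returns 0 when nothing is found: set to missing so that MIN ignores it */
  if ENDPOS1 < 1 then ENDPOS1 = .;
  if ENDPOS2 < 1 then ENDPOS2 = .;
  if STARTPOS then NEWSTR = substr(STR, STARTPOS, min(ENDPOS1, ENDPOS2, length(STR)) - STARTPOS + 1);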

Please alter to suit. 
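
For anyone who wants to try the FIND/SUBSTR idea end to end, here is a self-contained sketch. The three test values are shortened, made-up variants of the posted data, and ENDPOS is a helper variable added for readability. It treats a "not found" position as missing so that MIN skips it, and it adds 1 to the length so that the closing dot is kept.

```sas
data _null_;
  length STR NEWSTR $800;
  /* Hypothetical test values: dot present, only ", and execute " present, neither present */
  do STR = 'set value for The rate in Criteria to has been enrolled in this., and execute the "Return True" custom function',
           'set value to The rate is at risk: - Yes to questions 4 or 5 on the section, and execute the "Return True" custom function',
           'set value to The rate has conditions that:- neurological disorder';
    STARTPOS = find(STR, 'The rate');
    if STARTPOS > 0 then do;
      ENDPOS1 = find(STR, '.', STARTPOS);
      ENDPOS2 = find(STR, ', and execute ', STARTPOS) - 1;
      /* A position below 1 means "not found": set it to missing so min() skips it */
      if ENDPOS1 < 1 then ENDPOS1 = .;
      if ENDPOS2 < 1 then ENDPOS2 = .;
      /* length() ignores trailing blanks, so it marks the last real character */
      ENDPOS = min(ENDPOS1, ENDPOS2, length(STR));
      NEWSTR = substr(STR, STARTPOS, ENDPOS - STARTPOS + 1);
    end;
    else NEWSTR = ' ';
    put NEWSTR=;
  end;
run;
```

The first value should stop at the dot after "this", the second just before ", and execute ", and the third should run to the end of the string.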
