I have a variable and want to extract the part of its value that starts with "The rate" and ends with ".". Some values have no ".", and in those cases the extraction should run to the end of the string. I am not sure how to write that condition. Any help?
The data looks like this:
infix="If IEE 22 then... set datapoint value for IETEST to The rate has or has had one or more of the following conditions that:- neurological disorder";
infix2="If IEE 12 then... set datapoint value for LIME to The rate has or has had one or more of the following condition has not been seen. this is what is not expected";
The first value has no full stop. The second one has one, and I only want the text up to "not been seen.".
I am using the statement below:
IETEST=substr(infix,index(infix,"The rate"),index(substr(infix,index(infix,"The rate")),'.'));
How can I write a condition so that the extraction ends at "." when there is one, and otherwise takes everything from "The rate" onwards?
Like this?
data HAVE;
  PRX=prxparse('s/.*(The rate .*?(\.|$)).*/$1/o');
  IN1="If IEE 12 then... set datapoint value for LIME to The rate has or has had one or more of the following condition has not been seen. this is what is not expected"; 
  IN2="If IEE 22 then... set datapoint value for IETEST to The rate has or has had one or more of the following conditions that:- neurological disorder";
  OUT1=prxchange(PRX,1,IN1); put OUT1=;
  OUT2=prxchange(PRX,1,IN2); put OUT2=;
run;
OUT1=The rate has or has had one or more of the following condition has not been seen.
OUT2=The rate has or has had one or more of the following conditions that:- neurological disorder
The regular expression reads as:
s                     substitution requested
/                     start of match pattern
.*                    any number of characters
(The rate .*?(\.|$))  then a group made of:
  The rate              the text "The rate"
  (space)               then a space
  .*?                   then any number of characters (but don't try to match all of them)
  (\.|$)                then either a dot or the end of the string
.*                    any number of characters
/                     end of match pattern, start of replace pattern
$1                    replace match with first group
/                     end of replace pattern
o                     only compile once
Regular expressions look complex but are worth learning when handling strings, as they are so powerful.
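To see why the lazy `.*?` matters, here is a minimal sketch that runs the same substitution twice on the first sample string, once with `.*?` and once with a greedy `.*`. The names LAZY, GREEDY, OUT_LAZY and OUT_GREEDY are made up for this illustration.

```sas
data _null_;
  length OUT_LAZY OUT_GREEDY $200;
  /* Same pattern twice; only the quantifier after "The rate " differs */
  LAZY   = prxparse('s/.*(The rate .*?(\.|$)).*/$1/');
  GREEDY = prxparse('s/.*(The rate .*(\.|$)).*/$1/');
  IN1 = "If IEE 12 then... set datapoint value for LIME to The rate has or has had one or more of the following condition has not been seen. this is what is not expected";
  OUT_LAZY   = prxchange(LAZY,   1, IN1);  /* group stops at the first dot after "The rate" */
  OUT_GREEDY = prxchange(GREEDY, 1, IN1);  /* group runs on to the end of the string */
  put OUT_LAZY= / OUT_GREEDY=;
run;
```

With the greedy version the group swallows the dot and everything after it, because the end of the string also satisfies `(\.|$)`.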
My data for both cases is in one variable, so I need a condition that checks whether the value has a "." and ends there, or otherwise continues to the end.
@vraj1 wrote:
My data for both cases is in one variable, so I need a condition that checks whether the value has a "." and ends there, or otherwise continues to the end.
Isn't this exactly what @ChrisNZ's code does using your sample data? If not, then please show us what the desired result should look like using your sample data.
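For reference, a minimal sketch of the same expression applied to a single variable that holds both kinds of value. The names HAVE, WANT, STRING and EXTRACT are placeholders chosen for this example, and the lengths are assumptions that happen to fit the two sample values.

```sas
data HAVE;
  length STRING $200;
  STRING = "If IEE 12 then... set datapoint value for LIME to The rate has or has had one or more of the following condition has not been seen. this is what is not expected"; output;
  STRING = "If IEE 22 then... set datapoint value for IETEST to The rate has or has had one or more of the following conditions that:- neurological disorder"; output;
run;

data WANT;
  set HAVE;
  length EXTRACT $200;
  /* One pattern covers both cases: stop at the first dot, else at the end of the value */
  PRX = prxparse('s/.*(The rate .*?(\.|$)).*/$1/o');
  /* TRIM removes the trailing blanks that pad STRING to its declared length */
  EXTRACT = prxchange(PRX, 1, trim(STRING));
  drop PRX;
  put EXTRACT=;
run;
```

No IF/ELSE is needed here: the alternation `(\.|$)` inside the pattern is the condition.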
It doesn't work if the string contains special characters, like below:
If  IsEqualTo Exclusion 15 then... set datapoint value for IETEST in Exclusion is at significant risk of harming himself/herself or others according to the investigator’s judgement, or who answers on the C-S:
- ”Yes” to questions 4 or 5 on the  section within the last 3 months, OR
- “Yes” to any question on the  section within the last 3 months , OR
- “Yes” to questions 4 or 5 on the on the  section at Base
, and execute the "Return True" custom function
To avoid a lot of back and forth what would really help is if you would provide us with some sample data (in the form of a SAS data step).
Try to create some data which contains all the challenges/variations to be solved, but keep it to the minimum required.
Something like below as a starting point - just expand on it.
data HAVE;
  length string $256;
  string="AA BB... word1 word2. word3"; output;
  string="AA BB... word1 word2 word3"; output;
run; 
Then explain to us the logic you need to create the desired result AND show us the desired result based on the sample data posted.
Some of the data fields are like the ones below. Using the code I can get the first string, but the second one fails, i.e. it returns the text from the start, not from "The rate".
data HAVE;
  length string $800;
  string='If IE in Criteria IsEqualTo 1  then... set datapoint value for The rate in Criteria to has previously been enrolled in this., and execute the "Return True" custom function'; output;
  string='If IE in Criteria IsEqualTo Exclusion 15 then... set datapoint value for in Criteria to The rate is at significant risk of harming himself/herself or others according to the investigator’s judgement, or who answers on the C-SSRS:
- ”Yes” to questions 4 or 5 on the section within the last 3 months , OR
- “Yes” to any question on the section within the last 3 months , OR
- “Yes” to questions 4 or 5 on the on the section
, and execute the "Return True" custom function'; output;
run;
So in the second string, if there is no ".", I need the text until the end.
Not too sure why you say it's failing. It's working as expected here.
data HAVE;
  length OUT1 OUT2 $800;
  PRX=prxparse('s/.*(The rate .*?(\.|$)).*/$1/o');
  IN1='If IE in Criteria IsEqualTo Exclusion 15 then... set datapoint value for in Criteria to The rate is at significant risk of harming himself/herself or others according to the investigator’s judgement, or who answers on the C-SSRS:- ”Yes” to questions 4 or 5 on the section within the last 3 months , OR- “Yes” to any question on the section within the last 3 months , OR- “Yes” to questions 4 or 5 on the on the section, and execute the "Return True" custom function'; 
  IN2='If IE in Criteria IsEqualTo 1  then... set datapoint value for The rate in Criteria to has previously been enrolled in this., and execute the "Return True" custom function'; 
  OUT1=prxchange(PRX,1,IN1); put OUT1=;
  OUT2=prxchange(PRX,1,IN2); put OUT2=;
run;
OUT1=The rate is at significant risk of harming himself/herself or others according to the investigator’s judgement,
 or who answers on the C-SSRS:- ”Yes” to questions 4 or 5 on the section within the last 3 months , OR- “Yes” to any
 question on the section within the last 3 months , OR- “Yes” to questions 4 or 5 on the on the section, and execute
 the "Return True" custom function
OUT2=The rate in Criteria to has previously been enrolled in this.
Then explain to us the logic you need to create the desired result AND show us the desired result based on the sample data posted.
Sorry for not being clear.
With this data:
data HAVE;
  length string $800;
  string='If IE in Criteria IsEqualTo 1  then... set datapoint value for The rate in Criteria to has previously been enrolled in this., and execute the "Return True" custom function'; output;
  string='If IE in Criteria IsEqualTo Exclusion 15 then... set datapoint value for in Criteria to The rate is at significant risk of harming himself/herself or others according to the investigator’s judgement, or who answers on the C-SSRS:- ”Yes” to questions 4 or 5 on the section within the last 3 months , OR- “Yes” to any question on the section within the last 3 months , OR- “Yes” to questions 4 or 5 on the on the section, and execute the "Return True" custom function'; output;
run;
the desired result is:
string1=”The rate in Criteria to has previously been enrolled in this.”
string2=”The rate is at significant risk of harming himself/herself or others according to the investigator’s judgement, or who answers on the C-SSRS:- ”Yes” to questions 4 or 5 on the section within the last 3 months , OR- “Yes” to any question on the section within the last 3 months , OR- “Yes” to questions 4 or 5 on the on the section”
In string2 there is no ".", so the extraction needs to end just before ", and execute ".
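If you start adding such variations, you are better off with something like:
  STARTPOS = find(STR, 'The rate');
  ENDPOS1  = find(STR, '.', STARTPOS);                   /* first dot after the start; 0 if none */
  ENDPOS2  = find(STR, ', and execute ', STARTPOS) - 1;  /* last character before the trailer; -1 if none */
  if ENDPOS1 < 1 then ENDPOS1 = length(STR);             /* no dot: fall back to the end of the string */
  if ENDPOS2 < 1 then ENDPOS2 = length(STR);             /* no trailer: fall back to the end of the string */
  if STARTPOS > 0 then NEWSTR = substr(STR, STARTPOS, min(ENDPOS1, ENDPOS2) - STARTPOS + 1);
Please alter to suit.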
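If you would rather stay with a regular expression, one possible variation (a sketch only, not the approach recommended above) adds a lookahead so that the group also stops just before ", and execute ". It assumes the trailer always has exactly that wording; STR and NEWSTR are illustrative names, and the two values are the sample strings from this thread.

```sas
data _null_;
  length STR NEWSTR $800;
  /* Group 1 ends at the first dot, or just before ", and execute ", or at the end of the string */
  PRX = prxparse('s/.*(The rate .*?(\.|(?=, and execute )|$)).*/$1/o');
  do STR =
    'If IE in Criteria IsEqualTo 1  then... set datapoint value for The rate in Criteria to has previously been enrolled in this., and execute the "Return True" custom function',
    'If IE in Criteria IsEqualTo Exclusion 15 then... set datapoint value for in Criteria to The rate is at significant risk of harming himself/herself or others according to the investigator’s judgement, or who answers on the C-SSRS:- ”Yes” to questions 4 or 5 on the section within the last 3 months , OR- “Yes” to any question on the section within the last 3 months , OR- “Yes” to questions 4 or 5 on the on the section, and execute the "Return True" custom function';
    /* TRIM drops the blank padding so that $ means the end of the actual text */
    NEWSTR = prxchange(PRX, 1, trim(STR));
    put NEWSTR=;
  end;
run;
```

The lookahead `(?=, and execute )` matches a position without consuming any text, so the trailer itself stays out of the extracted value. Each further variation makes the pattern harder to read, which is the argument for the FIND-based version.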
