I have the following data:
data var;
infile cards truncover;
input symptoms $500.;
cards;
1. Over the past 1 months, I have coughed
2. Over the past 1 months, I have brought up phlegm (sputum):
3. Over the past 1 months, I have had shortness of breath:
4. Over the past 1 months, I have had attacks of wheezing:
5. During the past 1 months how many severe or very unpleasant attacks of chest trouble have you had?
6 How long did the worst attack of chest trouble last? (Go to question 7 if you had no severe attacks)
7. Over the past 1 months, in an average week, how many good days (with little chest trouble) have you had?
8. If you have a wheeze, is it worse in the morning?
;
run;
I would like to extract the text part of each string by removing the leading question numbers (1., 2., 3., ... 8.) from the beginning of the string. They have to be removed in such a way that the second or third numeric values later in the string are not affected.
Any ideas on how to solve this using prxchange() would be very helpful.
You could take advantage of the first space and use SUBSTR from that position:
data var;
infile cards truncover;
input symptoms $500.;
cards;
1. Over the past 1 months, I have coughed
2. Over the past 1 months, I have brought up phlegm (sputum):
3. Over the past 1 months, I have had shortness of breath:
4. Over the past 1 months, I have had attacks of wheezing:
5. During the past 1 months how many severe or very unpleasant attacks of chest trouble have you had?
6 How long did the worst attack of chest trouble last? (Go to question 7 if you had no severe attacks)
7. Over the past 1 months, in an average week, how many good days (with little chest trouble) have you had?
8. If you have a wheeze, is it worse in the morning?
;
run;
data want;
 set var;
 length want $500;
 want=substr(strip(symptoms),anyspace(strip(symptoms))+1);
run;
Hi @sahoositaram555 I like the fact you are trying to understand the code. Very good attitude.
Honestly, I didn't give it much thought when I wrote it; for me SAS is merely a game, and sometimes I hit and sometimes I miss.
Well, your understanding is correct. Basically:
1. anyspace(string) determines the position of the first space after the number and dot. The strip makes sure any leading spaces are removed before the anyspace function executes.
2. Once you have determined the position of the first blank space embedded in the string, the characters that follow that position ought to be the ones we need to extract.
3. The +1 moves the pointer to the position immediately after the first blank, so the blank itself is not extracted as a leading space. However, if there are further blanks after that starting point, they would still end up in the extracted string.
4. I made some assumptions, e.g. that your string starts with a number followed by a dot and a space, and then the characters you need to extract.
5. If those assumptions hold true, the solution is rather simple and easy.
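A minimal sketch of the steps above, under the same assumption (each line starts with a question number, optionally a dot, then a single blank). It keeps the intermediate values in their own variables (`stripped` and `first_space` are illustrative names, not from the original code) so you can see what each function returns; the PUT statement writes them to the log.

```sas
/* Sketch: show the intermediate values of the SUBSTR/ANYSPACE approach.   */
/* Assumes each line starts with a number, optionally a dot, then a blank. */
data steps;
  infile datalines truncover;
  input symptoms $150.;
  length stripped want $150;
  stripped    = strip(symptoms);              /* drop leading/trailing blanks */
  first_space = anyspace(stripped);           /* position of the first blank  */
  want        = substr(stripped, first_space + 1); /* text after that blank   */
  put first_space= want=;
datalines;
1. Over the past 1 months, I have coughed
6 How long did the worst attack of chest trouble last? (Go to question 7 if you had no severe attacks)
;
run;
```

For the first data line, ANYSPACE returns 3 (the blank after "1."), so SUBSTR starts at position 4; for the "6 How long..." line it returns 2, which is why the missing dot does no harm here.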
Just to "annoy" you.
data var;
infile cards truncover;
input symptoms $60.;
cards;
1. Over the past 1 months, I have coughed
Over the past 1 months, I have coughed
;
run;
data want;
 set var;
 length want $500;
 want=substr(strip(symptoms),anyspace(symptoms)+1);
run;
proc print;
run;
Sir @Patrick, yes, I noticed that in your solution example, as you modified the sample in the last data line. However, that's the assumption I went with. 😊
@Patrick I did spot the extra data line in your solution, "If you have a wheeze, is it worse in the morning?", and realized you wanted to offer a generic/holistic solution. Fair play 🙂. Well, I am far too lazy 🙂
Any ideas on how to solve this using prxchange() would be very helpful.
Regular expressions are very resource-hungry, so if you can do it with simple string functions, that's often the better-performing solution.
data have;
  infile datalines truncover;
  input symptoms $150.;
  cards;
1. Over the past 1 months, I have coughed
2. Over the past 1 months, I have brought up phlegm (sputum):
3. Over the past 1 months, I have had shortness of breath:
4. Over the past 1 months, I have had attacks of wheezing:
5. During the past 1 months how many severe or very unpleasant attacks of chest trouble have you had?
6 How long did the worst attack of chest trouble last? (Go to question 7 if you had no severe attacks)
7. Over the past 1 months, in an average week, how many good days (with little chest trouble) have you had?
8. If you have a wheeze, is it worse in the morning?
If you have a wheeze, is it worse in the morning?
;
data want(drop=_:);
  set have;
  symptoms1=prxchange('s/^\d+[\. ]*//oi',1,strip(symptoms));
  if anydigit(symptoms)=1 then
    do;
      call scan(symptoms, 2, _pos, _len, ' ');
      symptoms2=substrn(symptoms,_pos);
    end;
  else symptoms2=symptoms;
run;
proc print data=want;
run;
Hi,
How about the ANYALPHA() function?
data have;
  infile datalines truncover;
  input symptoms $150.;
  cards;
1. Over the past 1 months, I have coughed
2. Over the past 1 months, I have brought up phlegm (sputum):
3. Over the past 1 months, I have had shortness of breath:
4. Over the past 1 months, I have had attacks of wheezing:
5. During the past 1 months how many severe or very unpleasant attacks of chest trouble have you had?
6 How long did the worst attack of chest trouble last? (Go to question 7 if you had no severe attacks)
7. Over the past 1 months, in an average week, how many good days (with little chest trouble) have you had?
8. If you have a wheeze, is it worse in the morning?
If you have a wheeze, is it worse in the morning?
;
run;
data want;
  set have;
  symptoms2=substrn(symptoms, anyalpha(symptoms));
run;
All the best
Bart
@sahoositaram555 wrote:
Hi @Patrick. Thank you so much for showing me a way to deal with this using Perl regular expressions.
I read and understood the statement prxchange('s/^\d+[\. ]*//oi',1,strip(symptoms)) except for the //oi part. I could guess that it's making the search case-insensitive, but what exactly does the o do? And what about the //, which I read in some material is used for replacement?
Could you please help me with that one query of mine? Hope to hear from you soon.
From a documentation entry found here:
The basic syntax for searching and replacing text has the following form:
s/regular-expression/replacement-string/
The following example uses the PRXCHANGE function to show how substitution is performed:
prxchange('s/world/planet/', 1, 'Hello world!');
Arguments
s: specifies the metacharacter for substitution.
world: specifies the regular expression.
planet: specifies the replacement value for world.
1: specifies that the search ends when one match is found.
'Hello world!': specifies the source string to be searched.
The result of the substitution is Hello planet.
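To see the documentation example run, and how the second argument (the number of times to replace) changes the result, here is a small sketch. The variable names are illustrative; passing -1 asks PRXCHANGE to keep replacing until the end of the source string instead of stopping after the first match.

```sas
/* Sketch: the documentation example plus the effect of the times argument. */
data prx_demo;
  length once first all $40;
  /* times = 1: stop after the first match */
  once  = prxchange('s/world/planet/', 1, 'Hello world!');
  first = prxchange('s/o/0/', 1, 'Hello world!');
  /* times = -1: replace every match in the source string */
  all   = prxchange('s/o/0/', -1, 'Hello world!');
  put once= first= all=;
run;
```

The log should show once=Hello planet! first=Hell0 world! all=Hell0 w0rld!, i.e. only the first "o" is replaced when times is 1.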
"further by adding // which i read in a material that it is used for replacement"
Removing a matching string just means replacing it with nothing. That's how you end up with two consecutive slashes.
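A small sketch of that removal on a few of the original data lines. Because the pattern is anchored with ^, only the leading question number (and any dots or blanks right after it) can match, so later numbers such as the 1 in "past 1 months" are left alone, and a line without a leading number passes through unchanged.

```sas
/* Sketch: an empty replacement string deletes whatever the pattern matches. */
data removed;
  infile datalines truncover;
  input before $150.;
  length after $150;
  /* ^ anchors the match at the start of the string; the empty  */
  /* replacement between the last two slashes deletes the match */
  after = prxchange('s/^\d+[\. ]*//', 1, strip(before));
datalines;
1. Over the past 1 months, I have coughed
6 How long did the worst attack of chest trouble last? (Go to question 7 if you had no severe attacks)
If you have a wheeze, is it worse in the morning?
;
run;
```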
"I could guess that its making the search strategy case insensitive but what exactly o does"
- The i stands for case-insensitive
- The o stands for compile only once
Neither of the two modifiers above is required here. I've just made it a coding habit to always add them unless there is a reason not to.
Searching only for digits, dots and blanks doesn't require the i, and passing the regex as a string literal to the function doesn't require the o.
You only need the o modifier if you pass the RegEx in a SAS variable and want to avoid it being recompiled in every single iteration of the DATA step.
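A sketch of the case where the o modifier does matter: the pattern is held in a SAS variable (`_re` is a hypothetical name) rather than passed as a literal, so per the explanation above the o keeps it from being recompiled on every iteration. TRIM() is used as a precaution to drop the trailing blanks the $40 variable is padded with.

```sas
/* Sketch: regex held in a variable (hypothetical name _re); the o modifier */
/* asks SAS to compile it only once rather than on every DATA step pass.    */
data want_o(drop=_re);
  infile datalines truncover;
  input symptoms $150.;
  length _re $40 symptoms1 $150;
  _re = 's/^\d+[\. ]*//o';
  /* trim() removes the padding blanks stored in the $40 variable */
  symptoms1 = prxchange(trim(_re), 1, strip(symptoms));
datalines;
1. Over the past 1 months, I have coughed
8. If you have a wheeze, is it worse in the morning?
;
run;
```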