sahoositaram555
Pyrite | Level 9

I have the following data:

data var;
infile cards truncover;
input symptoms $500.;
cards;
1. Over the past 1 months, I have coughed
2. Over the past 1 months, I have brought up phlegm (sputum):
3. Over the past 1 months, I have had shortness of breath:
4. Over the past 1 months, I have had attacks of wheezing:
5. During the past 1 months how many severe or very unpleasant attacks of chest trouble have you had?
6 How long did the worst attack of chest trouble last? (Go to question 7 if you had no severe attacks)
7. Over the past 1 months, in an average week, how many good days (with little chest trouble) have you had?
8. If you have a wheeze, is it worse in the morning?

;run;

 

I would like to extract the text part of each string by removing the leading 1., 2., 3., ... 8. from the beginning of the string. It has to be removed in such a way that the 2nd or 3rd numeric values occurring later in the string are not affected.

 

Any ideas on how to solve this using prxchange() would be very helpful.

novinosrin
Tourmaline | Level 20

You could take advantage of the first space and use substr starting from the position just after it.

 

data var;
infile cards truncover;
input symptoms $500.;

cards;
1. Over the past 1 months, I have coughed
2. Over the past 1 months, I have brought up phlegm (sputum):
3. Over the past 1 months, I have had shortness of breath:
4. Over the past 1 months, I have had attacks of wheezing:
5. During the past 1 months how many severe or very unpleasant attacks of chest trouble have you had?
6 How long did the worst attack of chest trouble last? (Go to question 7 if you had no severe attacks)
7. Over the past 1 months, in an average week, how many good days (with little chest trouble) have you had?
8. If you have a wheeze, is it worse in the morning?
;
run;

data want;
 set var;
 length want $500;
 want=substr(strip(symptoms),anyspace(symptoms)+1);
run;
sahoositaram555
Pyrite | Level 9
Hi @novinosrin, thank you for your rapid response. I can see that it's working.
I would like to understand it, so it would be helpful if you could explain the following.
In the statement, anyspace(symptoms)+1 results in 4, and strip(symptoms) removes the spaces so the value is treated as one whole string. If it reads the whole string up to the 4th position, how does the +1 help in this situation to get the exact output? Could you please explain?
novinosrin
Tourmaline | Level 20

Hi @sahoositaram555  I like the fact you are trying to understand the code. Very good attitude. 

Honestly, I didn't give it much thought when I wrote that; for me SAS is merely a game, and sometimes I hit and sometimes I miss.

 

Well, your understanding is correct. Basically

1. anyspace(string) determines the position of the first space after the number and dot. The strip is there to remove any leading spaces. Note that in the code above it is applied to the argument of substr and not to the argument of anyspace, so the two positions agree only when the value has no leading spaces.

2. Once you have determined the position of the first blank space embedded in the string, the characters that follow that position are the ones we need to extract.

3. The +1 moves the starting point to the position immediately after the first blank space, so that the blank itself is not extracted as a leading blank. If further blanks follow that starting point, they will still be part of the extracted string.

4. I went with some assumptions: your string starts with a number, followed by a dot, then a space, and then the characters that need to be extracted.

5. If those assumptions hold true, the solution is rather simple and easy 
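
To make the position arithmetic in the steps above concrete, here is a minimal sketch that traces one record. The variable names s and pos are illustrative and not part of the posted code; unlike the posted code, both anyspace and substr are applied to the same stripped value.

```sas
/* Minimal sketch: trace the positions used by the ANYSPACE/SUBSTR approach. */
data _null_;
  length symptoms s want $100;
  symptoms = '1. Over the past 1 months, I have coughed';
  s    = strip(symptoms);      /* no effect here: the value has no leading blanks */
  pos  = anyspace(s);          /* first blank sits right after "1.", so pos is 3  */
  want = substr(s, pos + 1);   /* start at position 4, one past that blank        */
  put pos= want=;
run;
```

Without the +1 the extraction would start at position 3, on the blank itself, and the result would carry a leading blank.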

Patrick
Opal | Level 21

@novinosrin 

Just to "annoy" you.

data var;
infile cards truncover;
input symptoms $60.;

cards;
1. Over the past 1 months, I have coughed
Over the past 1 months, I have coughed
;

data want;
 set var;
 length want $500;
 want=substr(strip(symptoms),anyspace(symptoms)+1);
run;

proc print;
run;

[Screenshot of the PROC PRINT output]

 

novinosrin
Tourmaline | Level 20

Sir @Patrick  Yes I noticed that in your solution example as you modified the sample in the last dataline. However, that's the assumption I went with. 😊

novinosrin
Tourmaline | Level 20

@Patrick  I really spotted your solution 

If you have a wheeze, is it worse in the morning?

and realized you wanted to offer a generic/holistic solution. Fair play 🙂. Well, I am far too lazy 🙂

Patrick
Opal | Level 21

"Any ideas how to solve this by using prxchange() would be very helpful."


Regular expressions are very resource hungry, so if you can do it with simple string functions, that is often the better-performing solution.

data have;
  infile datalines truncover;
  input symptoms $150.;
  cards;
1. Over the past 1 months, I have coughed
2. Over the past 1 months, I have brought up phlegm (sputum):
3. Over the past 1 months, I have had shortness of breath:
4. Over the past 1 months, I have had attacks of wheezing:
5. During the past 1 months how many severe or very unpleasant attacks of chest trouble have you had?
6 How long did the worst attack of chest trouble last? (Go to question 7 if you had no severe attacks)
7. Over the past 1 months, in an average week, how many good days (with little chest trouble) have you had?
8. If you have a wheeze, is it worse in the morning?
If you have a wheeze, is it worse in the morning?
;

data want(drop=_:);
  set have;
  symptoms1=prxchange('s/^\d+[\. ]*//oi',1,strip(symptoms));
  if anydigit(symptoms)=1 then
    do;
      call scan(symptoms, 2, _pos, _len, ' ');
      symptoms2=substrn(symptoms,_pos);
    end;
  else symptoms2=symptoms;
run;

proc print data=want;
run;
yabwon
Onyx | Level 15

Hi,

 

How about ANYALPHA() function?

 

data have;
  infile datalines truncover;
  input symptoms $150.;
  cards;
1. Over the past 1 months, I have coughed
2. Over the past 1 months, I have brought up phlegm (sputum):
3. Over the past 1 months, I have had shortness of breath:
4. Over the past 1 months, I have had attacks of wheezing:
5. During the past 1 months how many severe or very unpleasant attacks of chest trouble have you had?
6 How long did the worst attack of chest trouble last? (Go to question 7 if you had no severe attacks)
7. Over the past 1 months, in an average week, how many good days (with little chest trouble) have you had?
8. If you have a wheeze, is it worse in the morning?
If you have a wheeze, is it worse in the morning?
;
run;

data want;
  set have;
  symptoms2=substrn(symptoms, anyalpha(symptoms));
run;

 

All the best

Bart




sahoositaram555
Pyrite | Level 9
Thanks @yabwon for the update. ANYALPHA is amazing.
sahoositaram555
Pyrite | Level 9
Hi @Patrick. Thank you so much for showing me a way to handle this with Perl regular expressions.
I read and understood the statement prxchange('s/^\d+[\. ]*//oi',1,strip(symptoms)) except for the //oi part. I guess the i makes the search case insensitive, but what exactly does the o do? And what about the //, which I read in some material is used for replacement?

Could you please help me with that one query? Hope to hear from you soon.
Patrick
Opal | Level 21

From a documentation entry:

Basic Syntax for Searching and Replacing Text

The basic syntax for searching and replacing text has the following form:

s/regular-expression/replacement-string/ 

The following example uses the PRXCHANGE function to show how substitution is performed:

prxchange('s/world/planet/', 1, 'Hello world!'); 

Arguments

- s: specifies the metacharacter for substitution.
- world: specifies the regular expression.
- planet: specifies the replacement value for world.
- 1: specifies that the search ends when one match is found.
- Hello world!: specifies the source string to be searched.

The result of the substitution is Hello planet.

 

"further by adding // which i read in a material that it is used for replacement"

To just remove a matching string means to replace it with nothing. That's how you end up with two consecutive slashes.
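
As a small illustration of the empty replacement, the sketch below applies the same pattern (without the optional modifiers) to one of the sample lines. The variable names before and after are made up for this example.

```sas
/* Minimal sketch: an empty replacement string deletes whatever the pattern matches. */
data _null_;
  length before after $60;
  before = '7. Over the past 1 months, in an average week';
  /* Nothing sits between the last two slashes, so the match is replaced by nothing. */
  after = prxchange('s/^\d+[\. ]*//', 1, strip(before));
  put after=;
run;
```

Because the pattern is anchored with ^, only the leading "7. " is removed and the later "1" in "past 1 months" is left untouched.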

 

"I could guess that its making the search strategy case insensitive but what exactly o does"

- The i stands for case insensitive

- The o stands for compile only once

Neither of these two modifiers is required here. I just made it a coding habit to always add them unless there is a reason not to.

Searching for digits, dots and blanks only doesn't require the i, and passing the regex to the function as a string literal doesn't require the o.

You only need the o modifier if you pass the regex in a SAS variable and want to avoid the RegEx being compiled in every single iteration of the data step.
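
An alternative way to make the single compilation explicit, not used in the code above, is to compile the pattern yourself with PRXPARSE on the first iteration and pass the returned id to PRXCHANGE. This is only a sketch: it assumes the have data set from the earlier post, and the variable name _rx is illustrative.

```sas
/* Sketch: compile the pattern once and reuse its id on every row. */
data want;
  set have;
  length symptoms1 $150;
  retain _rx;                                         /* keep the id across iterations */
  if _n_ = 1 then _rx = prxparse('s/^\d+[\. ]*//');   /* compile on the first row only */
  symptoms1 = prxchange(_rx, 1, strip(symptoms));     /* PRXCHANGE accepts a regex id  */
  drop _rx;
run;
```

With this layout the pattern could also come from a variable or macro value without needing the o modifier, since PRXPARSE runs only once.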
