Re: substring from a string (only first occurrence numeric values has ...

sahoositaram555 · Posted 03-25-2020 08:57 AM

I have

data var;

input symptoms;

cards;

1. Over the past 1 months, I have coughed
2. Over the past 1 months, I have brought up phlegm (sputum):
3. Over the past 1 months, I have had shortness of breath:
4. Over the past 1 months, I have had attacks of wheezing:
5. During the past 1 months how many severe or very unpleasant attacks of chest trouble have you had?
6 How long did the worst attack of chest trouble last? (Go to question 7 if you had no severe attacks)
7. Over the past 1 months, in an average week, how many good days (with little chest trouble) have you had?
8. If you have a wheeze, is it worse in the morning?

;run;

I would like to extract the character part of the string by suppressing the 1., 2., 3., ... 8. at the beginning of the string. Only that leading number should be removed; the 2nd or 3rd numeric values that occur later in the string must not be affected.

Any ideas on how to solve this using prxchange() would be very helpful.

novinosrin · Posted 03-25-2020 09:04 AM

You could take advantage of the first space and use substr from that position:

data var;
infile cards truncover;
input symptoms $500.;

cards;
1. Over the past 1 months, I have coughed
2. Over the past 1 months, I have brought up phlegm (sputum):
3. Over the past 1 months, I have had shortness of breath:
4. Over the past 1 months, I have had attacks of wheezing:
5. During the past 1 months how many severe or very unpleasant attacks of chest trouble have you had?
6 How long did the worst attack of chest trouble last? (Go to question 7 if you had no severe attacks)
7. Over the past 1 months, in an average week, how many good days (with little chest trouble) have you had?
8. If you have a wheeze, is it worse in the morning?
;
run;

data want;
 set var;
 length want $500;
 want=substr(strip(symptoms),anyspace(symptoms)+1);
run;

sahoositaram555 · Posted 03-25-2020 09:35 AM

Hi @novinosrin, thank you for your rapid response. I can see that it's working.
I would like to understand it, so it would be helpful if you could explain the following.
In the statement, anyspace(symptoms)+1 results in 4, and strip(symptoms) removes all the spaces, so it treats the value as one whole string. If it reads the whole string up to the 4th position, how does the +1 help in this situation to get the exact output? Could you please explain?

novinosrin · Posted 03-25-2020 09:50 AM

Hi @sahoositaram555 I like the fact you are trying to understand the code. Very good attitude.

Honestly, I didn't give it much thought when I wrote that. For me SAS is merely a game, so sometimes I hit and sometimes I miss.

Well, your understanding is correct. Basically

1. anyspace(string) returns the position of the first space, which here is the one after the number and dot. The strip removes leading spaces from the string that substr reads; anyspace itself is applied to the unstripped value, so this relies on symptoms having no leading spaces.

2. Once you have the position of the first blank embedded in the string, the characters that follow that position are the ones we need to extract.

3. The +1 moves the starting position to the character immediately after the first blank, so that the blank itself is not extracted as a leading blank. If more blanks follow the starting point, they will still end up in the extracted string.

4. I went with some assumptions: your string starts with a number, followed by a dot, then a space, and then the characters to extract.

5. If those assumptions hold true, the solution is rather simple and easy.
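
The steps above can be traced on a single value. This is only a minimal sketch: the variable pos is introduced here for illustration and is not part of the original code, and the value is assumed to have no leading blanks.

```sas
data _null_;
  length symptoms want $100;
  symptoms = '1. Over the past 1 months, I have coughed';

  /* Position of the first blank: "1" is at 1, "." at 2, the blank at 3 */
  pos = anyspace(symptoms);

  /* Start one character after that blank, i.e. at position 4 */
  want = substr(strip(symptoms), pos + 1);

  /* Write both values to the log to inspect them */
  put pos= want=;
run;
```

Only the first blank is used to find the starting point, so the later "1" in "past 1 months" is left alone.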

sahoositaram555 · Posted 03-25-2020 01:52 PM

@novinosrin, Thank you very much.

Patrick · Posted 03-25-2020 09:48 AM

@novinosrin

Just to "annoy" you.

data var;
infile cards truncover;
input symptoms $60.;

cards;
1. Over the past 1 months, I have coughed
Over the past 1 months, I have coughed
;

data want;
 set var;
 length want $500;
 want=substr(strip(symptoms),anyspace(symptoms)+1);
run;

proc print;
run;

novinosrin · Posted 03-25-2020 09:52 AM

Sir @Patrick, yes, I noticed that in the example in your solution, where you added an unnumbered last data line. However, that's the assumption I went with. 😊

novinosrin · Posted 03-25-2020 09:54 AM

@Patrick I did spot this line in your solution:

If you have a wheeze, is it worse in the morning?

and realized you wanted to offer a generic/holistic solution. Fair play 🙂. Well, I am far too lazy 🙂

Patrick · Posted 03-25-2020 09:10 AM

"Any ideas how to solve this by using prxchange() would be very helpful."

Regular expressions are very resource hungry, so if you can do it with simple string functions, that's often the better-performing solution. The code below shows both: symptoms1 uses prxchange() and symptoms2 uses string functions only.

data have;
  infile datalines truncover;
  input symptoms $150.;
  cards;
1. Over the past 1 months, I have coughed
2. Over the past 1 months, I have brought up phlegm (sputum):
3. Over the past 1 months, I have had shortness of breath:
4. Over the past 1 months, I have had attacks of wheezing:
5. During the past 1 months how many severe or very unpleasant attacks of chest trouble have you had?
6 How long did the worst attack of chest trouble last? (Go to question 7 if you had no severe attacks)
7. Over the past 1 months, in an average week, how many good days (with little chest trouble) have you had?
8. If you have a wheeze, is it worse in the morning?
If you have a wheeze, is it worse in the morning?
;

data want(drop=_:);
  set have;
  symptoms1=prxchange('s/^\d+[\. ]*//oi',1,strip(symptoms));
  if anydigit(symptoms)=1 then
    do;
      call scan(symptoms, 2, _pos, _len, ' ');
      symptoms2=substrn(symptoms,_pos);
    end;
  else symptoms2=symptoms;
run;

proc print data=want;
run;

yabwon · Posted 03-25-2020 11:24 AM

Hi,

How about the ANYALPHA() function?

data have;
  infile datalines truncover;
  input symptoms $150.;
  cards;
1. Over the past 1 months, I have coughed
2. Over the past 1 months, I have brought up phlegm (sputum):
3. Over the past 1 months, I have had shortness of breath:
4. Over the past 1 months, I have had attacks of wheezing:
5. During the past 1 months how many severe or very unpleasant attacks of chest trouble have you had?
6 How long did the worst attack of chest trouble last? (Go to question 7 if you had no severe attacks)
7. Over the past 1 months, in an average week, how many good days (with little chest trouble) have you had?
8. If you have a wheeze, is it worse in the morning?
If you have a wheeze, is it worse in the morning?
;
run;

data want;
  set have;
  symptoms2=substrn(symptoms, anyalpha(symptoms));
run;
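
A minimal sketch of what ANYALPHA() returns for a few values, since the approach keeps everything from the first letter onward. The variable pos and the third sample value are hypothetical and not taken from the thread; that value is included only to show that a line starting with a non-letter such as "(" would lose that character too.

```sas
data _null_;
  length symptoms symptoms2 $60;
  do symptoms = '1. Over the past 1 months',
                '6 How long',
                '(Optional) Any other symptoms?';  /* hypothetical value */
    /* Position of the first alphabetic character in the string */
    pos = anyalpha(symptoms);

    /* Keep everything from that position to the end of the string */
    symptoms2 = substrn(symptoms, pos);

    put pos= symptoms2=;
  end;
run;
```

If the data can contain lines that legitimately begin with punctuation or a digit, a check such as anydigit(symptoms)=1 before stripping may be the safer choice.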

All the best

Bart


sahoositaram555 · Posted 03-25-2020 04:53 PM

Thanks @yabwon for the update. Anyalpha is amazing.

sahoositaram555 · Posted 03-25-2020 04:49 PM

Hi @Patrick. Thank you so much for showing me a way to deal with this using Perl regular expressions.
I read and understood the statement prxchange('s/^\d+[\. ]*//oi',1,strip(symptoms)) except the //oi part. I could guess that the i makes the search case insensitive, but what exactly does the o do? And what about the //, which I read in some material is used for replacement?

Could you please help me with that one query of mine? Hope to hear from you soon.

Patrick · Posted 03-25-2020 07:10 PM


From the SAS documentation:

Basic Syntax for Searching and Replacing Text

The basic syntax for searching and replacing text has the following form:

s/regular-expression/replacement-string/

The following example uses the PRXCHANGE function to show how substitution is performed:

prxchange('s/world/planet/', 1, 'Hello world!');

Arguments

s: specifies the metacharacter for substitution.
world: specifies the regular expression.
planet: specifies the replacement value for world.
1: specifies that the search ends when one match is found.
Hello world!: specifies the source string to be searched.

The result of the substitution is Hello planet.

"further by adding // which i read in a material that it is used for replacement"

To just remove a matching string means to replace it with nothing. That's how you end up with two consecutive slashes.

"I could guess that its making the search strategy case insensitive but what exactly o does"

- The i stands for case insensitive

- The o stands for compile only once

Neither of these two modifiers is required for the case here. I just made it a coding habit to always add them unless there is a reason not to.

Searching only for digits, dots and blanks doesn't require the i, and passing the regex to the function as a string constant doesn't require the o.

You only need the o modifier if you pass the regex in a SAS variable and want to avoid the regex being recompiled in every single iteration of the data step.
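
As a rough sketch of the case described above, the pattern below is held in a SAS variable, so the o modifier asks SAS to compile it once instead of on every call. The variable names pattern, text and result and the three sample strings are assumptions made for illustration. The empty replacement between the last two slashes is what removes the matched prefix.

```sas
data _null_;
  length pattern $40 text result $60;

  /* Regex stored in a variable: this is the situation where /o matters */
  pattern = 's/^\d+[\. ]*//o';

  do text = '1. Over the past 1 months, I have coughed',
            '6 How long did the worst attack last?',
            'If you have a wheeze, is it worse in the morning?';
    /* trim() drops the padding blanks of the $40 pattern variable;  */
    /* 1 means at most one replacement is made per string            */
    result = prxchange(trim(pattern), 1, strip(text));
    put text= result=;
  end;
run;
```

With o in place the first compiled pattern is reused, so later changes to the pattern variable would not take effect; leave the modifier out if the pattern really has to vary between calls.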
