I have
data var;
infile cards truncover;
input symptoms $500.;
cards;
1. Over the past 1 months, I have coughed
2. Over the past 1 months, I have brought up phlegm (sputum):
3. Over the past 1 months, I have had shortness of breath:
4. Over the past 1 months, I have had attacks of wheezing:
5. During the past 1 months how many severe or very unpleasant attacks of chest trouble have you had?
6 How long did the worst attack of chest trouble last? (Go to question 7 if you had no severe attacks)
7. Over the past 1 months, in an average week, how many good days (with little chest trouble) have you had?
8. If you have a wheeze, is it worse in the morning?
;run;
I would like to extract the text part of each string by removing the leading 1., 2., 3., ... 8. from the beginning of the string. The prefix has to be removed in such a way that the second or third numeric occurrences in the string aren't affected.
Any ideas on how to solve this using prxchange() would be very helpful.
You could take advantage of the first space and use substr from that position:
data var;
infile cards truncover;
input symptoms $500.;
cards;
1. Over the past 1 months, I have coughed
2. Over the past 1 months, I have brought up phlegm (sputum):
3. Over the past 1 months, I have had shortness of breath:
4. Over the past 1 months, I have had attacks of wheezing:
5. During the past 1 months how many severe or very unpleasant attacks of chest trouble have you had?
6 How long did the worst attack of chest trouble last? (Go to question 7 if you had no severe attacks)
7. Over the past 1 months, in an average week, how many good days (with little chest trouble) have you had?
8. If you have a wheeze, is it worse in the morning?
;
run;
data want;
set var;
length want $500;
want=substr(strip(symptoms),anyspace(symptoms)+1);
run;
I would like to understand it, so it would be helpful if you could explain the following.
In the statement, anyspace(symptoms)+1 results in 4, and strip(symptoms) removes the spaces so the value is treated as one whole string. If it reads the whole string from the 4th position, how does the +1 help to get the exact output? Could you please explain?
Hi @sahoositaram555 I like the fact you are trying to understand the code. Very good attitude.
Honestly, I didn't give it much thought when I wrote that; for me SAS is merely a game, and sometimes I hit and sometimes I miss.
Well, your understanding is correct. Basically:
1. anyspace(string) returns the position of the first space, which comes right after the number and dot. The strip makes sure any leading spaces are removed from the string that substr works on.
2. Once you have the position of the first blank embedded in the string, the characters that follow that position are the ones we need to extract.
3. The +1 moves the starting position to the character immediately after the first blank, so the extracted string doesn't begin with that blank. If further blanks follow the starting position, they would still end up in the extracted string.
4. I went with some assumptions: your string starts with a number, followed by a dot, then a space, and then the characters to extract.
5. If those assumptions hold true, the solution is rather simple and easy.
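The steps above can be traced on a single value. This is only a small sketch on a hard-coded string; the variable names pos and want are picked for illustration.

```sas
data _null_;
  length symptoms want $60;
  symptoms = '1. Over the past 1 months, I have coughed';
  /* position of the first blank: "1" is 1, "." is 2, blank is 3 */
  pos = anyspace(symptoms);
  /* start one character after that blank, i.e. at the "O" in position 4 */
  want = substr(symptoms, pos + 1);
  put pos= want=;
run;
```

The log should show pos=3 and want=Over the past 1 months, I have coughed. The later "1" is untouched because only the first blank decides where the extraction starts.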
Just to "annoy" you.
data var;
infile cards truncover;
input symptoms $60.;
cards;
1. Over the past 1 months, I have coughed
Over the past 1 months, I have coughed
;
data want;
set var;
length want $500;
want=substr(strip(symptoms),anyspace(symptoms)+1);
run;
proc print;
run;
Sir @Patrick Yes I noticed that in your solution example as you modified the sample in the last dataline. However, that's the assumption I went with. 😊
@Patrick I did spot the extra line in your solution
If you have a wheeze, is it worse in the morning?
and realized you wanted to offer a generic/holistic solution. Fair play 🙂 . Well, I am far too lazy 🙂
"Any ideas how to solve this by using prxchange() would be very helpful."
Regular expressions are very resource hungry so if you can do it with simple string functions then that's often a better performing solution.
data have;
infile datalines truncover;
input symptoms $150.;
cards;
1. Over the past 1 months, I have coughed
2. Over the past 1 months, I have brought up phlegm (sputum):
3. Over the past 1 months, I have had shortness of breath:
4. Over the past 1 months, I have had attacks of wheezing:
5. During the past 1 months how many severe or very unpleasant attacks of chest trouble have you had?
6 How long did the worst attack of chest trouble last? (Go to question 7 if you had no severe attacks)
7. Over the past 1 months, in an average week, how many good days (with little chest trouble) have you had?
8. If you have a wheeze, is it worse in the morning?
If you have a wheeze, is it worse in the morning?
;
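data want(drop=_:);
set have;
/* Option 1: regex removes leading digits plus any dots/blanks that follow */
symptoms1=prxchange('s/^\d+[\. ]*//oi',1,strip(symptoms));
/* Option 2: string functions; only act when the value starts with a digit */
if anydigit(symptoms)=1 then
do;
/* _pos receives the start position of the second blank-delimited word */
call scan(symptoms, 2, _pos, _len, ' ');
symptoms2=substrn(symptoms,_pos);
end;
else symptoms2=symptoms;
run;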
proc print data=want;
run;
Hi,
How about the ANYALPHA() function?
data have;
infile datalines truncover;
input symptoms $150.;
cards;
1. Over the past 1 months, I have coughed
2. Over the past 1 months, I have brought up phlegm (sputum):
3. Over the past 1 months, I have had shortness of breath:
4. Over the past 1 months, I have had attacks of wheezing:
5. During the past 1 months how many severe or very unpleasant attacks of chest trouble have you had?
6 How long did the worst attack of chest trouble last? (Go to question 7 if you had no severe attacks)
7. Over the past 1 months, in an average week, how many good days (with little chest trouble) have you had?
8. If you have a wheeze, is it worse in the morning?
If you have a wheeze, is it worse in the morning?
;
run;
data want;
set have;
symptoms2=substrn(symptoms, anyalpha(symptoms));
run;
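To see what ANYALPHA() returns for the different kinds of rows, here is a small sketch on three hard-coded values (the names s, p and r are only illustrative). It assumes the wanted text always begins with a letter; a question that started with a digit or a parenthesis would lose those characters.

```sas
data _null_;
  length s r $80;
  do s = '8. If you have a wheeze',
         '6 How long',
         'If you have a wheeze';
    p = anyalpha(s);    /* position of the first alphabetic character */
    r = substrn(s, p);  /* everything from that position onwards */
    put p= r=;
  end;
run;
```

For these three values p should come out as 4, 3 and 1, so the row without a number is returned unchanged.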
All the best
Bart
I read and understood the statement prxchange('s/^\d+[\. ]*//oi',1,strip(symptoms)) except for the //oi part. I could guess that the i makes the search case insensitive, but what exactly does the o do, and what about the //, which I read in some material is used for replacement?
Could you please help me with that one query? Hope to hear from you soon.
@sahoositaram555 wrote:
Hi @Patrick. Thank you so much for showing me a way to deal with this using Perl regular expressions.
I read and understood the statement prxchange('s/^\d+[\. ]*//oi',1,strip(symptoms)) except for the //oi part. I could guess that the i makes the search case insensitive, but what exactly does the o do, and what about the //, which I read in some material is used for replacement?
Could you please help me with that one query? Hope to hear from you soon.
From a docu entry found here.
Basic Syntax for Searching and Replacing Text
The basic syntax for searching and replacing text has the following form:
s/regular-expression/replacement-string/
The following example uses the PRXCHANGE function to show how substitution is performed:
prxchange('s/world/planet/', 1, 'Hello world!');
Arguments
- s: specifies the metacharacter for substitution.
- world: specifies the regular expression.
- planet: specifies the replacement value for world.
- 1: specifies that the search ends when one match is found.
- Hello world!: specifies the source string to be searched.
The result of the substitution is Hello planet.
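The documentation example can be tried directly in a DATA step. This is a minimal sketch; the variable name result is just for illustration.

```sas
data _null_;
  length result $20;
  /* replace the first match of "world" with "planet" */
  result = prxchange('s/world/planet/', 1, 'Hello world!');
  put result=;
run;
```

The log should show result=Hello planet! (only the matched text world is replaced, so the exclamation mark stays).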
"further by adding // which i read in a material that it is used for replacement"
To just remove a matching string means to replace it with nothing. That's how you end up with two consecutive slashes.
"I could guess that its making the search strategy case insensitive but what exactly o does"
- The i stands for case insensitive
- The o stands for compile only once
Neither of the two modifiers is required for the case here. I just made it a coding habit to always add them unless there is a reason not to.
Searching only for digits, dots and blanks doesn't require the i, and passing the regex to the function as a string literal doesn't require the o.
You only need the o modifier if you pass the regex in a SAS variable and want to avoid the regex being compiled in every single iteration of the data step.
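To make the pieces of the pattern and the "compile once" idea concrete, here is a hedged sketch. Instead of the o modifier it uses the explicit alternative of compiling with prxparse() on the first iteration only and reusing the id; the variable name _re and the three sample rows are illustrative, not taken from the original data set.

```sas
data have;
  infile datalines truncover;
  input symptoms $150.;
  datalines;
1. Over the past 1 months, I have coughed
6 How long did the worst attack of chest trouble last?
If you have a wheeze, is it worse in the morning?
;

data want;
  set have;
  length symptoms1 $150;
  retain _re;
  /* Pattern pieces:
     ^       anchor at the start of the string
     \d+     one or more digits (the question number)
     [\. ]*  any run of dots and blanks after the number
     //      empty replacement, so the matched prefix is deleted */
  /* compile once and keep the id across iterations */
  if _n_ = 1 then _re = prxparse('s/^\d+[\. ]*//');
  symptoms1 = prxchange(_re, 1, strip(symptoms));
  drop _re;
run;

proc print data=want;
run;
```

Because of the ^ anchor, the "1" in "past 1 months" is never matched, and a row without a leading number passes through unchanged.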