I’m trying the example below. I thought the pattern asserts that what immediately follows the digits is not “ dollars”, so I expected var2 for the first record, “100 dollars”, to be missing, but it returns 10 instead. What should I use if I don’t want any value returned for the first record?
data check;
var='100 dollars'; output;
var='100 pesos'; output;
var='USD100'; output;
var='JPY100'; output;
var='JPYbdl100'; output;
run;
data check2;
set check;
*** create the regular expression only once ***;
retain re1;
if _N_=1 then do;
re1 = prxparse('/\d+(?! dollars)/');
end;
if prxmatch(re1,var) then do;
var2 = prxposn(re1,0,var);
end;
run;
proc print data=check2; run;
You really gave me a headache. I hope you don't post a question like this again.
data check;
var="Patient ab ht reported .1 headache 1.3f and nausea. MD ods noticed rash."; output;
var="ab ht 2.2 Pt. Rptd. Backache. ht usd2.5 od"; output;
var="2.5h of ods patient reported seeing spots."; output;
var="1.3g Elevated pulse ab ht and 0.6d labored breathing."; output;
var="Headache."; output;
var="ab ht .3 5 4 .5k Headache ab"; output;
run;
data want;
set check;
length v $ 20;
retain pid;
var=prxchange('s/\bab\s+ht\b|\bods\b/|/i',-1,var);
if _n_ eq 1 then pid=prxparse('/[^|]\s+(\S+)?\s+(\S+)?\s+[dh-z]+(\d+)?\.?\d+\s+|\s+(\d+)?\.?\d+[abe-z]+\s+(\S+)?\s+(\S+)?\s+[^|]/i');
call prxsubstr(pid, var, position, length);
if position ne 0 then v = prxchange('s/^\.+(?=\d+\.\d+)|\.+$//' ,-1, compress(substr(var, position, length),'.','kd') );
drop pid position length;
run;
Xia Keshan
I can't specify 100 because it could be any number, even a decimal.
Hi,
I am not sure what you are trying to do. Is it just to get the numeric part of var? If so, COMPRESS with the alpha modifier should work fine:
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000212246.htm
Let me give a better example:
data check;
var="Patient reported 100 c headache 3f and nausea. MD noticed rash."; output;
var="a2 Pt. Rptd. Backache. usd25"; output;
var="b25h of patient reported seeing spots."; output;
var="3g Elevated pulse and 6d labored breathing."; output;
var="Headache."; output;
run;
Yes, I'm trying to get the numeric part of var, but only if it meets the following conditions:
What precedes it is not a or b; for example, a2 or b25 is not wanted.
What follows it is not c or d; for example, 100 c or 6d is not wanted.
What are you trying to do?
For your example as is, you need to add a word boundary after the digits. Otherwise \d+ backtracks from '100' to '10': what follows '10' is '0', which is not ' dollars', so '10' meets your criteria and is selected.
data check;
var='100 dollars'; output;
var='100 pesos'; output;
var='USD100'; output;
var='JPY100'; output;
var='JPYbdl100'; output;
run;
data check2;
set check;
*** create the regular expression only once ***;
retain re1;
if _N_=1 then do;
re1 = prxparse('/\d+\b(?! dollars)/');
end;
if prxmatch(re1,var) then do;
var2 = prxposn(re1,0,var);
end;
run;
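To see the backtracking described above in isolation, here is a minimal sketch that runs both patterns against the single value '100 dollars'. The variable names without_b and with_b are made up for this illustration. The first pattern should return 10, as reported in the question, and the second should leave the value blank, because \b does not allow the match to end between two digits.

```sas
data _null_;
   length without_b with_b $ 20;
   var = '100 dollars';

   /* no boundary: \d+ may give back digits until the lookahead passes */
   re_plain = prxparse('/\d+(?! dollars)/');

   /* with \b: the match has to end where the run of digits ends */
   re_bound = prxparse('/\d+\b(?! dollars)/');

   if prxmatch(re_plain, var) then without_b = prxposn(re_plain, 0, var);
   if prxmatch(re_bound, var) then with_b = prxposn(re_bound, 0, var);

   /* write both results to the log for comparison */
   put without_b= with_b=;
run;
```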
Thanks, Hai.kuo. The boundary seems to work for the ones with spaces, but it will miss the ones without a space. Any other suggestions?
Are you sure about that?
data check;
var='100 dollars'; output;
var='100dollars'; output;
var='100 pesos'; output;
var='USD100'; output;
var='JPY100'; output;
var='JPYbdl100'; output;
run;
data check2;
set check;
*** create the regular expression only once ***;
retain re1;
if _N_=1 then do;
re1 = prxparse('/\d+\b(?! dollars)/');
end;
if prxmatch(re1,var) then do;
var2 = prxposn(re1,0,var);
end;
run;
OK, thanks, you are right. I expanded it a little to include a lookbehind. Now how come JPY100 and JPYbdl100 don't return anything? Any idea?
data check;
var='100 dollars'; output;
var='100dollars'; output;
var='100 pesos'; output;
var='USD100'; output;
var='USD 100'; output;
var='JPY100'; output;
var='JPYbdl100'; output;
run;
data check2;
set check;
*** create the regular expression only once ***;
retain re1;
if _N_=1 then do;
re1 = prxparse('/(?<!D )\b\d+\b(?! dollars)/');
end;
if prxmatch(re1,var) then do;
var2 = prxposn(re1,0,var);
end;
run;
proc print data=check2; run;
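One likely reason JPY100 and JPYbdl100 come back empty is the leading \b. A word boundary needs a word character on one side and a non-word character on the other, and in JPY100 both Y and 1 are word characters, so there is no boundary in front of the digits. The sketch below compares the pattern above with a hypothetical alternative that swaps the leading \b for (?<!\d), which only stops a match from starting in the middle of a number. The names lead_b and lead_digit are made up for this illustration, and note that the alternative would also let USD100 through, which may or may not be what is wanted.

```sas
data _null_;
   length lead_b lead_digit $ 20;
   var = 'JPY100';

   /* pattern from the post: \b before the digits fails after a letter */
   re_b = prxparse('/(?<!D )\b\d+\b(?! dollars)/');

   /* assumed alternative: only forbid a digit immediately before the match */
   re_d = prxparse('/(?<!D )(?<!\d)\d+\b(?! dollars)/');

   if prxmatch(re_b, var) then lead_b = prxposn(re_b, 0, var);
   if prxmatch(re_d, var) then lead_digit = prxposn(re_d, 0, var);

   /* lead_b should stay blank; lead_digit should hold 100 */
   put lead_b= lead_digit=;
run;
```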
It is indeed not that easy. The core of your problem is to 'negatively' identify a specific character or string. That may sound simple, but it actually has two layers, while PRX can only handle one at a time. One layer is that you want it to be alphabetic; the other is that you don't want it to be certain letters. The following code may not cover all of your cases, but it seems to work on the data you presented:
data check;
var='100 dollars'; output;
var='100dollars'; output;
var='100 pesos'; output;
var='USD100'; output;
var='USD 100'; output;
var='JPY100'; output;
var='JPYbdl100'; output;
run;
data check2;
set check;
if prxmatch('m/usd|dollar/io',var)=0 then
var2=prxchange('s/(^\D*|\D*$)//io',-1,var);
run;
proc print data=check2; run;
If I understand what you mean.
data check;
var='100 dollars'; output;
var='100dollars'; output;
var='100 pesos'; output;
var='USD100'; output;
var='USD 100'; output;
var='JPY100'; output;
var='JPYbdl100'; output;
run;
data want;
set check;
length w $ 20;
if not prxmatch('/(JPY|JPYbd)\s*\d+|\d+\s*(dollars)/i',var) then w=compress(var,,'kd');
run;
Sorry for the late reply. Thank you, Hai.kuo and Xia Keshan. The code each of you provided works only for this sample; it won’t work for the other one I gave, which is closer to my real data:
data check;
var="Patient reported 100 c headache 3f and nausea. MD noticed rash."; output;
var="a2 Pt. Rptd. Backache. usd25"; output;
var="b25h of patient reported seeing spots."; output;
var="3g Elevated pulse and 6d labored breathing."; output;
var="Headache."; output;
run;
As you can see, var could have other numbers in it, while I only want those not preceded by a or b and not followed by c or d. Is there a way to do it?
What do you want the output to look like for this data sample?
What would you expect for a data line like:
Pulse 110 BP 70 over 30
It should return this:
var | var2
Patient reported 100 c headache 3f and nausea. MD noticed rash. | 3
a2 Pt. Rptd. Backache. usd25 | 25
b25h of patient reported seeing spots. |
3g Elevated pulse and 6d labored breathing. | 3
Headache. |
Thanks.
And what's before and after the number is random; there's no way to specify them.
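For the rules as stated (not preceded by a or b, not followed by c or d), one possible sketch combines a negative lookbehind with a negative lookahead. Several things here are assumptions rather than part of the original posts: the pattern itself, the names re and var2, treating an optional space before a standalone c or d as disqualifying (so that "100 c" is rejected), and returning only the first qualifying number per record. It handles whole numbers only, so decimals would need a wider digit pattern.

```sas
data check;
   length var $ 80;
   var="Patient reported 100 c headache 3f and nausea. MD noticed rash."; output;
   var="a2 Pt. Rptd. Backache. usd25"; output;
   var="b25h of patient reported seeing spots."; output;
   var="3g Elevated pulse and 6d labored breathing."; output;
   var="Headache."; output;
run;

data want;
   set check;
   length var2 $ 20;
   retain re;
   /* (?<![ab\d])       : not preceded by a, b, or another digit          */
   /* (?!\d|\s*[cd]\b)  : not followed by a digit (blocks backtracking),  */
   /*                     nor by optional spaces and a standalone c or d  */
   if _n_ = 1 then re = prxparse('/(?<![ab\d])\d+(?!\d|\s*[cd]\b)/');
   /* keep the first number that passes both checks, if any */
   if prxmatch(re, var) then var2 = prxposn(re, 0, var);
   drop re;
run;

proc print data=want; run;
```

The \d inside both lookarounds matters for the same reason as the word boundary earlier in the thread: without it, the engine could satisfy the conditions by matching only part of a number such as the 5 in b25h or the 10 in 100 c.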