allaboutsas
Calcite | Level 5

I'm trying the example below. As I understand it, the pattern asserts that what immediately follows the digits is not " dollars", so I expected var2 to be null for the first record, "100 dollars". Instead it returns 10. What should I use if I don't want any value returned for the first record?

data check;

var='100 dollars'; output;

var='100 pesos'; output;

var='USD100'; output;

var='JPY100'; output;

var='JPYbdl100'; output;

run;

data check2;

set check;

*** create the regular expression only once ***;

retain re1;

if _N_=1 then do;

re1 = prxparse('/\d+(?! dollars)/');

end;

if prxmatch(re1,var) then do;

var2 = prxposn(re1,0,var);

end;

run;

proc print data=check2; run;

1 ACCEPTED SOLUTION

Ksharp
Super User

You really gave me a headache. I hope you don't post a question like this again.

data check;
var="Patient ab ht reported .1 headache 1.3f and nausea. MD ods noticed rash."; output;
var="ab ht 2.2 Pt. Rptd. Backache. ht usd2.5 od"; output;
var="2.5h of ods patient reported seeing spots."; output;
var="1.3g Elevated pulse ab ht and 0.6d labored breathing."; output;
var="Headache."; output;
var="ab ht .3 5 4 .5k Headache ab";output;
run;
 
 
data want;
set check;
length v $ 20;
retain pid;
var=prxchange('s/\bab\s+ht\b|\bods\b/|/i',-1,var);
if _n_ eq 1 then pid=prxparse('/[^|]\s+(\S+)?\s+(\S+)?\s+[dh-z]+(\d+)?\.?\d+\s+|\s+(\d+)?\.?\d+[abe-z]+\s+(\S+)?\s+(\S+)?\s+[^|]/i');
call prxsubstr(pid, var, position, length);
if position ne 0 then v = prxchange('s/^\.+(?=\d+\.\d+)|\.+$//' ,-1, compress(substr(var, position, length),'.','kd') );
drop pid position length;
run;

Xia Keshan


42 REPLIES
allaboutsas
Calcite | Level 5

I can't hard-code 100 because it could be any number, even a decimal.

RW9
Diamond | Level 26

Hi,

I'm not sure what you are trying to do. Is it just to get the numeric part of var? If so, COMPRESS with the 'a' (alphabetic characters) modifier should work fine:

http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000212246.htm

allaboutsas
Calcite | Level 5

Let me give a better example:

data check;

var="Patient reported 100 c headache 3f and nausea. MD noticed rash."; output;

var="a2 Pt. Rptd. Backache. usd25"; output;

var="b25h of patient reported seeing spots."; output;

var="3g Elevated pulse and 6d labored breathing."; output;

var="Headache."; output;

run;

Yes, I'm trying to get the numeric part of var, but only if it meets the following conditions:

What precedes the number is not a or b; for example, a2 and b25 are not wanted.

What follows the number is not c or d; for example, 100 c and 6d are not wanted.

Reeza
Super User

What are you trying to do?

Haikuo
Onyx | Level 15

For your example as is, you need to specify a word boundary after the digits. Otherwise the regex backtracks: after '10' the next character is '0', which is not ' dollars', so '10' meets your criteria and is selected.

data check;

var='100 dollars'; output;

var='100 pesos'; output;

var='USD100'; output;

var='JPY100'; output;

var='JPYbdl100'; output;

run;

data check2;

set check;

*** create the regular expression only once ***;

retain re1;

if _N_=1 then do;

re1 = prxparse('/\d+\b(?! dollars)/');

end;

if prxmatch(re1,var) then do;

var2 = prxposn(re1,0,var);

end;

run;
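
The backtracking behavior described above can be seen side by side in a minimal sketch. The dataset and variable names here (backtrack_demo, plain, guarded) are made up for illustration, and the second pattern is only one possible alternative to the word boundary: it adds \d to the negative lookahead so the match cannot stop in the middle of a number.

```sas
data backtrack_demo;
   length var plain guarded $ 12;
   /* original pattern: free to give back a digit and match '10' */
   re_plain   = prxparse('/\d+(?! dollars)/');
   /* illustrative alternative: a digit may not follow the match either */
   re_guarded = prxparse('/\d+(?!\d| dollars)/');
   do var = '100 dollars', '100 pesos', 'USD100';
      call missing(plain, guarded);
      if prxmatch(re_plain, var)   then plain   = prxposn(re_plain, 0, var);
      if prxmatch(re_guarded, var) then guarded = prxposn(re_guarded, 0, var);
      output;
   end;
   drop re_plain re_guarded;
run;

proc print data=backtrack_demo; run;
```

For '100 dollars', plain should come back as 10 (as reported in the question) while guarded stays blank. Unlike the \b version, the guarded pattern would still match '100dollars', because only the literal ' dollars' with a leading space is excluded.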

allaboutsas
Calcite | Level 5

Thanks, Hai.kuo. The boundary seems to work for the values with spaces, but it will miss the ones without a space. Any other suggestions?

Haikuo
Onyx | Level 15

Are you sure about that?

data check;

var='100 dollars'; output;

var='100dollars'; output;

var='100 pesos'; output;

var='USD100'; output;

var='JPY100'; output;

var='JPYbdl100'; output;

run;

data check2;

set check;

*** create the regular expression only once ***;

retain re1;

if _N_=1 then do;

re1 = prxparse('/\d+\b(?! dollars)/');

end;

if prxmatch(re1,var) then do;

var2 = prxposn(re1,0,var);

end;

run;


allaboutsas
Calcite | Level 5

OK, thanks, you are right. I expanded it a little to include a lookbehind. Now how come JPY100 and JPYbdl100 don't return anything? Any idea?

data check;

var='100 dollars'; output;

var='100dollars'; output;

var='100 pesos'; output;

var='USD100'; output;

var='USD 100'; output;

var='JPY100'; output;

var='JPYbdl100'; output;

run;

data check2;

set check;

*** create the regular expression only once ***;

retain re1;

if _N_=1 then do;

re1 = prxparse('/(?<!D )\b\d+\b(?! dollars)/');

end;

if prxmatch(re1,var) then do;

var2 = prxposn(re1,0,var);

end;

run;

proc print data=check2; run;

Haikuo
Onyx | Level 15

It is indeed not that easy. The core of your problem is to 'negatively' identify a specific character or string. That may sound simple, but it actually has two layers, while PRX can only handle one at a time. One layer is that you want the character to be alphabetic; the other is that you don't want it to be a particular alphabetic character. The following code may not cover all of your cases, but it seems to work on the data you presented:

data check;

var='100 dollars'; output;

var='100dollars'; output;

var='100 pesos'; output;

var='USD100'; output;

var='USD 100'; output;

var='JPY100'; output;

var='JPYbdl100'; output;

run;

data check2;

set check;

if prxmatch('m/usd|dollar/io',var)=0 then

var2=prxchange('s/(^\D*|\D*$)//io',-1,var);

run;


proc print data=check2; run;
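
On the question of why JPY100 and JPYbdl100 returned nothing with the lookbehind pattern: the leading \b only matches between a word character and a non-word character, and in 'JPY100' both 'Y' and '1' are word characters, so there is no boundary in front of the digits. A minimal sketch, assuming the intent is only "do not start in the middle of a number", swaps the leading \b for (?<!\d). The names boundary_demo, with_b and without_b are made up for illustration.

```sas
data boundary_demo;
   length var with_b without_b $ 12;
   /* pattern from the question: leading \b fails between a letter and a digit */
   re_with_b    = prxparse('/(?<!D )\b\d+\b(?! dollars)/');
   /* sketch: only require that no digit precedes the match */
   re_without_b = prxparse('/(?<!D )(?<!\d)\d+\b(?! dollars)/');
   do var = '100 dollars', '100 pesos', 'USD 100', 'JPY100', 'JPYbdl100';
      call missing(with_b, without_b);
      if prxmatch(re_with_b, var)    then with_b    = prxposn(re_with_b, 0, var);
      if prxmatch(re_without_b, var) then without_b = prxposn(re_without_b, 0, var);
      output;
   end;
   drop re_with_b re_without_b;
run;

proc print data=boundary_demo; run;
```

With the second pattern, JPY100 and JPYbdl100 should both yield 100, while 'USD 100' is still rejected by the (?<!D ) lookbehind.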

Ksharp
Super User

If I understand what you mean, try this:

data check;

var='100 dollars'; output;

var='100dollars'; output;

var='100 pesos'; output;

var='USD100'; output;

var='USD 100'; output;

var='JPY100'; output;

var='JPYbdl100'; output;

run;

data want;

set check;

length w $ 20;

if not prxmatch('/(JPY|JPYbd)\s*\d+|\d+\s*(dollars)/i',var) then w=compress(var,,'kd');

run;

allaboutsas
Calcite | Level 5

Sorry for the late reply. Thank you, Hai.kuo and Xia Keshan. The code both of you provided works only for this sample; it won't work for the other one I gave, which is closer to my real data:

data check;

var="Patient reported 100 c headache 3f and nausea. MD noticed rash."; output;

var="a2 Pt. Rptd. Backache. usd25"; output;

var="b25h of patient reported seeing spots."; output;

var="3g Elevated pulse and 6d labored breathing."; output;

var="Headache."; output;

run;

As you can see, var could have other numbers in it, but I only want those that are not preceded by a or b and not followed by c or d. Is there a way to do it?

ballardw
Super User

What do you want the output to look like for this data sample?

What would you expect for a data line like:

Pulse 110 BP 70 over 30

allaboutsas
Calcite | Level 5

It should return this:

var                                                                var2

Patient reported 100 c headache 3f and nausea. MD noticed rash.    3

a2 Pt. Rptd. Backache. usd25                                       25

b25h of patient reported seeing spots.

3g Elevated pulse and 6d labored breathing.                        3

Headache.

Thanks.

allaboutsas
Calcite | Level 5

And what comes before and after the number is random; there's no way to specify it.
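
For the requirement as stated at this point in the thread (a number not preceded by a or b and not followed by c or d), one possible sketch uses a negative lookbehind and a negative lookahead together. This is not the accepted solution, which targets a later and more complex sample. It assumes whole numbers only, that optional spaces before c or d count as "followed by" (as in "100 c"), and that the match is case-sensitive. The names sketch, re and var2 are illustrative.

```sas
data check;
   length var $ 80;
   var="Patient reported 100 c headache 3f and nausea. MD noticed rash."; output;
   var="a2 Pt. Rptd. Backache. usd25"; output;
   var="b25h of patient reported seeing spots."; output;
   var="3g Elevated pulse and 6d labored breathing."; output;
   var="Headache."; output;
run;

data sketch;
   set check;
   length var2 $ 20;
   retain re;
   /* (?<![ab\d]) : not preceded by a, b, or another digit           */
   /* (?!\d|\s*[cd]) : not followed by a digit, or by optional        */
   /*                  spaces and then c or d                         */
   if _n_ = 1 then re = prxparse('/(?<![ab\d])\d+(?!\d|\s*[cd])/');
   /* keep only the first qualifying number in each record */
   if prxmatch(re, var) then var2 = prxposn(re, 0, var);
   drop re;
run;

proc print data=sketch; run;
```

The \d inside both assertions keeps the engine from backtracking into part of a number, which is the same issue that produced 10 in the original question. Note that \s*[cd] would also reject a number followed by any word starting with c or d, such as "25 days", so the lookahead may need tightening for real data.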
