Negative Lookahead?

Contributor
Posts: 40

I'm trying the example below. I thought the pattern asserts that what immediately follows the digits is not “ dollars”, so I expected var2 to be missing for the first record, “100 dollars”. Instead it returns 10. What should I use if I don't want any value returned for the first record?

data check;
  var='100 dollars'; output;
  var='100 pesos'; output;
  var='USD100'; output;
  var='JPY100'; output;
  var='JPYbdl100'; output;
run;

data check2;
  set check;
  *** create the regular expression only once ***;
  retain re1;
  if _N_=1 then do;
    re1 = prxparse('/\d+(?! dollars)/');
  end;
  if prxmatch(re1,var) then do;
    var2 = prxposn(re1,0,var);
  end;
run;

proc print data=check2; run;


Accepted Solutions
Solution
‎03-27-2015 10:05 AM
Super User
Posts: 9,682

Re: Negative Lookahead?

You really gave me a headache. I hope you don't post a question like this again.

data check;
var="Patient ab ht reported .1 headache 1.3f and nausea. MD ods noticed rash."; output;
var="ab ht 2.2 Pt. Rptd. Backache. ht usd2.5 od"; output;
var="2.5h of ods patient reported seeing spots."; output;
var="1.3g Elevated pulse ab ht and 0.6d labored breathing."; output;
var="Headache."; output;
var="ab ht .3 5 4 .5k Headache ab";output;
run;
 
 
data want;
set check;
length v $ 20;
retain pid;
var=prxchange('s/\bab\s+ht\b|\bods\b/|/i',-1,var);
if _n_ eq 1 then pid=prxparse('/[^|]\s+(\S+)?\s+(\S+)?\s+[dh-z]+(\d+)?\.?\d+\s+|\s+(\d+)?\.?\d+[abe-z]+\s+(\S+)?\s+(\S+)?\s+[^|]/i');
call prxsubstr(pid, var, position, length);
if position ne 0 then v = prxchange('s/^\.+(?=\d+\.\d+)|\.+$//' ,-1, compress(substr(var, position, length),'.','kd') );
drop pid position length;
run;

Xia Keshan


All Replies
Contributor
Posts: 40

Re: Negative Lookahead?

I can't specify 100 because it could be any number even decimals.

Super User
Posts: 7,408

Re: Negative Lookahead?

Hi,

I'm not sure what you are trying to do. Is it just to get the numeric part of var? If so, COMPRESS with the alphas option should work fine:

http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000212246.htm
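
A minimal sketch of that suggestion, assuming the goal is simply to strip everything but the number. The variable names no_alpha and digits_only are illustrative only. The 'a' modifier adds alphabetic characters to the list COMPRESS removes, so blanks and punctuation stay in the result; 'kd' instead keeps only the digits.

```sas
data _null_;
  var = 'USD100';
  /* 'a' = remove alphabetic characters (blanks and punctuation are left in place) */
  no_alpha = compress(var, , 'a');
  /* 'k' = keep the listed characters instead of removing them; 'd' = digits */
  digits_only = compress(var, , 'kd');
  put no_alpha= digits_only=;
run;
```

Note that neither call looks at what surrounds the number, so on its own this cannot exclude a record such as '100 dollars'.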

Contributor
Posts: 40

Re: Negative Lookahead?

Let me give a better example:

data check;
  var="Patient reported 100 c headache 3f and nausea. MD noticed rash."; output;
  var="a2 Pt. Rptd. Backache. usd25"; output;
  var="b25h of patient reported seeing spots."; output;
  var="3g Elevated pulse and 6d labored breathing."; output;
  var="Headache."; output;
run;

Yes, I'm trying to get the number part of var, but only if it meets the following conditions:

What precedes it is not a or b; for example, a2 or b25 is not wanted.

What follows it is not c or d; for example, 100 c or 6d is not wanted.

Super User
Posts: 17,868

Re: Negative Lookahead?

What are you trying to do?

Respected Advisor
Posts: 3,124

Re: Negative Lookahead?

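For your example as is, you need to add a word boundary after the digits. Otherwise the regex backtracks: \d+ first takes '100', the lookahead fails because ' dollars' follows, so \d+ gives back one digit and tries '10'. What follows '10' is '0', which is not ' dollars', so '10' meets your criteria and is selected.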

data check;
  var='100 dollars'; output;
  var='100 pesos'; output;
  var='USD100'; output;
  var='JPY100'; output;
  var='JPYbdl100'; output;
run;

data check2;
  set check;
  *** create the regular expression only once ***;
  retain re1;
  if _N_=1 then do;
    re1 = prxparse('/\d+\b(?! dollars)/');
  end;
  if prxmatch(re1,var) then do;
    var2 = prxposn(re1,0,var);
  end;
run;
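
For comparison, here is an alternative sketch that is not from the thread: instead of a word boundary, the lookahead itself can forbid a following digit, which stops \d+ from backing off to a shorter number. The \s* in front of dollars is my assumption, added so that '100dollars' is rejected as well as '100 dollars'.

```sas
data check;
  var='100 dollars'; output;
  var='100dollars'; output;
  var='100 pesos'; output;
  var='USD100'; output;
  var='JPYbdl100'; output;
run;

data check2;
  set check;
  length var2 $ 20;
  retain re1;
  /* (?!\d|\s*dollars): the match may not be followed by another digit, */
  /* nor by optional whitespace and the word dollars                    */
  if _N_=1 then re1 = prxparse('/\d+(?!\d|\s*dollars)/');
  /* capture buffer 0 is the entire match */
  if prxmatch(re1,var) then var2 = prxposn(re1,0,var);
run;

proc print data=check2; run;
```

With this pattern the two dollar records should get no var2, while the other three should return 100.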

Contributor
Posts: 40

Re: Negative Lookahead?

Thanks, Hai.kuo. The boundary seems to work for the ones with spaces, but it will miss the ones without a space. Any other suggestions?

Respected Advisor
Posts: 3,124

Re: Negative Lookahead?

Are you sure about that?

data check;
  var='100 dollars'; output;
  var='100dollars'; output;
  var='100 pesos'; output;
  var='USD100'; output;
  var='JPY100'; output;
  var='JPYbdl100'; output;
run;

data check2;
  set check;
  *** create the regular expression only once ***;
  retain re1;
  if _N_=1 then do;
    re1 = prxparse('/\d+\b(?! dollars)/');
  end;
  if prxmatch(re1,var) then do;
    var2 = prxposn(re1,0,var);
  end;
run;

Contributor
Posts: 40

Re: Negative Lookahead?

OK, thanks, you are right. I expanded it a little to include a lookbehind. Now how come JPY100 and JPYbdl100 don't return anything? Any idea?

data check;
  var='100 dollars'; output;
  var='100dollars'; output;
  var='100 pesos'; output;
  var='USD100'; output;
  var='USD 100'; output;
  var='JPY100'; output;
  var='JPYbdl100'; output;
run;

data check2;
  set check;
  *** create the regular expression only once ***;
  retain re1;
  if _N_=1 then do;
    re1 = prxparse('/(?<!D )\b\d+\b(?! dollars)/');
  end;
  if prxmatch(re1,var) then do;
    var2 = prxposn(re1,0,var);
  end;
run;

proc print data=check2; run;
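
A note on why those two records come back empty: \b only matches between a word character and a non-word character. Letters and digits are both word characters, so there is no boundary between Y and 1 in JPY100 (or between l and 1 in JPYbdl100), and the leading \b can never succeed there. A hedged sketch of one way around it is to replace the leading \b with (?<!\d), which only requires that the match does not start in the middle of a number. Whether USD100 should be returned is not stated in the post, so treat the behaviour for that record as an assumption.

```sas
data check;
  var='100 dollars'; output;
  var='100dollars'; output;
  var='100 pesos'; output;
  var='USD100'; output;
  var='USD 100'; output;
  var='JPY100'; output;
  var='JPYbdl100'; output;
run;

data check2;
  set check;
  length var2 $ 20;
  retain re1;
  /* (?<!D ) : not preceded by "D " (kept from the original pattern)   */
  /* (?<!\d) : not preceded by a digit, so letters may precede a match */
  if _N_=1 then re1 = prxparse('/(?<!D )(?<!\d)\d+\b(?! dollars)/');
  if prxmatch(re1,var) then var2 = prxposn(re1,0,var);
run;

proc print data=check2; run;
```

Traced by hand, this should return 100 for '100 pesos', 'USD100', 'JPY100' and 'JPYbdl100', and nothing for the two dollar records or 'USD 100'.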

Respected Advisor
Posts: 3,124

Re: Negative Lookahead?

It is indeed not that easy. The core of your problem is to 'negatively' identify a specific character or string. It may sound simple, but it actually has two layers, while PRX can only handle one at a time. One layer is that you want it to be an alphabetic character; the other is that you don't want it to be certain alphabetic characters. The following code may not cover all of your cases, but it seems to work on the data you presented:

data check;
  var='100 dollars'; output;
  var='100dollars'; output;
  var='100 pesos'; output;
  var='USD100'; output;
  var='USD 100'; output;
  var='JPY100'; output;
  var='JPYbdl100'; output;
run;

data check2;
  set check;
  if prxmatch('m/usd|dollar/io',var)=0 then
    var2=prxchange('s/(^\D*|\D*$)//io',-1,var);
run;

proc print data=check2; run;

Super User
Posts: 9,682

Re: Negative Lookahead?

If I understand what you mean.

data check;
  var='100 dollars'; output;
  var='100dollars'; output;
  var='100 pesos'; output;
  var='USD100'; output;
  var='USD 100'; output;
  var='JPY100'; output;
  var='JPYbdl100'; output;
run;

data want;
  set check;
  length w $ 20;
  if not prxmatch('/(JPY|JPYbd)\s*\d+|\d+\s*(dollars)/i',var) then w=compress(var,,'kd');
run;

Contributor
Posts: 40

Re: Negative Lookahead?

Sorry for the late reply. Thank you, Hai.kuo and Xia Keshan. The code each of you provided works only for this sample; it won't work for the other one I gave, which is closer to my real data:

data check;
  var="Patient reported 100 c headache 3f and nausea. MD noticed rash."; output;
  var="a2 Pt. Rptd. Backache. usd25"; output;
  var="b25h of patient reported seeing spots."; output;
  var="3g Elevated pulse and 6d labored breathing."; output;
  var="Headache."; output;
run;

As you can see, var can contain other numbers, but I only want those that are not preceded by a or b and not followed by c or d. Is there a way to do it?

Super User
Posts: 10,516

Re: Negative Lookahead?

What do you want the output to look like for this data sample?

What would you expect for a data line like:

Pulse 110 BP 70 over 30

Contributor
Posts: 40

Re: Negative Lookahead?

It should return this:

var                                                                var2
Patient reported 100 c headache 3f and nausea. MD noticed rash.    3
a2 Pt. Rptd. Backache. usd25                                       25
b25h of patient reported seeing spots.
3g Elevated pulse and 6d labored breathing.                        3
Headache.

Thanks.

Contributor
Posts: 40

Re: Negative Lookahead?

And what's before and after the number is random; there's no way to specify them.
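
For the conditions as stated in this post (not preceded by a or b, not followed by c or d), a minimal lookaround sketch might look like the following. It is not the accepted solution, and it makes several assumptions of mine: only whole numbers are handled, the c or d may be separated from the number by whitespace, the c or d must be a standalone letter (hence the \b), and matching is case-sensitive. Decimals would need the pattern extended.

```sas
data check;
  var="Patient reported 100 c headache 3f and nausea. MD noticed rash."; output;
  var="a2 Pt. Rptd. Backache. usd25"; output;
  var="b25h of patient reported seeing spots."; output;
  var="3g Elevated pulse and 6d labored breathing."; output;
  var="Headache."; output;
run;

data want;
  set check;
  length var2 $ 20;
  retain re;
  /* (?<![ab\d])     : not preceded by a, b, or another digit               */
  /* (?!\d)          : take the whole number, never a shorter prefix of it  */
  /* (?!\s*[cd]\b)   : not followed by a standalone c or d (space optional) */
  if _N_=1 then re = prxparse('/(?<![ab\d])\d+(?!\d)(?!\s*[cd]\b)/');
  /* PRXMATCH finds only the first qualifying number in each record */
  if prxmatch(re,var) then var2 = prxposn(re,0,var);
run;

proc print data=want; run;
```

Traced by hand against the five sample records, this should give 3, 25, missing, 3 and missing, which agrees with the expected output listed above. The digit tests on both sides are what prevent the backtracking problem from the original question, where 10 was returned out of 100.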

☑ This topic is SOLVED.
