## DATA Step, Macro, Functions and more

Solved
Contributor
Posts: 40

I'm trying the example below. I thought the pattern asserts that what immediately follows the digits is not " dollars", so I expected var2 for the first record, "100 dollars", to be null. Instead it returns 10. What should I use if I don't want any value returned for the first record?

data check;

var='100 dollars'; output;

var='100 pesos'; output;

var='USD100'; output;

var='JPY100'; output;

var='JPYbdl100'; output;

run;

data check2;

set check;

*** create the regular expression only once ***;

retain re1;

if _N_=1 then do;

re1 = prxparse('/\d+(?! dollars)/');

end;

if prxmatch(re1,var) then do;

var2 = prxposn(re1,0,var);

end;

run;

proc print data=check2; run;

Accepted Solutions
Solution
‎03-27-2015 10:05 AM
Super User
Posts: 10,784

This one really gave me a headache. I hope you won't post a question like this again.

```
data check;
var="Patient ab ht reported .1 headache 1.3f and nausea. MD ods noticed rash."; output;
var="ab ht 2.2 Pt. Rptd. Backache. ht usd2.5 od"; output;
var="2.5h of ods patient reported seeing spots."; output;
var="1.3g Elevated pulse ab ht and 0.6d labored breathing."; output;
var="ab ht .3 5 4 .5k Headache ab";output;
run;

data want;
set check;
length v $ 20;
retain pid;
var=prxchange('s/\bab\s+ht\b|\bods\b/|/i',-1,var);
if _n_ eq 1 then pid=prxparse('/[^|]\s+(\S+)?\s+(\S+)?\s+[dh-z]+(\d+)?\.?\d+\s+|\s+(\d+)?\.?\d+[abe-z]+\s+(\S+)?\s+(\S+)?\s+[^|]/i');
call prxsubstr(pid, var, position, length);
if position ne 0 then v = prxchange('s/^\.+(?=\d+\.\d+)|\.+$//' ,-1, compress(substr(var, position, length),'.','kd') );
drop pid position length;
run;

```

Xia Keshan

All Replies
Contributor
Posts: 40

I can't hard-code 100 because it could be any number, even a decimal.

Super User
Posts: 9,599

Hi,

I'm not sure what you are trying to do. Is it just to get the numeric part of var? If so, COMPRESS with the alphabetic-characters modifier should work fine:

http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000212246.htm
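
As a rough illustration of that suggestion, the sketch below compares two COMPRESS modifiers on two of the sample values. The variable names `no_alpha` and `digits_only` are made up for this example. The `'a'` modifier removes letters but leaves blanks and punctuation in place, while `'kd'` (the form used elsewhere in this thread) keeps digits only.

```sas
data _null_;
   length var $ 20;
   do var = 'USD100', '100 dollars';
      no_alpha    = compress(var,,'a');   /* drop letters; blanks and punctuation stay */
      digits_only = compress(var,,'kd');  /* keep digits, drop everything else */
      put var= no_alpha= digits_only=;
   end;
run;
```

Neither form looks at what surrounds the number, so this only helps when every number in var is wanted.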

Contributor
Posts: 40

Let me give a better example:

data check;

var="Patient reported 100 c headache 3f and nausea. MD noticed rash."; output;

var="a2 Pt. Rptd. Backache. usd25"; output;

var="b25h of patient reported seeing spots."; output;

var="3g Elevated pulse and 6d labored breathing."; output;

run;

Yes, I'm trying to get the numeric part of var, but only if it meets the following conditions:

What precedes it is not a or b; for example, a2 or b25 is not wanted.

What follows it is not c or d; for example, 100 c or 6d is not wanted.

Super User
Posts: 23,763

What are you trying to do?

Posts: 3,167

For your example as is, you need to add a word boundary after the digits. Otherwise the engine backtracks: '10' is followed by '0', which is not ' dollars', so '10' meets your criteria and is selected.

data check;

var='100 dollars'; output;

var='100 pesos'; output;

var='USD100'; output;

var='JPY100'; output;

var='JPYbdl100'; output;

run;

data check2;

set check;

*** create the regular expression only once ***;

retain re1;

if _N_=1 then do;

re1 = prxparse('/\d+\b(?! dollars)/');

end;

if prxmatch(re1,var) then do;

var2 = prxposn(re1,0,var);

end;

run;
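
To see the backtracking described above more directly, a small check like the following can help. It is only a sketch: the names `re_plain`, `re_bound`, `pos1`, `len1`, `pos2` and `len2` are invented for this example. CALL PRXSUBSTR reports where each pattern matched in '100 dollars' and how long the match was.

```sas
data _null_;
   var = '100 dollars';
   re_plain = prxparse('/\d+(?! dollars)/');    /* original pattern */
   re_bound = prxparse('/\d+\b(?! dollars)/');  /* with the word boundary */
   /* position and length are set to 0 when there is no match */
   call prxsubstr(re_plain, var, pos1, len1);
   call prxsubstr(re_bound, var, pos2, len2);
   put pos1= len1= pos2= len2=;
run;
```

The first pattern should report a match of length 2 starting at position 1 (the '10'), and the second should report no match at all.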

Contributor
Posts: 40

Thanks, Hai.kuo. The boundary seems to work for the values with spaces, but it will miss the ones without a space. Any other suggestions?

Posts: 3,167

data check;

var='100 dollars'; output;

var='100dollars'; output;

var='100 pesos'; output;

var='USD100'; output;

var='JPY100'; output;

var='JPYbdl100'; output;

run;

data check2;

set check;

*** create the regular expression only once ***;

retain re1;

if _N_=1 then do;

re1 = prxparse('/\d+\b(?! dollars)/');

end;

if prxmatch(re1,var) then do;

var2 = prxposn(re1,0,var);

end;

run;

Contributor
Posts: 40

OK, thanks, you are right. I expanded it a little to include a lookbehind. Now how come JPY100 and JPYbdl100 don't return anything? Any idea?

data check;

var='100 dollars'; output;

var='100dollars'; output;

var='100 pesos'; output;

var='USD100'; output;

var='USD 100'; output;

var='JPY100'; output;

var='JPYbdl100'; output;

run;

data check2;

set check;

*** create the regular expression only once ***;

retain re1;

if _N_=1 then do;

re1 = prxparse('/(?<!D )\b\d+\b(?! dollars)/');

end;

if prxmatch(re1,var) then do;

var2 = prxposn(re1,0,var);

end;

run;

proc print data=check2; run;
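
A likely explanation, offered as a sketch rather than a definitive fix: the `\b` in front of `\d+` needs a non-word character (or the start of the string) before the first digit, and letters count as word characters, so there is no boundary between 'Y' and '1' in JPY100. One possible variation replaces both `\b` anchors with digit lookarounds, and uses `\s*` in the lookahead so that '100dollars' is still rejected. These pattern changes are assumptions about the intent, not something confirmed in this thread.

```sas
data check;
   length var $ 20;
   var='100 dollars'; output;
   var='100dollars';  output;
   var='100 pesos';   output;
   var='USD100';      output;
   var='USD 100';     output;
   var='JPY100';      output;
   var='JPYbdl100';   output;
run;

data check2;
   set check;
   length var2 $ 20;
   retain re1;
   /* (?<!\d) and (?!\d) stand in for the two \b anchors, so a match  */
   /* can neither start nor stop in the middle of a run of digits     */
   if _N_ = 1 then re1 = prxparse('/(?<!D )(?<!\d)\d+(?!\d)(?!\s*dollars)/');
   if prxmatch(re1, var) then var2 = prxposn(re1, 0, var);
   drop re1;
run;

proc print data=check2; run;
```

With this pattern, var2 should be 100 for '100 pesos', 'USD100', 'JPY100' and 'JPYbdl100', and blank for the two dollar values and 'USD 100'.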

Posts: 3,167

It is indeed not that easy. The core of your problem is to 'negatively' identify a specific character or string. It may sound simple, but it actually has two layers, while PRX can only handle one at a time. One layer is that you want it to be a letter; the other is that you don't want it to be a certain letter. The following code may not cover all of your cases, but it seems to work on the data you presented:

data check;

var='100 dollars'; output;

var='100dollars'; output;

var='100 pesos'; output;

var='USD100'; output;

var='USD 100'; output;

var='JPY100'; output;

var='JPYbdl100'; output;

run;

data check2;

set check;

if prxmatch('m/usd|dollar/io',var)=0 then

var2=prxchange('s/(^\D*|\D*$)//io',-1,var);

run;

proc print data=check2; run;

Super User
Posts: 10,784

If I understand what you mean.

data check;

var='100 dollars'; output;

var='100dollars'; output;

var='100 pesos'; output;

var='USD100'; output;

var='USD 100'; output;

var='JPY100'; output;

var='JPYbdl100'; output;

run;

data want;

set check;

length w $ 20;

if not prxmatch('/(JPY|JPYbd)\s*\d+|\d+\s*(dollars)/i',var) then w=compress(var,,'kd');

run;

Contributor
Posts: 40

Sorry for the late reply. Thank you, Hai.kuo and Xia Keshan. The code you both provided works only for this sample; it won't work for the other one I gave, which is closer to my real data:

data check;

var="Patient reported 100 c headache 3f and nausea. MD noticed rash."; output;

var="a2 Pt. Rptd. Backache. usd25"; output;

var="b25h of patient reported seeing spots."; output;

var="3g Elevated pulse and 6d labored breathing."; output;

run;

As you can see, var can contain other numbers, while I only want those that are not preceded by a or b and not followed by c or d. Is there a way to do it?

Super User
Posts: 13,577

What do you want the output to look like for this data sample?

What would you expect for a data line like:

Pulse 110 BP 70 over 30

Contributor
Posts: 40

It should return this:

| var | var2 |
|---|---|
| Patient reported 100 c headache 3f and nausea. MD noticed rash. | 3 |
| a2 Pt. Rptd. Backache. usd25 | 25 |
| b25h of patient reported seeing spots. | |
| 3g Elevated pulse and 6d labored breathing. | 3 |

Headache.

Thanks.
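
One way the stated conditions could be written as a single pattern is sketched below. It rests on several assumptions that the thread does not confirm: the numbers are whole numbers (no decimals), only the first qualifying number per record is wanted, and "followed by c or d" means a standalone c or d, optionally after blanks (so '100 c' and '6d' are rejected, but a number followed by a longer word starting with c or d is not).

```sas
data check;
   length var $ 80;
   var="Patient reported 100 c headache 3f and nausea. MD noticed rash."; output;
   var="a2 Pt. Rptd. Backache. usd25"; output;
   var="b25h of patient reported seeing spots."; output;
   var="3g Elevated pulse and 6d labored breathing."; output;
run;

data want;
   set check;
   length var2 $ 20;
   retain re;
   /* (?<![ab\d])       : not preceded by a, b, or another digit          */
   /* (?!\d|\s*[cd]\b)  : not followed by a digit or by a standalone c/d  */
   if _n_ = 1 then re = prxparse('/(?<![ab\d])\d+(?!\d|\s*[cd]\b)/');
   /* PRXMATCH returns the start of the first match, or 0 if none */
   if prxmatch(re, var) then var2 = prxposn(re, 0, var);
   drop re;
run;

proc print data=want; run;
```

The digit checks inside both lookarounds are what stop the engine from settling for part of a number, such as the '5' in 'b25h' or the '10' in '100 c'. To collect every qualifying number in a record rather than the first, CALL PRXNEXT in a loop would be the usual route.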

Contributor
Posts: 40