Hi, I want to create several new binary variables, each indicating whether any of a set of original variables in an observation matches a given pattern. (For anyone familiar with it, I am using the MedPAR data.) Each new variable has its own pattern, so I tried to code this with a loop and arrays.
It works when there is only one loop (creating only one new variable).
data array1;
set original;
exp1='/^I50/o';
exp2='/^I49/o';
exp3='/^I21/o';
exp4='/^I63/o';
exp5='/^R06|^J95|^J1[345678]/o';
array exp [*] exp1-exp5; /*the expression to find and match to create each new variable*/
array var [*] var1-var5;
array dgns_array [*] AD_DGNS DGNS_CD01-DGNS_CD25; /*the original variables to scan and check*/
var[2]=0; /*set the default value of the new variable*/
do i=1 to 26;
if prxmatch(exp[2], dgns_array[i])>0 then
do;
var[2]=1;
leave;
end;
end;
run;
However, when I write the nested loop, all the new variables get the same value as the first one.
data array2;
set original;
exp1='/^I50/o';
exp2='/^I49/o';
exp3='/^I21/o';
exp4='/^I63/o';
exp5='/^R06|^J95|^J1[345678]/o';
array exp [*] exp1-exp5; /*the expression to find and match to create each new variable*/
array var [*] var1-var5;
array dgns_array [*] AD_DGNS DGNS_CD01-DGNS_CD25; /*the original variables to scan and check*/
do n=1 to dim(var);
var[n]=0;
end;
/*set the default value of the new variable*/
do j=1 to dim(var);
do i=1 to 26;
if prxmatch(exp[j], dgns_array[i])>0 then
do;
var[j]=1;
leave;
end;
end;
end;
run;
I have no idea why this doesn't work. Any hints, or suggestions of other functions that do the same thing, would be appreciated.
Thanks!
Accepted Solutions
Hi @rsjjj and welcome to the SAS Support Communities (as a first-time poster)!
The simplified example below works ...
data have;
input (d1-d3) (:$4.);
cards;
I250 I350 I450
I390 I490 I590
I500 I600 I700
I470 I480 I490
;
data want(drop=i j n);
set have;
array exp[2] $25 ('/^I50/' '/^I49/');
array var[2];
array d[3];
do n=1 to dim(var);
var[n]=0;
end;
do j=1 to dim(var);
do i=1 to dim(d);
if prxmatch(exp[j], d[i]) then do;
var[j]=1;
leave;
end;
end;
end;
run;
... but it fails in the way you've described when I insert the "o" option into the regular expressions ('/^I50/o' '/^I49/o').
Conclusion: Remove the "o" option from your regular expressions.
Edit:
Explanation: With the "compile-once" behavior triggered by the "o" option (see Compiling a Perl Regular Expression) the regex compiled in the first call of the PRXMATCH function, i.e., '/^I50/o', is used in all subsequent calls of that function in the nested DO loops. Therefore, the pattern '/^I49/' is not searched for in observations 2 and 4 (where it is present), resulting in VAR2=0, and in the third observation (where '/^I50/' is found) not only VAR1 is set to 1, but also VAR2.
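To see the effect in isolation, here is a minimal sketch that calls PRXMATCH from a single call site with two different patterns, both carrying the "o" option. The array name pat and the test value 'I490' are made up for illustration. If the compile-once behavior is as described above, the second iteration should reuse the pattern compiled in the first one, so the log would be expected to show no match for either pattern even though the value starts with I49. Removing the "o" from both patterns should let the second pattern match.

```sas
data _null_;
  /* hypothetical patterns, both with the compile-once "o" option */
  array pat[2] $10 _temporary_ ('/^I50/o' '/^I49/o');
  code = 'I490';
  do j = 1 to dim(pat);
    p = pat[j];
    /* one call site: with "o", the regex compiled on the first call is reused */
    hit = prxmatch(pat[j], code);
    put j= p= hit=;
  end;
run;
```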
Adding to @FreelanceReinh's suggestion: you can still compile each pattern only once by calling PRXPARSE first (once per pattern, when _N_ = 1) and then calling PRXMATCH with the resulting pattern IDs:
data want(drop=i j n);
set have;
array exp[2] $25 _temporary_ ('/^I50/' '/^I49/');
array rId {2} _temporary_;
array var[2];
array d[3];
if _n_ = 1 then do;
do n = 1 to dim(rId);
rId{n} = prxParse(exp{n});
end;
end;
do n=1 to dim(var);
var[n]=0;
end;
do j=1 to dim(var);
do i=1 to dim(d);
if prxmatch(rId[j], d[i]) then do;
var[j]=1;
leave;
end;
end;
end;
run;
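Applied to the data set and variables from the original question, the same idea might look like the sketch below. It assumes the input data set is named original and contains AD_DGNS and DGNS_CD01-DGNS_CD25 as character variables, as in the question. The patterns are the original five with the "o" option removed, and the default value 0 is set inside the outer loop rather than in a separate loop.

```sas
data array2(drop=i j);
  set original;
  /* the five patterns from the question, without the "o" option */
  array exp[5] $30 _temporary_
    ('/^I50/' '/^I49/' '/^I21/' '/^I63/' '/^R06|^J95|^J1[345678]/');
  array rId[5] _temporary_;   /* pattern IDs, retained across observations */
  array var[5];               /* creates var1-var5 */
  array dgns_array[*] AD_DGNS DGNS_CD01-DGNS_CD25;

  /* compile each pattern once, on the first observation only */
  if _n_ = 1 then do j = 1 to dim(rId);
    rId[j] = prxparse(exp[j]);
  end;

  do j = 1 to dim(var);
    var[j] = 0;               /* default: no matching diagnosis code */
    do i = 1 to dim(dgns_array);
      if prxmatch(rId[j], dgns_array[i]) then do;
        var[j] = 1;
        leave;                /* stop scanning once a match is found */
      end;
    end;
  end;
run;
```

Because each pattern gets its own ID from PRXPARSE, every PRXMATCH call uses the pattern that belongs to var[j], and no regex is recompiled for later observations.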
Why are you using REGEX for something that normal operators and functions can handle?
%let exp1='I50';
%let exp2='I49';
%let exp3='I21';
%let exp4='I63';
%let exp5='R06' 'J95' 'J13' 'J14' 'J15' 'J16' 'J17' 'J18' ;
%let n=5;
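%macro flag_dgns; /* an iterative %DO is only valid inside a macro definition */
data array2;
set original;
array dgns_array [*] AD_DGNS DGNS_CD01-DGNS_CD25;
array var [&n] ;
%do j=1 %to &n ;
do i=1 to dim(dgns_array) until (var[&j]);
var[&j] = dgns_array[i] in: (&&exp&j); /* IN: compares only the leading characters */
end;
%end;
run;
%mend flag_dgns;
%flag_dgns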