Hi, I want to create some new binary variables, each flagging whether any of a set of original variables in an observation matches a given pattern. (If anybody is familiar, I use the MedPAR data.) Each new variable has its own corresponding pattern. So I tried to code this with loops and arrays.
It works when there is only one loop (creating only one new variable).
data array1;
set original;
exp1='/^I50/o';
exp2='/^I49/o';
exp3='/^I21/o';
exp4='/^I63/o';
exp5='/^R06|^J95|^J1[345678]/o';
array exp [*] exp1-exp5; /*the expression to find and match to create each new variable*/
array var [*] var1-var5;
array dgns_array [*] AD_DGNS DGNS_CD01-DGNS_CD25; /*the original variables to scan and check*/
var[2]=0; /*set the default value of the new variable*/
do i=1 to 26;
if prxmatch(exp[2], dgns_array[i])>0 then
do;
var[2]=1;
leave;
end;
end;
run;
However, when I write the nested loop, all the new variables have the same value as the first one.
data array2;
set original;
exp1='/^I50/o';
exp2='/^I49/o';
exp3='/^I21/o';
exp4='/^I63/o';
exp5='/^R06|^J95|^J1[345678]/o';
array exp [*] exp1-exp5; /*the expression to find and match to create each new variable*/
array var [*] var1-var5;
array dgns_array [*] AD_DGNS DGNS_CD01-DGNS_CD25; /*the original variables to scan and check*/
do n=1 to dim(var);
var[n]=0;
end;
/*set the default value of the new variable*/
do j=1 to dim(var);
do i=1 to 26;
if prxmatch(exp[j], dgns_array[i])>0 then
do;
var[j]=1;
leave;
end;
end;
end;
run;
I have no idea why this doesn't work. Any hints, or suggestions of other functions that do the same thing, would be appreciated.
Thanks!
Hi @rsjjj and welcome to the SAS Support Communities (as a first-time poster)!
The simplified example below works ...
data have;
input (d1-d3) (:$4.);
cards;
I250 I350 I450
I390 I490 I590
I500 I600 I700
I470 I480 I490
;
data want(drop=i j n);
set have;
array exp[2] $25 ('/^I50/' '/^I49/');
array var[2];
array d[3];
do n=1 to dim(var);
var[n]=0;
end;
do j=1 to dim(var);
do i=1 to dim(d);
if prxmatch(exp[j], d[i]) then do;
var[j]=1;
leave;
end;
end;
end;
run;
... but it fails in the way you've described when I insert the "o" option into the regular expressions ('/^I50/o' '/^I49/o').
Conclusion: Remove the "o" option from your regular expressions.
Edit:
Explanation: With the "compile-once" behavior triggered by the "o" option (see Compiling a Perl Regular Expression) the regex compiled in the first call of the PRXMATCH function, i.e., '/^I50/o', is used in all subsequent calls of that function in the nested DO loops. Therefore, the pattern '/^I49/' is not searched for in observations 2 and 4 (where it is present), resulting in VAR2=0, and in the third observation (where '/^I50/' is found) not only VAR1 is set to 1, but also VAR2.
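To see the compile-once effect in isolation, here is a minimal sketch (a separate test step, not taken from the posts above; the variable names pattern and hit are made up for illustration). It feeds two different patterns through a single PRXMATCH call, first with the "o" option and then without it. If the explanation above holds, the first loop should report no match for 'I490' on either pass, because the regex compiled from '/^I50/o' is reused, whereas the second loop should find the match once the pattern is '/^I49/'.

```sas
data _null_;
  length pattern $7;

  /* One PRXMATCH call site, two different patterns, both with "o". */
  do pattern = '/^I50/o', '/^I49/o';
    hit = prxmatch(pattern, 'I490');
    put pattern= hit=;
  end;

  /* Same test without "o": the pattern is a variable, so it is
     recompiled on every call. */
  do pattern = '/^I50/', '/^I49/';
    hit = prxmatch(pattern, 'I490');
    put pattern= hit=;
  end;
run;
```

The values written to the log by PUT show which pattern was actually applied on each pass.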
Adding to @FreelanceReinh's suggestion, you can compile only once by calling prxParse first (only once) and then calling prxMatch with the pattern IDs:
data want(drop=i j n);
set have;
array exp[2] $25 _temporary_ ('/^I50/' '/^I49/');
array rId {2} _temporary_;
array var[2];
array d[3];
if _n_ = 1 then do;
do n = 1 to dim(rId);
rId{n} = prxParse(exp{n});
end;
end;
do n=1 to dim(var);
var[n]=0;
end;
do j=1 to dim(var);
do i=1 to dim(d);
if prxmatch(rId[j], d[i]) then do;
var[j]=1;
leave;
end;
end;
end;
run;
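Applied to the original question, the same idea might look like the sketch below. It assumes the dataset ORIGINAL and the diagnosis variables AD_DGNS and DGNS_CD01-DGNS_CD25 from the first post; the array names pat and rid are hypothetical, and the step has not been run against MedPAR data.

```sas
data array2(drop=i j);
  set original;

  /* Patterns without the "o" option; _TEMPORARY_ keeps them out of the output. */
  array pat[5] $30 _temporary_
    ('/^I50/' '/^I49/' '/^I21/' '/^I63/' '/^R06|^J95|^J1[345678]/');
  array rid[5] _temporary_;   /* pattern IDs returned by PRXPARSE */
  array var[5];               /* creates var1-var5 */
  array dgns_array[*] AD_DGNS DGNS_CD01-DGNS_CD25;

  /* Compile each pattern exactly once, on the first observation.
     Temporary array elements are retained across observations. */
  if _n_ = 1 then do j = 1 to dim(rid);
    rid[j] = prxparse(pat[j]);
  end;

  do j = 1 to dim(var);
    var[j] = 0;                          /* default: no match */
    do i = 1 to dim(dgns_array);
      if prxmatch(rid[j], dgns_array[i]) then do;
        var[j] = 1;
        leave;                           /* stop at the first matching code */
      end;
    end;
  end;
run;
```

Using dim(dgns_array) instead of a hard-coded 26 keeps the loop correct if the list of diagnosis variables changes.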
Why are you using REGEX for something that normal operators and functions can handle?
%let exp1='I50';
%let exp2='I49';
%let exp3='I21';
%let exp4='I63';
%let exp5='R06' 'J95' 'J13' 'J14' 'J15' 'J16' 'J17' 'J18' ;
%let n=5;
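/* %DO is only valid inside a macro definition, so the step is wrapped in a macro (the name flag_dgns is arbitrary). */
%macro flag_dgns;
data array2;
set original;
array dgns_array [*] AD_DGNS DGNS_CD01-DGNS_CD25;
array var [&n] ;
%do j=1 %to &n ;
do i=1 to dim(dgns_array) until (var[&j]);
var[&j] = dgns_array[i] in: (&&exp&j);
end;
%end;
run;
%mend flag_dgns;
%flag_dgns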