☑ This topic is solved.
rsjjj
Fluorite | Level 6

Hi, I want to create several new binary variables that flag whether any of a set of the original variables in each observation matches a pattern. (In case anybody is familiar with it, I am using the MedPAR data.) Each new variable has its own corresponding pattern, so I tried to code this with a loop and arrays.

It works when there is only one loop (i.e., when creating only one new variable):

data array1;
set original;
exp1='/^I50/o'; exp2='/^I49/o'; exp3='/^I21/o'; exp4='/^I63/o';
exp5='/^R06|^J95|^J1[345678]/o';
array exp [*] exp1-exp5; /*the expression to find and match to create each new variable*/
array var [*] var1-var5;
array dgns_array [*] AD_DGNS DGNS_CD01-DGNS_CD25; /*the original variables to scan and check*/
var[2]=0; /*set the default value of the new variable*/
do i=1 to 26;
  if prxmatch(exp[2], dgns_array[i])>0 then do; var[2]=1; leave; end;
end;
run;

However, when I write the nested loop, all the new variables get the same value as the first one.

data array2;
set original;

    exp1='/^I50/o';
    exp2='/^I49/o';
    exp3='/^I21/o';
    exp4='/^I63/o';
    exp5='/^R06|^J95|^J1[345678]/o';

    array exp [*] exp1-exp5; /*the expression to find and match to create each new variable*/
    array var [*] var1-var5;
    array dgns_array [*] AD_DGNS DGNS_CD01-DGNS_CD25;  /*the original variables to scan and check*/

    /*set the default value of the new variables*/
    do n=1 to dim(var);
        var[n]=0;
    end;

    do j=1 to dim(var);
        do i=1 to 26;
            if prxmatch(exp[j], dgns_array[i])>0 then do;
                var[j]=1;
                leave;
            end;
        end;
    end;
run;

I have no idea why this doesn't work. Any hints, or advice on other functions that can do the same thing, would be appreciated.

Thanks!

1 ACCEPTED SOLUTION
FreelanceReinh
Jade | Level 19

Hi @rsjjj and welcome to the SAS Support Communities (as a first-time poster)!

The simplified example below works ...

data have;
input (d1-d3) (:$4.);
cards;
I250 I350 I450
I390 I490 I590
I500 I600 I700
I470 I480 I490
;

data want(drop=i j n);
set have;
array exp[2] $25 ('/^I50/' '/^I49/');
array var[2];
array d[3];

do n=1 to dim(var);
  var[n]=0;
end;
  
do j=1 to dim(var);
  do i=1 to dim(d);
    if prxmatch(exp[j], d[i]) then do;
      var[j]=1;
      leave;
    end;
  end;
end;
run;

... but it fails in the way you've described when I insert the "o" option into the regular expressions ('/^I50/o' '/^I49/o').

Conclusion: Remove the "o" option from your regular expressions.

Edit:

Explanation: With the "compile-once" behavior triggered by the "o" option (see Compiling a Perl Regular Expression), the regex compiled in the first call of the PRXMATCH function, i.e., '/^I50/o', is reused in all subsequent calls of that function in the nested DO loops. Therefore, the pattern '/^I49/' is never searched for: in observations 2 and 4 (where it is present) this results in VAR2=0, and in the third observation (where '/^I50/' is found) not only VAR1 but also VAR2 is set to 1.
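To see the compile-once effect in isolation, here is a minimal sketch. The data set name COMPILE_ONCE and the sample codes are made up for illustration; they are not MedPAR data. The same pattern variable is passed to PRXMATCH twice per observation: once as-is, and once with the "o" option appended via CATS.

```sas
/* Hypothetical demo data: one regex and one code per observation */
data compile_once;
  input pat :$10. text :$4.;
  /* No "o": the current value of PAT is used on every call */
  match_plain = (prxmatch(strip(pat), text) > 0);
  /* "o" appended: compiled on the first call only, so '/^I50/' is reused */
  match_o = (prxmatch(cats(pat, 'o'), text) > 0);
datalines;
/^I50/ I500
/^I49/ I490
/^I49/ I500
;

proc print data=compile_once;
run;
```

With the behavior described above, MATCH_PLAIN should come out as 1, 1, 0, while MATCH_O should be 1, 0, 1, because the second and third rows are still tested against '/^I50/'.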


6 REPLIES 6
rsjjj
Fluorite | Level 6
Thank you so much! I had been stuck on this for several days, and it never occurred to me that the issue was the "o" option. I was using it in the first place to save run time.
PGStats
Opal | Level 21

Adding to @FreelanceReinh's suggestion, you can still compile each pattern only once by calling prxParse first (only once) and then calling prxMatch with the pattern IDs:

data want(drop=i j n);
set have;
array exp[2] $25 _temporary_ ('/^I50/' '/^I49/');
array rId {2} _temporary_;
array var[2];
array d[3];

if _n_ = 1 then do;
    do n = 1 to dim(rId);
        rId{n} = prxParse(exp{n});
        end;
    end;
    
do n=1 to dim(var);
    var[n]=0;
    end;
  
do j=1 to dim(var);
      do i=1 to dim(d);
            if prxmatch(rId[j], d[i]) then do;
                  var[j]=1;
                  leave;
                  end;
            end;
      end;
run;

 

PG
rsjjj
Fluorite | Level 6
Thanks! That's exactly what I was trying to achieve at first.
Tom
Super User

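Why are you using REGEX for something that normal operators and functions can handle?

%let exp1='I50';
%let exp2='I49';
%let exp3='I21';
%let exp4='I63';
%let exp5='R06' 'J95' 'J13' 'J14' 'J15' 'J16' 'J17' 'J18' ;
%let n=5;

/* An iterative %DO loop is only allowed inside a macro definition (macro name is arbitrary) */
%macro make_flags;
data array2;
  set original;
  array dgns_array [*] AD_DGNS DGNS_CD01-DGNS_CD25; 
  array var [&n] ;
%do j=1 %to &n ;
  /* VAR&j = 1 if any code starts with one of the prefixes in EXP&j, else 0 */
  do i=1 to dim(dgns_array) until (var[&j]);
    var[&j] = dgns_array[i] in: (&&exp&j);
  end;
%end;
run;
%mend make_flags;
%make_flags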
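The prefix approach relies on the IN: operator, which compares only the leading characters (truncated to the shorter of the two values), so 'J15' matches a code such as 'J150'. As a quick sanity check, here is a sketch that computes the fifth flag both ways, with the original regex and with the prefix list, so you can confirm they agree. The data set COMPARE and the codes are made-up examples; substitute real diagnosis codes to test on your own data.

```sas
/* Hypothetical sample codes, not real MedPAR records */
data compare;
  input code :$7.;
  /* Regex version of the fifth pattern (a constant regex is compiled once anyway) */
  flag_regex = (prxmatch('/^R06|^J95|^J1[345678]/', code) > 0);
  /* Prefix version: IN: compares only the leading characters */
  flag_prefix = code in: ('R06' 'J95' 'J13' 'J14' 'J15' 'J16' 'J17' 'J18');
  /* 1 when both methods give the same flag */
  agree = (flag_regex = flag_prefix);
datalines;
R0600
J9500
J1500
J1900
I5000
;

proc print data=compare;
run;
```

Here J1900 and I5000 should be flagged 0 by both methods and the other three codes 1, so AGREE should be 1 on every row.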
rsjjj
Fluorite | Level 6
It's a really efficient method! Thanks for the advice.


Discussion stats
  • 6 replies
  • 947 views
  • 6 likes
  • 4 in conversation