DATA Step, Macro, Functions and more

Using a table to redact info from a string in another table

Reply
Occasional Contributor
Posts: 17

Using a table to redact info from a string in another table

Hi All,

 

I am attemping to use a list in one table to compare and replace text within a string from another table. I can't seem to sort out how to find and replace that text without using a macro variable. Since I'm limited to the size, is it possible to reference a table instead of the macro variable?

 

data want;
 set have;
 F1 = prxchange("s/&SEARCH_LST/*REDACTED*/i", -1, F1);
   put F1=;
run;
PROC Star
Posts: 307

Re: Using a table to redact info from a string in another table

Posted in reply to chrishull

How many items are there in your list of items to redact? And what does your macro variable approach that isn't working look like? Is it anything like the below?

 

proc sql noprint;
SELECT strip(redact_val) INTO :SEARCH_LST SEPARATED BY '|'
FROM redact_list_table;
quit;
Occasional Contributor
Posts: 17

Re: Using a table to redact info from a string in another table

Posted in reply to collinelliot

Yes that's pretty much what I have. Didn't think of the strip though but my "test" field is 6 characters in length. 

PROC Star
Posts: 7,467

Re: Using a table to redact info from a string in another table

Posted in reply to chrishull

Good question!  I'm not sure how many characters can be included in a prxchange search list.

 

Since you don't think you can use a macro variable, I presume that your collection of search words/phrases total more than 64,000 characters.

 

If the length of the prxchange search list is unlimited, then here is one approach you could take .. use a data step to write the program to a call execute run.  I.e., to write the part of the code prior to the prxchange statement, but including:

F1 = prxchange("s/

then bring in your file of words/phrases and, for the first string, use a call execute statement to write the string. 

Then, using a do loop, read the rest of the file and, for each word/phrase, use the catt function to write out the rest of the words/phrases (e.g., catt('|',wordphrase).

Finally send a final call execute statement, namely:

 

/*REDACTED*/i", -1, F1);
   put F1=;
run;

HTH,

Art, CEO, AnalystFinder.com

 

 

Occasional Contributor
Posts: 17

Re: Using a table to redact info from a string in another table

Thank you for your input. Yes my search word list is much more than 64,000 characters.

 

For this approach, I think I will have to count the number of characters first before I load into the macro variable to make sure that I'm not truncating any of my words/phrases.

 

I'll give this a try and see how well it works. I'll have to schedule it to run overnight since I can only assume that it will take a while to process.

 

I'll report back when it's complete.

PROC Star
Posts: 7,467

Re: Using a table to redact info from a string in another table

Posted in reply to chrishull

Chris,

 

I'm very interested to see if the call execute, rather than the macro, approach would work. If you're only going to test the macro approach, I'd appreciate having a chance to test it.

 

If the data isn't confidential, please send me the files to: art@analystfinder.com

 

Thanks,

Art, CEO, AnalystFinder.com

Occasional Contributor
Posts: 17

Re: Using a table to redact info from a string in another table

Hi Art,

 

The data is confidential unfortunately. I apologize for not picking up on your comment about the call execute. I'm not that familiar with that function. 


Would you reccomend that I loop through each word/phrase in my list and compare/replace?

 

Chris

Super User
Posts: 10,018

Re: Using a table to redact info from a string in another table

Posted in reply to chrishull

Can you split the whole row into tokens and use hash table or sql to replace them ?

 

1 xxx yyy zzz

Change it into 

1 xxx

1 yyy

1 zzz

Occasional Contributor
Posts: 17

Re: Using a table to redact info from a string in another table

I was thinking of trying that but I have names that have varying lengths. The numeric part I could. That said, the name list is much longer than the numeric one.

Trusted Advisor
Posts: 1,018

Re: Using a table to redact info from a string in another table

Posted in reply to chrishull

How much longer than a max-length macro value is it?  If it's only a multiple of 2 or 3, you could just run the prxchange function two or three times, each with a unique macrovar.

 

 

Occasional Contributor
Posts: 17

Re: Using a table to redact info from a string in another table

My first list is 37,000+ unique 6 digit numbers (~260,000 with PIPEs). The word/phrase list would be much longer... Millions of characters.

PROC Star
Posts: 7,467

Re: Using a table to redact info from a string in another table

[ Edited ]
Posted in reply to chrishull

Important: Disregard this post. It attempts to use the cats function which, by definition, will only allow 32,767 characters.  I'm working on an alternative.  Art

------------

Chris,

 

Here is an example of the method I was suggesting. The code is a bit odd, as SAS added in unexpected quotation marks, but the code still works. Like I mentioned, I have no idea if the PRXCHANGE function has limits regarding how many entries it will accept, but I think it's worth a try:

 

data have;
  informat F1 $50.;
  input F1 &;
  cards;
when Art went to play
then Mary went to play
and John went along for the ride
but Judy decided to stay
;

data phrases;
  informat phrase $50.;
  input phrase &;
  cards;
Art went
Mary
John went
;

data _null_;
  set phrases end=eof;
  length toexec $50;
  if _n_ eq 1 then do;
    call execute('data want;set have;F1 = prxchange(cats("s/(REDACTED"');
  end;
  toexec=cats(',',quote(cats('|',phrase)));
  call execute(toexec);
  if eof then call execute(',")/ *REDACTED* /i"), -1, F1);run;');
run;

Art, CEO, AnalystFinder.com

 

 

 

PROC Star
Posts: 7,467

Re: Using a table to redact info from a string in another table

Posted in reply to chrishull

Chris,

 

I got this to work on a phrase file that included over 180,000 charaters. I think it will run regardless of how big your phrase file is:

 

data have;
  informat F1 $50.;
  input F1 &;
  cards;
when Art wente to play
then Mary have to play
and John went along for the ride
but Judy decided to stay
;

data phrases;
  informat phrase $50.;
  input phrase &;
  do i=1 to 20000;
    output;
  end;
  cards;
Art wente
Mary have
John went
;

data _null_;
  set phrases end=eof;
  length toexec $50;
  file '/folders/myfolders/for_include.sas';
  if _n_ eq 1 then start=1;
  if start eq 1 then do;
    if _n_ eq 1 then do;
      toexec=cats('data want; set have;','0D'x);
      put toexec;
    end;
    toexec=cats('F1 = prxchange("s/(',phrase,'0D'x);
    start=length(toexec);
  end;
  else do;
    toexec=cats('|',phrase,'0D'x);
    start+length(toexec);
  end;
  put toexec;
  if eof or start gt 200 then do;
    toexec=cats(')/ *REDACTED* /io", -1, F1);','0D'x);
    put toexec;
    start=1;
    if eof then do;
      toexec=cats('run;','0D'x);
      put toexec;
    end;
  end;
run;

%include '/folders/myfolders/for_include.sas';

Art, CEO, AnalystFinder.com

 

PROC Star
Posts: 1,759

Re: Using a table to redact info from a string in another table

[ Edited ]

I reckon regular expressions are too expensive and unneeded for such simple string swaps.

 

data PHRASES;
  length PHRASE $32000.;
  do OBS=1 to 5000;
    PHRASE=' ';
    do WORD=1 to 50;
      PHRASE=catx(' ',PHRASE,int(ranuni(0)*1e7));
    end;
    output;
  end;
  keep PHRASE;
run;

data SEARCH_LIST;
  length REDACT $8.;
  set PHRASES;
  if ranuni(0) >.95;
  REDACT=scan(PHRASE,1);N=_N_;
  keep REDACT N;
run;
         
data _null_;   * Use one RegEx by value: 40 seconds;
  set SEARCH_LIST end=LASTOBS;
  if _N_ eq 1 then call execute('data WANT1; set PHRASES;');
  call execute(catt('PHRASE=prxchange("s/', REDACT, '/*REDACTED*/io",-1,PHRASE);'));
  if LASTOBS then call execute('run;');
run;
    
data _null_;    * Use one RegEx for 100 values: 350 seconds;
  length LIST $32000;
  if _N_ eq 1 then call execute('data WANT2; set PHRASES;');
  WORDS=0;
  do until(LASTOBS or WORDS=100);                              
    set SEARCH_LIST end=LASTOBS;
    LIST=catx('|',LIST,REDACT);
    WORDS+1;
  end;
  call execute(catt('PHRASE=prxchange("s/', LIST, '/*REDACTED*/io",-1,PHRASE);'));
  if LASTOBS then call execute('run;');
run;

data _null_;  * Use tranwrd() function : 6 seconds;
  set SEARCH_LIST end=LASTOBS;
  if _N_ eq 1 then call execute('data WANT3; set PHRASES;');
  call execute(catt('PHRASE=tranwrd(PHRASE,"', REDACT, '","*REDACTED*");'));
  if LASTOBS then call execute('run;');
run;
                                             

 

 

 

 

PROC Star
Posts: 1,759

Re: Using a table to redact info from a string in another table

My test variable was very long.

 

If yours is shorter, regular expressions are not too costly, and provide a match regardless of case.

Just trimming the string to match makes a big difference.

 


data _null_;   * Use one RegEx by value: 4 seconds;
  set SEARCH_LIST end=LASTOBS;
  if _N_ eq 1 then call execute('data WANT1; set PHRASES;');
  call execute(catt('PHRASE=prxchange("s/', REDACT, '/*REDACTED*/io",-1,trim(PHRASE));'));
  if LASTOBS then call execute('run;');
run;
    
data _null_;    * Use one RegEx for 100 values: 5 seconds;
  length LIST $32000;
  if _N_ eq 1 then call execute('data WANT2; set PHRASES;');
  WORDS=0;
  do until(LASTOBS or WORDS=100);                              
    set SEARCH_LIST end=LASTOBS;
    LIST=catx('|',LIST,REDACT);
    WORDS+1;
  end;
  call execute(catt('PHRASE=prxchange("s/', LIST, '/*REDACTED*/io",-1,trim(PHRASE));'));
  if LASTOBS then call execute('run;');
run;  

data _null_;  * Use tranwrd() function : 3 seconds;
  set SEARCH_LIST end=LASTOBS;
  if _N_ eq 1 then call execute('data WANT3; set PHRASES;');
  call execute(catt('PHRASE=tranwrd(trim(PHRASE),"', REDACT, '","*REDACTED*");'));
  if LASTOBS then call execute('run;');
run;
          

 

 

 

 

Ask a Question
Discussion stats
  • 15 replies
  • 212 views
  • 0 likes
  • 6 in conversation