BookmarkSubscribeRSS Feed
chrishull
Obsidian | Level 7

Hi All,

 

I am attemping to use a list in one table to compare and replace text within a string from another table. I can't seem to sort out how to find and replace that text without using a macro variable. Since I'm limited to the size, is it possible to reference a table instead of the macro variable?

 

data want;
 set have;
 F1 = prxchange("s/&SEARCH_LST/*REDACTED*/i", -1, F1);
   put F1=;
run;
15 REPLIES 15
collinelliot
Barite | Level 11

How many items are there in your list of items to redact? And what does your macro variable approach that isn't working look like? Is it anything like the below?

 

proc sql noprint;
SELECT strip(redact_val) INTO :SEARCH_LST SEPARATED BY '|'
FROM redact_list_table;
quit;
chrishull
Obsidian | Level 7

Yes that's pretty much what I have. Didn't think of the strip though but my "test" field is 6 characters in length. 

art297
Opal | Level 21

Good question!  I'm not sure how many characters can be included in a prxchange search list.

 

Since you don't think you can use a macro variable, I presume that your collection of search words/phrases total more than 64,000 characters.

 

If the length of the prxchange search list is unlimited, then here is one approach you could take .. use a data step to write the program to a call execute run.  I.e., to write the part of the code prior to the prxchange statement, but including:

F1 = prxchange("s/

then bring in your file of words/phrases and, for the first string, use a call execute statement to write the string. 

Then, using a do loop, read the rest of the file and, for each word/phrase, use the catt function to write out the rest of the words/phrases (e.g., catt('|',wordphrase).

Finally send a final call execute statement, namely:

 

/*REDACTED*/i", -1, F1);
   put F1=;
run;

HTH,

Art, CEO, AnalystFinder.com

 

 

chrishull
Obsidian | Level 7

Thank you for your input. Yes my search word list is much more than 64,000 characters.

 

For this approach, I think I will have to count the number of characters first before I load into the macro variable to make sure that I'm not truncating any of my words/phrases.

 

I'll give this a try and see how well it works. I'll have to schedule it to run overnight since I can only assume that it will take a while to process.

 

I'll report back when it's complete.

art297
Opal | Level 21

Chris,

 

I'm very interested to see if the call execute, rather than the macro, approach would work. If you're only going to test the macro approach, I'd appreciate having a chance to test it.

 

If the data isn't confidential, please send me the files to: art@analystfinder.com

 

Thanks,

Art, CEO, AnalystFinder.com

chrishull
Obsidian | Level 7

Hi Art,

 

The data is confidential unfortunately. I apologize for not picking up on your comment about the call execute. I'm not that familiar with that function. 


Would you reccomend that I loop through each word/phrase in my list and compare/replace?

 

Chris

Ksharp
Super User

Can you split the whole row into tokens and use hash table or sql to replace them ?

 

1 xxx yyy zzz

Change it into 

1 xxx

1 yyy

1 zzz

chrishull
Obsidian | Level 7

I was thinking of trying that but I have names that have varying lengths. The numeric part I could. That said, the name list is much longer than the numeric one.

mkeintz
PROC Star

How much longer than a max-length macro value is it?  If it's only a multiple of 2 or 3, you could just run the prxchange function two or three times, each with a unique macrovar.

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
chrishull
Obsidian | Level 7

My first list is 37,000+ unique 6 digit numbers (~260,000 with PIPEs). The word/phrase list would be much longer... Millions of characters.

art297
Opal | Level 21

Important: Disregard this post. It attempts to use the cats function which, by definition, will only allow 32,767 characters.  I'm working on an alternative.  Art

------------

Chris,

 

Here is an example of the method I was suggesting. The code is a bit odd, as SAS added in unexpected quotation marks, but the code still works. Like I mentioned, I have no idea if the PRXCHANGE function has limits regarding how many entries it will accept, but I think it's worth a try:

 

data have;
  informat F1 $50.;
  input F1 &;
  cards;
when Art went to play
then Mary went to play
and John went along for the ride
but Judy decided to stay
;

data phrases;
  informat phrase $50.;
  input phrase &;
  cards;
Art went
Mary
John went
;

data _null_;
  set phrases end=eof;
  length toexec $50;
  if _n_ eq 1 then do;
    call execute('data want;set have;F1 = prxchange(cats("s/(REDACTED"');
  end;
  toexec=cats(',',quote(cats('|',phrase)));
  call execute(toexec);
  if eof then call execute(',")/ *REDACTED* /i"), -1, F1);run;');
run;

Art, CEO, AnalystFinder.com

 

 

 

art297
Opal | Level 21

Chris,

 

I got this to work on a phrase file that included over 180,000 charaters. I think it will run regardless of how big your phrase file is:

 

data have;
  informat F1 $50.;
  input F1 &;
  cards;
when Art wente to play
then Mary have to play
and John went along for the ride
but Judy decided to stay
;

data phrases;
  informat phrase $50.;
  input phrase &;
  do i=1 to 20000;
    output;
  end;
  cards;
Art wente
Mary have
John went
;

data _null_;
  set phrases end=eof;
  length toexec $50;
  file '/folders/myfolders/for_include.sas';
  if _n_ eq 1 then start=1;
  if start eq 1 then do;
    if _n_ eq 1 then do;
      toexec=cats('data want; set have;','0D'x);
      put toexec;
    end;
    toexec=cats('F1 = prxchange("s/(',phrase,'0D'x);
    start=length(toexec);
  end;
  else do;
    toexec=cats('|',phrase,'0D'x);
    start+length(toexec);
  end;
  put toexec;
  if eof or start gt 200 then do;
    toexec=cats(')/ *REDACTED* /io", -1, F1);','0D'x);
    put toexec;
    start=1;
    if eof then do;
      toexec=cats('run;','0D'x);
      put toexec;
    end;
  end;
run;

%include '/folders/myfolders/for_include.sas';

Art, CEO, AnalystFinder.com

 

ChrisNZ
Tourmaline | Level 20

I reckon regular expressions are too expensive and unneeded for such simple string swaps.

 

data PHRASES;
  length PHRASE $32000.;
  do OBS=1 to 5000;
    PHRASE=' ';
    do WORD=1 to 50;
      PHRASE=catx(' ',PHRASE,int(ranuni(0)*1e7));
    end;
    output;
  end;
  keep PHRASE;
run;

data SEARCH_LIST;
  length REDACT $8.;
  set PHRASES;
  if ranuni(0) >.95;
  REDACT=scan(PHRASE,1);N=_N_;
  keep REDACT N;
run;
         
data _null_;   * Use one RegEx by value: 40 seconds;
  set SEARCH_LIST end=LASTOBS;
  if _N_ eq 1 then call execute('data WANT1; set PHRASES;');
  call execute(catt('PHRASE=prxchange("s/', REDACT, '/*REDACTED*/io",-1,PHRASE);'));
  if LASTOBS then call execute('run;');
run;
    
data _null_;    * Use one RegEx for 100 values: 350 seconds;
  length LIST $32000;
  if _N_ eq 1 then call execute('data WANT2; set PHRASES;');
  WORDS=0;
  do until(LASTOBS or WORDS=100);                              
    set SEARCH_LIST end=LASTOBS;
    LIST=catx('|',LIST,REDACT);
    WORDS+1;
  end;
  call execute(catt('PHRASE=prxchange("s/', LIST, '/*REDACTED*/io",-1,PHRASE);'));
  if LASTOBS then call execute('run;');
run;

data _null_;  * Use tranwrd() function : 6 seconds;
  set SEARCH_LIST end=LASTOBS;
  if _N_ eq 1 then call execute('data WANT3; set PHRASES;');
  call execute(catt('PHRASE=tranwrd(PHRASE,"', REDACT, '","*REDACTED*");'));
  if LASTOBS then call execute('run;');
run;
                                             

 

 

 

 

ChrisNZ
Tourmaline | Level 20

My test variable was very long.

 

If yours is shorter, regular expressions are not too costly, and provide a match regardless of case.

Just trimming the string to match makes a big difference.

 


data _null_;   * Use one RegEx by value: 4 seconds;
  set SEARCH_LIST end=LASTOBS;
  if _N_ eq 1 then call execute('data WANT1; set PHRASES;');
  call execute(catt('PHRASE=prxchange("s/', REDACT, '/*REDACTED*/io",-1,trim(PHRASE));'));
  if LASTOBS then call execute('run;');
run;
    
data _null_;    * Use one RegEx for 100 values: 5 seconds;
  length LIST $32000;
  if _N_ eq 1 then call execute('data WANT2; set PHRASES;');
  WORDS=0;
  do until(LASTOBS or WORDS=100);                              
    set SEARCH_LIST end=LASTOBS;
    LIST=catx('|',LIST,REDACT);
    WORDS+1;
  end;
  call execute(catt('PHRASE=prxchange("s/', LIST, '/*REDACTED*/io",-1,trim(PHRASE));'));
  if LASTOBS then call execute('run;');
run;  

data _null_;  * Use tranwrd() function : 3 seconds;
  set SEARCH_LIST end=LASTOBS;
  if _N_ eq 1 then call execute('data WANT3; set PHRASES;');
  call execute(catt('PHRASE=tranwrd(trim(PHRASE),"', REDACT, '","*REDACTED*");'));
  if LASTOBS then call execute('run;');
run;
          

 

 

 

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 15 replies
  • 1692 views
  • 0 likes
  • 6 in conversation