BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
aj34321
Quartz | Level 8

Hi All,

I have this dataset and would like to remove the words defined in remove_list. Not sure how do i handle multiple words. If this was a single word each line with array I can remove but this one is kind of confusing me. Appreciate if someone can help me on this please. Thanks a lot...

 

data have;
input transaction & $100.;
cards;
VISA DEBIT PURCHASE CARD 8013 MERCHANT_A AREA1
VISA DEBIT PURCHASE CARD 8013 MERCHANT_B
VISA DEBIT PURCHASE CARD 4059 MERCHANT_X AREA10 2271
VISA DEBIT PURCHASE CARD 7005 MERCHANT_Y
VISA DEBIT PURCHASE CARD 6008 MERCHANT_Y AREA100 57898
VISA DEBIT PURCHASE CARD 4095 MERCHANT_X
VISA DEBIT PURCHASE CARD 4095 MERCHANT_Z AREA1 AREA1
VISA DEBIT PURCHASE CARD 4095 MERCHANT_Z MERCHANT_Z
;


data remove_list;
input rule_id & $100.;
cards;
VISA DEBIT PURCHASE CARD 8013
VISA DEBIT PURCHASE CARD 4059
VISA DEBIT PURCHASE CARD 7005
VISA DEBIT PURCHASE CARD 6008
VISA DEBIT PURCHASE CARD 4095
;

 

data want;
input transaction & $100.;
cards;
MERCHANT_A AREA1
MERCHANT_B
MERCHANT_X AREA10 2271
MERCHANT_Y 
MERCHANT_Y AREA100 57898
MERCHANT_X
MERCHANT_Z AREA1 AREA1 
MERCHANT_Z MERCHANT_Z
;

 

 

I found the below code from https://communities.sas.com/t5/New-SAS-User/How-do-I-remove-a-list-of-words-from-a-string-using-a-ma...

and tried to modify but the scan function in here looks for every single word with a space seperator but in my case the rule is to remove a line of 1+ words.

 

data md3;
if _n_=1 then do;
if 0 then set remove_list;
dcl hash H (dataset:'remove_list') ;
h.definekey ("rule_id") ;
h.definedone () ;
end;

set md2;
length description_3 $100;
do _n_= 1 to countw(description_2,' ');
if h.check(key:scan(description_2, _n_, ' ')) = 0 then continue;
description_3 = catx(' ',description_3, scan(description_2, _n_, ' '));
end;
 drop rule_id;
run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

Using the example in your earlier reply

 

data have;
input transaction & $100.;
cards;
INTERNET BANKING FUNDS TFER TRANSFER REF:NNN FROM A/C:NNNNNN
EFTPOS COSTCO WHOLESALE COUNTRY SUBURB COUNTRY
;


data remove_list;
input rule_id & $100.;
cards;
INTERNET BANKING FUNDS TFER
EFTPOS
COUNTRY SUBURB COUNTRY
;

proc sql;
    select distinct strip(rule_id) into :removes separated by ',' from remove_list;
quit;
%put &=removes;

data want;
    set have;
    do i=1 to countw("&removes",',');
        this_remove=scan("&removes",i,',');
        where=find(transaction,this_remove,'it');
        if where>0 then substr(transaction,where,length(this_remove))='';
        transaction=strip(transaction);
        if i=countw("&removes",',') then output;
    end;
    drop i this_remove where;
run;
--
Paige Miller

View solution in original post

8 REPLIES 8
PaigeMiller
Diamond | Level 26

It looks like you want to remove everything to the left of MERCHANT, which is simple to do. Is that a good summary of the problem? Or did you provide a poor example that doesn't illustrate the real problem?

--
Paige Miller
aj34321
Quartz | Level 8

Hi @PaigeMiller, yes you are partially right on the bad example. Its not that i need to trim everything before word MERCHANT, but what ever i define in the remove list should be removed from the Have dataset. Few more example as below.

 

data have;
input transaction & $100.;
cards;
INTERNET BANKING FUNDS TFER TRANSFER REF:NNN FROM A/C:NNNNNN
EFTPOS COSTCO WHOLESALE COUNTRY SUBURB COUNTRY
;


data remove_list;
input rule_id & $100.;
cards;
INTERNET BANKING FUNDS TFER
EFTPOS
COUNTRY SUBURB COUNTRY
;

 

data want;
input transaction & $100.;
cards;
TRANSFER REF:NNN FROM A/C:NNNNNN
COSTCO WHOLESALE
;

 

PaigeMiller
Diamond | Level 26

In data set REMOVE_LIST, do we remove individual words (separated by spaces) whenever such a word is found in data set HAVE? Or does REMOVE_LIST contain phrases that are removed from data set HAVE when the entire phrase is found in HAVE?

 

Instead of us going through this process of asking for clarification multiple times, can you write a clear and complete problem description?

--
Paige Miller
aj34321
Quartz | Level 8

Hi @PaigeMiller

 

Sorry for that... Basically My task is to clean the transaction lines. My ultimate goal is to keep only the merchant names in the transaction string and hence i want to remove all these unwanted phrases. So basically, which ever transaction line finds this entire phrase as defined in my remove list should be removed from the transaction variable in the HAVE dataset. I dont want to break them by each single word as couple of other phrases might have post code and in removing number separatelly there are chances that some other wanted number might have gone. So which I decided to create this removal parameter list which should be looked for as a whole phrase and if found then remove it.

 

VISA DEBIT PURCHASE CARD 8013
VISA DEBIT PURCHASE CARD 4059
VISA DEBIT PURCHASE CARD 7005
VISA DEBIT PURCHASE CARD 6008
VISA DEBIT PURCHASE CARD 4095

INTERNET BANKING FUNDS TFER
EFTPOS
COUNTRY SUBURB COUNTRY

 

 

And so is the below result after removing all the phrases from the HAVE dataset.

 

MERCHANT_A AREA1
MERCHANT_B
MERCHANT_X AREA10 2271
MERCHANT_Y 
MERCHANT_Y AREA100 57898
MERCHANT_X
MERCHANT_Z AREA1 AREA1 
MERCHANT_Z MERCHANT_Z

TRANSFER REF:NNN FROM A/C:NNNNNN
COSTCO WHOLESALE

 

Hope I was able to explain you in a much better way..... 

 

Thanks mate..

 

Rgds,

AK

 

Kurt_Bremser
Super User

Load the phrases into a temporary array, and loop over that:

data have;
input transaction & $100.;
cards;
VISA DEBIT PURCHASE CARD 8013 MERCHANT_A AREA1
VISA DEBIT PURCHASE CARD 8013 MERCHANT_B
VISA DEBIT PURCHASE CARD 4059 MERCHANT_X AREA10 2271
VISA DEBIT PURCHASE CARD 7005 MERCHANT_Y 
VISA DEBIT PURCHASE CARD 6008 MERCHANT_Y AREA100 57898
VISA DEBIT PURCHASE CARD 4095 MERCHANT_X
VISA DEBIT PURCHASE CARD 4095 MERCHANT_Z AREA1 AREA1 
VISA DEBIT PURCHASE CARD 4095 MERCHANT_Z MERCHANT_Z
INTERNET BANKING FUNDS TFER TRANSFER REF:NNN FROM A/C:NNNNNN
EFTPOS COSTCO WHOLESALE COUNTRY SUBURB COUNTRY
;

data remove_list;
input rule_id & $100.;
cards;
VISA DEBIT PURCHASE CARD 8013 
VISA DEBIT PURCHASE CARD 4059 
VISA DEBIT PURCHASE CARD 7005 
VISA DEBIT PURCHASE CARD 6008 
VISA DEBIT PURCHASE CARD 4095
INTERNET BANKING FUNDS TFER
EFTPOS
COUNTRY SUBURB COUNTRY
;

proc sql noprint;
select nobs into :nobs
from dictionary.tables
where libname = "WORK" and memname = "REMOVE_LIST";
quit;

data want;
set have;
array lookup {&nobs.} $100 _temporary_;
if _n_ = 1
then do i = 1 to &nobs.;
  set remove_list;
  lookup{i} = rule_id;
end;
do i = 1 to &nobs.;
  pos = index(transaction,strip(lookup{i}));
  if pos
  then do;
    if pos > 1
    then transaction =
      substr(transaction,1,pos-1) !!
      substr(transaction,pos + length(lookup{i}))
    ;
    else transaction = substr(transaction,length(lookup{i})+1);
  end;
end;
run;
PaigeMiller
Diamond | Level 26

Using the example in your earlier reply

 

data have;
input transaction & $100.;
cards;
INTERNET BANKING FUNDS TFER TRANSFER REF:NNN FROM A/C:NNNNNN
EFTPOS COSTCO WHOLESALE COUNTRY SUBURB COUNTRY
;


data remove_list;
input rule_id & $100.;
cards;
INTERNET BANKING FUNDS TFER
EFTPOS
COUNTRY SUBURB COUNTRY
;

proc sql;
    select distinct strip(rule_id) into :removes separated by ',' from remove_list;
quit;
%put &=removes;

data want;
    set have;
    do i=1 to countw("&removes",',');
        this_remove=scan("&removes",i,',');
        where=find(transaction,this_remove,'it');
        if where>0 then substr(transaction,where,length(this_remove))='';
        transaction=strip(transaction);
        if i=countw("&removes",',') then output;
    end;
    drop i this_remove where;
run;
--
Paige Miller
aj34321
Quartz | Level 8
Thanks @PaigeMiller.. This one worked as charm... Thanks a lot. You are a GEM !!
Ksharp
Super User
data have;
input transaction & $100.;
cards;
INTERNET BANKING FUNDS TFER TRANSFER REF:NNN FROM A/C:NNNNNN
EFTPOS COSTCO WHOLESALE COUNTRY SUBURB COUNTRY
;


data remove_list;
input rule_id & $100.;
cards;
INTERNET BANKING FUNDS TFER
EFTPOS
COUNTRY SUBURB COUNTRY
;

options noquotelenmax;
proc sql noprint;
select rule_id into :t separated by '|'
 from remove_list;
quit;

%put &t ;

data want1;
 set have;
want=prxchange("s/\b(&t)\b//oi",-1, transaction);
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1699 views
  • 2 likes
  • 4 in conversation