Solved: need help with exclusion

Cbob03 · Posted 05-27-2020 07:28 PM

Hello,

I have a dataset called "coupon_data" which has 5 coupon code columns for each prod ID. There are some invalid coupon code in this dataset that I need to remove. I also have a list of invalid coupon codes and I am calling it as "invalid_codes". Basically I want compare my "coupon_data" against "invalid_codes" and if match found I want to remove that code from the coupon column.

I am fairly new to hashing/array concept and i am sure you can judge that from my below experimental code. I know i am doing something wrong but i couldn't figure out and I am not getting the desired output.

Could someone please help? Thanks.

Data coupon_data;
input @1 prod_ID :$6. @8 coupon1 $5. @14 coupon2 $5.  @20 coupon3 $5. @26 coupon4 $5. @32 coupon5 $5.; 
infile cards missover ;
cards;
132522 KCB33 CKM90 LKJ89 LKU09 NM789
258798 OUI00 UIT78 POU89 CC909 
548998 NM789       PI980 VBV46 JKO09
109743 YUO90 JJI99 POU89       MOP09
;
run;

Data invalid_codes;
input inv_coupon $5.;
infile cards;
cards;
LKJ89
CC909
MOP09
YUO90
;
Quit;


data valid_codes;
  set coupon_data;
  array cpn [5] coupon1-coupon5;
  array outcpn [5] $5;
  if _N_=1 then do;
    dcl hash H(dataset:'invalid_codes');
    H.definekey('inv_coupon');
    H.definedone(); 
    if 0 then set invalid_codes;
  end;
  do i=1 to 5;
    coupon=cpn[i];   
    if H.find()=0 then do;     
      outcpn[i]=cpn[i];
     end;
  end;
  drop  coupon coupon1-coupon5 i inv_coupon;
rename outcpn1-outcpn5 = coupon1-coupon5;	
run;

Desired output:

prod_ID	coupon1	coupon2	coupon3	coupon4	coupon5
132522	KCB33	CKM90		LKU09	NM789
258798	OUI00	UIT78	POU89
548998	NM789		PI980	VBV46	JKO09
109743	YUO90	JJI99	POU89

novinosrin · Posted 05-27-2020 08:00 PM

Hi @Cbob03 Good try albeit requires minor and lazy correction imho


Data coupon_data;
input @1 prod_ID :$6. @8 coupon1 $5. @14 coupon2 $5.  @20 coupon3 $5. @26 coupon4 $5. @32 coupon5 $5.; 
infile cards missover ;
cards;
132522 KCB33 CKM90 LKJ89 LKU09 NM789
258798 OUI00 UIT78 POU89 CC909 
548998 NM789       PI980 VBV46 JKO09
109743 YUO90 JJI99 POU89       MOP09
;
run;

Data invalid_codes;
input inv_coupon $5.;
infile cards;
cards;
LKJ89
CC909
MOP09
YUO90
;

data valid_codes;
  set coupon_data;
  array cpn  coupon1-coupon5;
  if _N_=1 then do;
    dcl hash H(dataset:'invalid_codes');
    H.definekey('inv_coupon');
    H.definedone(); 
    if 0 then set invalid_codes;
  end;
  do over cpn;
    if H.check(key:cpn)=0 then call missing(cpn);
  end;
  drop inv_coupon;
run;

View solution in original post

novinosrin · Posted 05-27-2020 08:00 PM

Hi @Cbob03 Good try albeit requires minor and lazy correction imho


Data coupon_data;
input @1 prod_ID :$6. @8 coupon1 $5. @14 coupon2 $5.  @20 coupon3 $5. @26 coupon4 $5. @32 coupon5 $5.; 
infile cards missover ;
cards;
132522 KCB33 CKM90 LKJ89 LKU09 NM789
258798 OUI00 UIT78 POU89 CC909 
548998 NM789       PI980 VBV46 JKO09
109743 YUO90 JJI99 POU89       MOP09
;
run;

Data invalid_codes;
input inv_coupon $5.;
infile cards;
cards;
LKJ89
CC909
MOP09
YUO90
;

data valid_codes;
  set coupon_data;
  array cpn  coupon1-coupon5;
  if _N_=1 then do;
    dcl hash H(dataset:'invalid_codes');
    H.definekey('inv_coupon');
    H.definedone(); 
    if 0 then set invalid_codes;
  end;
  do over cpn;
    if H.check(key:cpn)=0 then call missing(cpn);
  end;
  drop inv_coupon;
run;

Cbob03 · Posted 05-30-2020 02:20 AM

Thank you very much!

SASKiwi · Posted 05-27-2020 08:04 PM

Lookup formats work well too:

proc format;
 value $Invalid_Codes
'LKJ89' = 'I'
'CC909' = 'I'
'MOP09' = 'I'
'YUO90' = 'I'
' '     = 'I'
other   = 'V'
;
run;

data valid_codes;
  set coupon_data;
  array cpn [5] coupon1-coupon5;
  array outcpn [5] $5;
  do i=1 to 5;   
    if put(cpn[i], $Invalid_Codes.) = 'V' then outcpn[i]=cpn[i];
  end;
  drop   coupon1-coupon5 i ;
  rename outcpn1-outcpn5 = coupon1-coupon5;	
run;

RichardDeVen · Posted 05-27-2020 08:12 PM

An existential lookup should be performed with the CHECK method.

Example:

data valid_codes;
  set coupon_data;
  array cpn [5] coupon1-coupon5;

  if _N_=1 then do;
    dcl hash H(dataset:'invalid_codes');
    H.definekey('inv_coupon');
    H.definedone(); 
    if 0 then set invalid_codes;
  end;

  do i=1 to 5;
    if H.check(key:cpn[i]) eq 0 then cpn[i] = '';  * coupon code found in hash of invalids, clear it;
  end;

  drop i inv_coupon;
run;

View result

ballardw · Posted 05-28-2020 01:02 AM

As an alternative a custom Informat approach. You didn't say how many codes you may have so I show an approach to turn the invalid codes data set into a CNTLIN data set for use by Proc Format. There are some pretty specific rules about the names of the variables used by Proc Format and finding the approriate values for the TYPE (invalue or value, character or numeric) and the HLO column (used to set upper/lower inclusion boundaries or the "other" property) aren't always the easiest. Sometimes just create a cntlout data set with a small number of values.

Then use the new informat to read the data set the value to blank for the invalid codes.

Data invalid_codes;
input inv_coupon $5.;
infile cards;
cards;
LKJ89
CC909
MOP09
YUO90
;
data cntlin;
   set invalid_codes end=eof;
   fmtname='COUPON';
   default=5;
   length=5;
   type='J';
   start=inv_coupon;
   /* label needs to be long enough to hold _same_ below*/
   length label $ 6; 
   label=' ';
   output;
   if eof then do;
   /* what to do for "valid" codes*/
      call missing(start,label);
      HLO='O';
      label='_SAME_';
      output;
   end;
run;

proc format library=work  cntlin=cntlin ;
run;

Data coupon_data;
input @1 prod_ID :$6. @8 coupon1 $COUPON. @14 coupon2 $COUPON.  @20 coupon3 $COUPON.
     @26 coupon4 $COUPON. @32 coupon5 $COUPON.; 
infile cards missover ;
cards;
132522 KCB33 CKM90 LKJ89 LKU09 NM789
258798 OUI00 UIT78 POU89 CC909 
548998 NM789       PI980 VBV46 JKO09
109743 YUO90 JJI99 POU89       MOP09
;
run;

The "other" statement with the HLO at the end likely isn't needed but I include it to show how to set such where it may be needed. A similar approach could be used with a list of VALID codes and an _error_ condition, set a bit differently, that would tell you when invalid values were encountered. So you could have 1) valid codes pass through with the _same_, known invalid codes get set to missing and an _error_ for values not on either. That way you would know whether to add it the newly encountered value to one of the sets (update the format and reread the data with the new format).

If the actual list of invalid codes is as small as you show it may be easier just to use Proc format code.

proc format library=work cntlout=work.cntlout1;
invalue $coupon (default=5)
"LKJ89","CC909","MOP09","YUO90" = ' '
other = _same_
;
run;

I have several projects where there is just enough flux in things like "site identifier codes" that I have Informats that do the _same_, set to missing and have an _error_ setting to tell me when some one sneaks a new code in so I can get all of the related information such as location, long name and other properties I use.

The modified code to have "valid" , reset and _error_ could look like

proc format library=work cntlout=work.cntlout1;
invalue $coupon (default=5)
"LKJ89","CC909","MOP09","YUO90" = ' '
"CKM90","JJI99","JKO09","KCB33","LKU09","NM789",
"OUI00","PI980","POU89","UIT78","VBV46" = _same_

other = _error_;
;
run;

The _same_ are the valid codes.

For an example, use this version of the informat after adding a new code that does not appear in the invalid codes or your previous data and rerun the data step reading it after updating the informat (or using a new informat name)

IF however your invalid code list was per Prod_id , meaning the code might be valid for one product but not another, then the HASH would be the way to go as the Informat approach only uses single values. Not saying it couldn't be done but the complexity would not be worth the effort.

Cbob03 · Posted 05-30-2020 02:24 AM

Thank you for your response. I have lot many invalid codes to compare against so hash approach worked for my problem. Thanks again for your time!

need help with exclusion

Re: need help with exclusion

Re: need help with exclusion

Re: need help with exclusion

Re: need help with exclusion

Re: need help with exclusion

Re: need help with exclusion

Re: need help with exclusion

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away