BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Cbob03
Fluorite | Level 6

Hello,

 

I have a dataset called "coupon_data" which has 5 coupon code columns for each prod ID.  There are some invalid coupon code in this dataset that I need to remove. I also have a list of invalid coupon codes and I am calling it as "invalid_codes". Basically I want compare my "coupon_data" against  "invalid_codes" and if match found I want to remove that code from the coupon column. 

 

I am fairly new to hashing/array concept and i am sure you can judge that from my below experimental code. I know i am doing something wrong but i couldn't figure out  and I am not getting the desired output.

 

Could someone please help? Thanks.

 

 

Data coupon_data;
input @1 prod_ID :$6. @8 coupon1 $5. @14 coupon2 $5.  @20 coupon3 $5. @26 coupon4 $5. @32 coupon5 $5.; 
infile cards missover ;
cards;
132522 KCB33 CKM90 LKJ89 LKU09 NM789
258798 OUI00 UIT78 POU89 CC909 
548998 NM789       PI980 VBV46 JKO09
109743 YUO90 JJI99 POU89       MOP09
;
run;

Data invalid_codes;
input inv_coupon $5.;
infile cards;
cards;
LKJ89
CC909
MOP09
YUO90
;
Quit;


data valid_codes;
  set coupon_data;
  array cpn [5] coupon1-coupon5;
  array outcpn [5] $5;
  if _N_=1 then do;
    dcl hash H(dataset:'invalid_codes');
    H.definekey('inv_coupon');
    H.definedone(); 
    if 0 then set invalid_codes;
  end;
  do i=1 to 5;
    coupon=cpn[i];   
    if H.find()=0 then do;     
      outcpn[i]=cpn[i];
     end;
  end;
  drop  coupon coupon1-coupon5 i inv_coupon;
rename outcpn1-outcpn5 = coupon1-coupon5;	
run;


Desired output:

 

prod_IDcoupon1coupon2coupon3coupon4coupon5
132522KCB33CKM90 LKU09NM789
258798OUI00UIT78POU89  
548998NM789 PI980VBV46JKO09
109743YUO90JJI99POU89  
1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

Hi @Cbob03   Good try albeit requires minor and lazy correction imho


Data coupon_data;
input @1 prod_ID :$6. @8 coupon1 $5. @14 coupon2 $5.  @20 coupon3 $5. @26 coupon4 $5. @32 coupon5 $5.; 
infile cards missover ;
cards;
132522 KCB33 CKM90 LKJ89 LKU09 NM789
258798 OUI00 UIT78 POU89 CC909 
548998 NM789       PI980 VBV46 JKO09
109743 YUO90 JJI99 POU89       MOP09
;
run;

Data invalid_codes;
input inv_coupon $5.;
infile cards;
cards;
LKJ89
CC909
MOP09
YUO90
;

data valid_codes;
  set coupon_data;
  array cpn  coupon1-coupon5;
  if _N_=1 then do;
    dcl hash H(dataset:'invalid_codes');
    H.definekey('inv_coupon');
    H.definedone(); 
    if 0 then set invalid_codes;
  end;
  do over cpn;
    if H.check(key:cpn)=0 then call missing(cpn);
  end;
  drop inv_coupon;
run;

View solution in original post

6 REPLIES 6
novinosrin
Tourmaline | Level 20

Hi @Cbob03   Good try albeit requires minor and lazy correction imho


Data coupon_data;
input @1 prod_ID :$6. @8 coupon1 $5. @14 coupon2 $5.  @20 coupon3 $5. @26 coupon4 $5. @32 coupon5 $5.; 
infile cards missover ;
cards;
132522 KCB33 CKM90 LKJ89 LKU09 NM789
258798 OUI00 UIT78 POU89 CC909 
548998 NM789       PI980 VBV46 JKO09
109743 YUO90 JJI99 POU89       MOP09
;
run;

Data invalid_codes;
input inv_coupon $5.;
infile cards;
cards;
LKJ89
CC909
MOP09
YUO90
;

data valid_codes;
  set coupon_data;
  array cpn  coupon1-coupon5;
  if _N_=1 then do;
    dcl hash H(dataset:'invalid_codes');
    H.definekey('inv_coupon');
    H.definedone(); 
    if 0 then set invalid_codes;
  end;
  do over cpn;
    if H.check(key:cpn)=0 then call missing(cpn);
  end;
  drop inv_coupon;
run;
Cbob03
Fluorite | Level 6
Thank you very much!
SASKiwi
PROC Star

Lookup formats work well too:

 

proc format;
 value $Invalid_Codes
'LKJ89' = 'I'
'CC909' = 'I'
'MOP09' = 'I'
'YUO90' = 'I'
' '     = 'I'
other   = 'V'
;
run;

data valid_codes;
  set coupon_data;
  array cpn [5] coupon1-coupon5;
  array outcpn [5] $5;
  do i=1 to 5;   
    if put(cpn[i], $Invalid_Codes.) = 'V' then outcpn[i]=cpn[i];
  end;
  drop   coupon1-coupon5 i ;
  rename outcpn1-outcpn5 = coupon1-coupon5;	
run;
RichardDeVen
Barite | Level 11

An existential lookup should be performed with the CHECK method.

 

Example:

data valid_codes;
  set coupon_data;
  array cpn [5] coupon1-coupon5;

  if _N_=1 then do;
    dcl hash H(dataset:'invalid_codes');
    H.definekey('inv_coupon');
    H.definedone(); 
    if 0 then set invalid_codes;
  end;

  do i=1 to 5;
    if H.check(key:cpn[i]) eq 0 then cpn[i] = '';  * coupon code found in hash of invalids, clear it;
  end;

  drop i inv_coupon;
run;

View result

RichardADeVenezia_0-1590624700952.png

 

ballardw
Super User

As an alternative a custom Informat approach. You didn't say how many codes you may have so I show an approach to turn the invalid codes data set into a CNTLIN data set for use by Proc Format. There are some pretty specific rules about the names of the variables used by Proc Format and finding the approriate values for the TYPE (invalue or value, character or numeric) and the HLO column (used to set upper/lower inclusion boundaries or the "other" property) aren't always the easiest. Sometimes just create a cntlout data set with a small number of values.

Then use the new informat to read the data set the value to blank for the invalid codes.

Data invalid_codes;
input inv_coupon $5.;
infile cards;
cards;
LKJ89
CC909
MOP09
YUO90
;
data cntlin;
   set invalid_codes end=eof;
   fmtname='COUPON';
   default=5;
   length=5;
   type='J';
   start=inv_coupon;
   /* label needs to be long enough to hold _same_ below*/
   length label $ 6; 
   label=' ';
   output;
   if eof then do;
   /* what to do for "valid" codes*/
      call missing(start,label);
      HLO='O';
      label='_SAME_';
      output;
   end;
run;

proc format library=work  cntlin=cntlin ;
run;

Data coupon_data;
input @1 prod_ID :$6. @8 coupon1 $COUPON. @14 coupon2 $COUPON.  @20 coupon3 $COUPON.
     @26 coupon4 $COUPON. @32 coupon5 $COUPON.; 
infile cards missover ;
cards;
132522 KCB33 CKM90 LKJ89 LKU09 NM789
258798 OUI00 UIT78 POU89 CC909 
548998 NM789       PI980 VBV46 JKO09
109743 YUO90 JJI99 POU89       MOP09
;
run;

The "other" statement with the HLO at the end likely isn't needed but I include it to show how to set such where it may be needed. A similar approach could be used with a list of VALID codes and an _error_ condition, set a bit differently, that would tell you when invalid values were encountered. So you could have 1) valid codes pass through with the _same_, known invalid codes get set to missing and an _error_ for values not on either. That way you would know whether to add it the newly encountered value to one of the sets (update the format  and reread the data with the new format).

 

If the actual list of invalid codes is as small as you show it may be easier just to use Proc format code.

proc format library=work cntlout=work.cntlout1;
invalue $coupon (default=5)
"LKJ89","CC909","MOP09","YUO90" = ' '
other = _same_
;
run;

I have several projects where there is just enough flux in things like "site identifier codes" that I have Informats that do the _same_, set to missing and have an _error_ setting to tell me when some one sneaks a new code in so I can get all of the related information such as location, long name and other properties I use.

The modified code to have "valid" , reset and _error_ could look like

proc format library=work cntlout=work.cntlout1;
invalue $coupon (default=5)
"LKJ89","CC909","MOP09","YUO90" = ' '
"CKM90","JJI99","JKO09","KCB33","LKU09","NM789",
"OUI00","PI980","POU89","UIT78","VBV46" = _same_

other = _error_;
;
run;

The _same_ are the valid codes.

For an example, use this version of the informat after adding a new code that does not appear in the invalid codes or your previous data and rerun the data step reading it after updating the informat (or using a new informat name)

 

 

IF however your invalid code list was per Prod_id , meaning the code might be valid for one product but not another, then the HASH would be the way to go as the Informat approach only uses single values. Not saying it couldn't be done but the complexity would not be worth the effort.

Cbob03
Fluorite | Level 6
Thank you for your response. I have lot many invalid codes to compare against so hash approach worked for my problem. Thanks again for your time!

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 988 views
  • 2 likes
  • 5 in conversation