Re: SAS Index Merge

sfffdg · Posted 04-21-2021 02:07 AM

Hi,

The data set UPDATE_EMAIL_ltst has the variables c_id and email_id.

The data set dcclnt has the variables c_id, email_id and p_c_id.

c_id and email_id are the key variables.

With the query below, I am getting the previous row's value for p_c_id instead of the current record's value.

Can anyone help with this?

rsubmit;
data Email;
  set UPDATE_EMAIL_ltst;
  set dcclnt key=Clt_Email / unique;
  if _error_ = 0 then do;
    delete;
  end;
  else do;
    _error_ = 0;
    output;
  end;
run;
endrsubmit;

mkeintz · Posted 04-21-2021 02:54 AM

Sample data, in the form of working data steps, please. Help us help you.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sfffdg · Posted 04-21-2021 05:20 AM

Hi,

The sample data is below: the inputs and the output.

In the output data set Email, P_Email_id should ideally be xyz1@gmail.com, whereas it is abc@gmail.com.

sfffdg · Posted 04-21-2021 05:53 AM

The code is below

rsubmit;
data a;
  input C_ID $ EMAIL_ID $20.;
  cards;
C1 abc@gmail.com
C2 xyz@gmail.com
;
run;
endrsubmit;

rsubmit;
data b(index=(Clt_Email=(C_ID email_id)));
  input C_ID $ EMAIL_ID $5-18 P_EMAIL_ID $20-33;
  cards;
C1  abc@gmail.com  abc@gmail.com
C11 abc1@gmail.com abc1@gmail.com
C2  xyz1@gmail.com xyz1@gmail.com
;
run;
endrsubmit;

rsubmit;
data ab;
  set a;
  set b key=Clt_Email / unique;
  if _error_ = 0 then do;
    delete;
  end;
  else do;
    _error_ = 0;
    output;
  end;
run;
endrsubmit;

Kurt_Bremser · Posted 04-21-2021 06:15 AM

This creates your expected dataset without indexes:

data a;
input C_ID $ EMAIL_ID :$20.;
cards;
C1 abc@gmail.com
C2 xyz@gmail.com
;

data b;
input C_ID $ EMAIL_ID :$20. P_EMAIL_ID :$20.;
cards;
C1 abc@gmail.com abc@gmail.com
C11 abc1@gmail.com abc1@gmail.com
C2 xyz1@gmail.com xyz1@gmail.com
;

data ab;
set a;
if _n_ = 1
then do;
  length _email_id P_EMAIL_ID $20;
  declare hash b (dataset:"b (rename=(email_id=_email_id))");
  b.definekey("c_id");
  b.definedata("_email_id","p_email_id");
  b.definedone();
  call missing(_email_id,P_EMAIL_ID);
end;
if b.find() = 0 and _email_id = email_id then delete;
drop _:;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

sfffdg · Posted 04-21-2021 06:03 PM

Hi Kurt,

Thanks for the solution. The problem with this approach is that it fails with a memory error.

How can we change the index merge query below so that it serves the purpose?

rsubmit;
data ab;
  set a;
  set b key=Clt_Email / unique;
  if _error_ = 0 then do;
    delete;
  end;
  else do;
    _error_ = 0;
    output;
  end;
run;
endrsubmit;

sfffdg · Posted 04-21-2021 06:08 PM

Hi Kurt,

Thanks for the solution, but when I run it I get a memory failure error.

How can we change the index merge query below so that it serves the purpose and is fast as well?

rsubmit;
data ab;
  set a;
  set b key=Clt_Email / unique;
  if _error_ = 0 then do;
    delete;
  end;
  else do;
    _error_ = 0;
    output;
  end;
run;
endrsubmit;

mkeintz · Posted 04-21-2021 03:09 AM

OK, I figured this out without provision of sample data.

All variables retrieved via a SET statement are retained, i.e. held over to the next data step iteration (usually the next observation).
A successful SET overwrites the retained values from the prior observation with new values.
An unsuccessful SET does NOT overwrite the retained values.

So your unsuccessful SET with KEY= yields a non-zero _ERROR_, and P_C_ID still holds the value from the last successful lookup. In such cases you have chosen to reset _ERROR_ to zero, but you also need to set the retained P_C_ID to missing.
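
A minimal sketch of that change, assuming the variable names from the first post. The sample rows and the P1/P2 values are hypothetical and only there to make the step runnable. The sketch tests _IORC_ (the return code that SET with KEY= sets, 0 on success) rather than _ERROR_, which is a substitution for the test in the original code.

```sas
/* Hypothetical sample rows; variable names follow the first post */
data UPDATE_EMAIL_ltst;
  input c_id $ email_id :$20.;
  cards;
C1 abc@gmail.com
C2 xyz@gmail.com
;

data dcclnt (index=(Clt_Email=(c_id email_id)));
  input c_id $ email_id :$20. p_c_id $;
  cards;
C1 abc@gmail.com P1
C2 xyz1@gmail.com P2
;

data Email;
  set UPDATE_EMAIL_ltst;
  set dcclnt key=Clt_Email / unique;
  if _iorc_ = 0 then delete;   /* key found in dcclnt: drop the record */
  else do;
    _error_ = 0;               /* suppress the error the failed lookup raised */
    call missing(p_c_id);      /* clear the value retained from an earlier hit */
    output;
  end;
run;
```

With these rows, the C2 record is kept and its p_c_id is missing rather than carrying over P1 from the C1 lookup.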

ChrisNZ · Posted 04-21-2021 03:34 AM

Please format your code; as posted it is illegible. There is an icon for this that you can use when pasting.

rsubmit;                                           %* send to remote host;
  data EMAIL ;                                     %* create table EMAIL;
    set UPDATE_EMAIL_LTST ;                        %* read table UPDATE_EMAIL_LTST ;
    set DCCLNT key=CLT_EMAIL /unique;              %* read table DCCLNT using key CLT_EMAIL ;
    if _ERROR_=0 then do;                          %* if a record is found in DCCLNT ;
      delete;                                      %* delete (i.e. ignore) the record read in UPDATE_EMAIL_LTST ;
    end;
    else do;                                       %* else ;
      _ERROR_=0;                                   %* reset _ERROR_ flag ;
      output ;                                     %* save the record read in UPDATE_EMAIL_LTST ;
    end;
  run;
endrsubmit;

So in effect: keep all records from UPDATE_EMAIL_LTST that are not present in DCCLNT.

High-Performance SAS Coding - Third Edition

Kurt_Bremser · Posted 04-21-2021 03:36 AM

If I read that code correctly, you want to delete observations where the key is found in dcclnt?

In which case, a hash solves the issue nicely without needing to rely on the indexes:

data Email ;
set UPDATE_EMAIL_ltst;
if _n_ =1
then do;
  declare hash dcc (dataset:"dcclnt");
  dcc.definekey("Clt_Email");
  dcc.definedone();
end;
if dcc.find() ne 0;
run;

ChrisNZ · Posted 04-21-2021 03:51 AM

@Kurt_Bremser check() would be faster than find() here 🙂 .
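
A sketch of the same lookup with check(), assuming the data sets from the first post exist and that c_id and email_id (the key variables named there) are the hash keys. Clt_Email is an index name, not a variable, so it cannot be passed to definekey().

```sas
data Email;
  set UPDATE_EMAIL_ltst;
  if _n_ = 1 then do;
    /* load only the key variables from dcclnt */
    declare hash dcc (dataset: "dcclnt (keep=c_id email_id)");
    dcc.definekey("c_id", "email_id");
    dcc.definedone();
  end;
  /* check() only tests for the key and does not copy data into the PDV */
  if dcc.check() ne 0;
run;
```

Because check() retrieves nothing, it also cannot overwrite any variable read from UPDATE_EMAIL_ltst.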

sfffdg · Posted 04-21-2021 06:18 AM

Thanks Kurt,

But I get the error below.

ERROR: Undeclared key symbol Clt_Email for hash object at line 1019 column 3.
ERROR: DATA STEP Component Object failure. Aborted during the EXECUTION phase.

Kurt_Bremser · Posted 04-21-2021 06:37 AM

My code as posted works. Just copy/paste and run it.

Adapt it to your variables, or post example data with the real column names.

Note that the hash keys only on the id and uses the email in a separate comparison.
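
If memory rules out the hash, the same idea (look up by id only, compare the email separately) can be sketched with an indexed SET. This is an illustration, not tested against your real data: it assumes a simple index on C_ID can be built on the lookup table, and it uses the sample rows posted in this thread.

```sas
data a;
  input C_ID $ EMAIL_ID :$20.;
  cards;
C1 abc@gmail.com
C2 xyz@gmail.com
;

/* simple index on C_ID instead of the composite Clt_Email index */
data b (index=(C_ID));
  input C_ID $ EMAIL_ID :$20. P_EMAIL_ID :$20.;
  cards;
C1 abc@gmail.com abc@gmail.com
C11 abc1@gmail.com abc1@gmail.com
C2 xyz1@gmail.com xyz1@gmail.com
;

data ab;
  set a;
  /* rename so the looked-up email does not overwrite the one from a */
  set b (rename=(EMAIL_ID=_email_id)) key=C_ID / unique;
  if _iorc_ = 0 then do;
    if _email_id = EMAIL_ID then delete;   /* same id and same email: drop */
  end;
  else do;
    _error_ = 0;                           /* id not in b: clear the error flag */
    call missing(_email_id, P_EMAIL_ID);   /* and the values retained from a prior hit */
  end;
  drop _email_id;
run;
```

With the sample rows, C1 is deleted and C2 is kept with P_EMAIL_ID = xyz1@gmail.com, which is the result asked for earlier in the thread.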

sfffdg · Posted 04-21-2021 07:05 AM

Thanks Kurt,

I used the same code, adapted to my variables.

The whole code is below:

rsubmit;
data UPDATE_EMAIL_ltst(keep=client_id email_id);
  set UPDATE_EMAIL;
  by client_id last_update_ts;
  email_id = upcase(event_value);
  if last.client_id;
run;
endrsubmit;

rsubmit;
data dcclnt(keep=client_id polisy_email_id email_id index=(Clt_Email=(client_id email_id)));
  set data.dcclnt(rename=(clt_key=client_id) where=(valid_flag='1'));
  email_id = upcase(email_id);
  polisy_email_id = email_id;
run;
endrsubmit;

rsubmit;
data Email_t;
  set UPDATE_EMAIL_ltst;
  if _n_ = 1 then do;
    declare hash dcc (dataset:"dcclnt");
    dcc.definekey("client_id");
    dcc.definedone();
  end;
  if dcc.find() ne 0;
run;
endrsubmit;

The error is below:

ERROR: Hash object added 32505840 items when memory failure occurred.
FATAL: Insufficient memory to execute DATA step program. Aborted during the EXECUTION phase.
ERROR: The SAS System stopped processing this step because of insufficient memory.

ChrisNZ · Posted 04-21-2021 07:17 AM

Try using something, like _N_, as hash data. Otherwise the key is used again as data and it's probably quite long.
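
A rough sketch of that idea, with one substitution: instead of _N_ it uses a one-byte dummy variable (the hypothetical name _dummy) as the data portion, and loads the hash in an explicit loop because that variable does not exist in dcclnt. Whether this saves enough memory for 32 million keys depends on the length of client_id and on your system, so treat it as something to try rather than a guaranteed fix.

```sas
data Email_t;
  retain _dummy ' ';                  /* 1-byte placeholder used as hash data */
  if _n_ = 1 then do;
    declare hash dcc ();
    dcc.definekey("client_id");
    dcc.definedata("_dummy");         /* keeps the key from being stored again as data */
    dcc.definedone();
    do until (done);
      set dcclnt (keep=client_id) end=done;
      dcc.replace();                  /* adds the key, or overwrites a duplicate */
    end;
  end;
  set UPDATE_EMAIL_ltst;
  if dcc.check() ne 0;                /* keep ids not present in dcclnt */
  drop _dummy;
run;
```

The rsubmit/endrsubmit wrapper is left out here and would go around the step as in the code above.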
