BookmarkSubscribeRSS Feed
sfffdg
Obsidian | Level 7

Hi ,

 

The  data set UPDATE_EMAIL_ltst  has c_id, email_id variables

The data set dcclnt  has c_id , email_id  and  p_c_id variables

c_id, email_id are the key variables

for the below query ,I am getting previous row  data for p_c_id  instead the current record data.

Can any one help with this.

 


rsubmit;
data Email ;
set UPDATE_EMAIL_ltst ;
set dcclnt key=Clt_Email /unique;
if _error_=0 then
do;
delete;
end;
else do;
_error_=0;
output ;
end;
run;
endrsubmit;

21 REPLIES 21
mkeintz
PROC Star

Sample data, in the form of working data steps, please.   Help us help you.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
sfffdg
Obsidian | Level 7

Hi,

 

The data sample is below,the inputs and the output

 

swathiprasad_0-1618996725191.png

 

 

In Email Out put data set P_Email_id  should be  ideally xyz1@gmail.com ,where as it is abc@gmail.com

sfffdg
Obsidian | Level 7

The code is below

rsubmit;
data a;
input C_ID $ EMAIL_ID $20.;
cards;
C1 abc@gmail.com
C2 xyz@gmail.com

;
run;
endrsubmit;


rsubmit;
data b(index=(Clt_Email=(C_ID email_id)));
input C_ID $ EMAIL_ID $5-18 P_EMAIL_ID $20-33
;
cards;
C1 abc@gmail.com abc@gmail.com
C11 abc1@gmail.com abc1@gmail.com
C2 xyz1@gmail.com xyz1@gmail.com

;
run;
endrsubmit;


rsubmit;
data ab ;
set a ;
set b key=Clt_Email /unique;
if _error_=0 then
do;
delete;
end;
else do;
_error_=0;
output ;
end;
run;
endrsubmit;

Kurt_Bremser
Super User

This creates your expected dataset without indexes:

data a;
input C_ID $ EMAIL_ID :$20.;
cards;
C1 abc@gmail.com
C2 xyz@gmail.com
;

data b;
input C_ID $ EMAIL_ID :$20. P_EMAIL_ID :$20.;
cards;
C1 abc@gmail.com abc@gmail.com
C11 abc1@gmail.com abc1@gmail.com
C2 xyz1@gmail.com xyz1@gmail.com
;

data ab;
set a;
if _n_ = 1
then do;
  length _email_id P_EMAIL_ID $20;
  declare hash b (dataset:"b (rename=(email_id=_email_id))");
  b.definekey("c_id");
  b.definedata("_email_id","p_email_id");
  b.definedone();
  call missing(_email_id,P_EMAIL_ID);
end;
if b.find() = 0 and _email_id = email_id then delete;
drop _:;
run;
sfffdg
Obsidian | Level 7

Hi Kurt ,

 

Thanks for the solution ,the problem with this approach is memory failure .

 

How can we change the below  index merge query so that it will serve the purpose.

rsubmit;
data ab ;
set a ;
set b key=Clt_Email /unique;
if _error_=0 then
do;
delete;
end;
else do;
_error_=0;
output ;
end;
run;
endrsubmit;

sfffdg
Obsidian | Level 7

Hi Kurt,

 

Thanks for the solution given ,but when ran it gives the memory failure error.

 

How can we change the below index merge query so that it will serve the purpose,which is fast as well.

 

 

rsubmit;
data ab  ;
set a ;
set b key=Clt_Email /unique;
if _error_=0 then
	             do;
                    delete;
	             end;
            else do;
			        _error_=0; 
                    output ;
		         end; 
run;
endrsubmit;

 

 

 

mkeintz
PROC Star

OK, I figured this out without provision of sample data.

 

  1. All variables retrieved via a SET statement are retained (i.e. held over to the next data step iteration (usually the next observaton).
  2. But a successful SET overwrites the retained values from the prior obs with new values.
  3. An unsuccessful SET does NOT overwrite the retained values.

 

So your unsuccessful SET /key=    yields a non-zero _ERROR_ .  In such cases, you have chosen to reset _ERROR_ to zero, but you also need to set the retained P_C_ID to missing.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ChrisNZ
Tourmaline | Level 20

Format your code, this is illegible. There's an icon for this to use when pasting.

 

rsubmit;                                           %* send to remote host;
  data EMAIL ;                                     %* create table EMAIL;
    set UPDATE_EMAIL_LIST ;                        %* read table UPDATE_EMAIL_LIST ;
    set DCCLNT key=CLTEMAIL /unique;               %* read table DCCLNT using key CLTEMAIL ;
    if _ERROR_=0 then do;                          %* if a record is found in DCCLNT ;
      delete;                                      %* delete (i.e. ignore) the record read in UPDATE_EMAIL_LIST ;
    end;
    else do;                                       %* else ;
      _ERROR_=0;                                   %* reset _ERROR_ flag ; 
      output ;                                     %* save the record read in UPDATE_EMAIL_LIST ;
    end;
  run;
endrsubmit;

So in effect: keep all records from UPDATE_EMAIL_LIST  not present in DCCLNT

 

 

Kurt_Bremser
Super User

If I read that code correctly, you want to delete observations where the key is found in dcclnt?

 

In which case, a hash solves the issue nicely without needing to rely on the indexes:

data Email ;
set UPDATE_EMAIL_ltst;
if _n_ =1
then do;
  declare hash dcc (dataset:"dcclnt");
  dcc.definekey("Clt_Email");
  dcc.definedone();
end;
if dcc.find() ne 0;
run;
ChrisNZ
Tourmaline | Level 20

@Kurt_Bremser   check() would be faster than find() here 🙂 .

sfffdg
Obsidian | Level 7

Thanks Kurt ,

 

But I get the below error.

 

ERROR: Undeclared key symbol Clt_Email for hash object at line 1019 column 3.
ERROR: DATA STEP Component Object failure. Aborted during the EXECUTION phase.

Kurt_Bremser
Super User

My code as posted works. Just copy/paste and run it.

Adapt it to your variables, or post example data with the real column names.

Note that the hash keys only on the id and uses the email in a separate comparison.

sfffdg
Obsidian | Level 7

Thanks Kurt,

 

I just used the same code as below.

 

The whole code is as below

rsubmit;
data UPDATE_EMAIL_ltst(keep=client_id email_id );
set UPDATE_EMAIL;
email_id=upcase(event_value);
by client_id last_update_ts;
if last.client_id;
run;
endrsubmit;

 

rsubmit;
data dcclnt(keep=client_id polisy_email_id email_id index=(Clt_Email=(client_id email_id)));
set data.dcclnt(rename=(clt_key=client_id) where=(valid_flag='1') );
email_id=upcase(email_id);
polisy_email_id=email_id;
run;
endrsubmit;

 

rsubmit;
data Email_t;
set UPDATE_EMAIL_ltst;
if _n_ =1
then do;
declare hash dcc (dataset:"dcclnt");
dcc.definekey("client_id");
dcc.definedone();
end;
if dcc.find() ne 0;
run;
endrsubmit;

 

The error is below:

 

ERROR: Hash object added 32505840 items when memory failure occurred.
FATAL: Insufficient memory to execute DATA step program. Aborted during the EXECUTION phase.
ERROR: The SAS System stopped processing this step because of insufficient memory.

ChrisNZ
Tourmaline | Level 20
Try using something, like _N_, as hash data. Otherwise the key is used again as data and it's probably quite long.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 21 replies
  • 2941 views
  • 2 likes
  • 4 in conversation