BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
nkm123
Calcite | Level 5

Hi,

 

I have dataset in following format 

 

 

data have;
store_id = 101; source_id = 1;dest_id = 3;output;
store_id = 101; source_id = 2;dest_id = 3;output;
store_id = 102; source_id = 1;dest_id = 4;output;
store_id = 102; source_id = 2;dest_id = 4;output;
run;

 

 

and my requirement is to take out all duplicates where source or dest id is already taken. so final dataset should be like this 

 

 

data want;
store_id = 101; source_id = 1;dest_id = 3;output;
store_id = 102; source_id = 2;dest_id = 4;output;
run;

 

 

Logic:- 

 

1) First record doesn't have any previous record to compare so will be there in final dataset.

2) second record should be removed as dest_id = 3 is taken by first observation.

3) third record should also be removed as source id = 1 is already taken by first observation.

4) should keep fourth observation as source and dest id are not taken previously.

 

I have millions of observation and different combination (source and dest id is not 1 and 2 but can have as many as digit ).

 

Thanks in advance for help.

 

Thanks,

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Haikuo
Onyx | Level 15

Hash comes in handy:

data want;
if _n_=1 then do;

dcl hash h();
h.definekey('id');
h.definedata('id');
h.definedone();
call missing (id);
end;

set have;
if h.check(key:source_id) and h.check(key:dest_id) then output;
rc=h.ref(key:source_id,data:source_id);
rc=h.ref(key:dest_id,data:dest_id);
drop id rc;
run;
 

View solution in original post

8 REPLIES 8
nkm123
Calcite | Level 5

More Examples:-

 

data have;
store_id = 101; source_id = 1;dest_id = 3;output;
store_id = 101; source_id = 2;dest_id = 3;output;
store_id = 102; source_id = 1;dest_id = 4;output;
store_id = 102; source_id = 2;dest_id = 4;output;
store_id = 103; source_id = 5;dest_id = 6;output;
store_id = 103; source_id = 5;dest_id = 7;output;
store_id = 103; source_id = 8;dest_id = 10;output;
store_id = 103; source_id = 9;dest_id = 10;output;

run;

data want;
store_id = 101; source_id = 1;dest_id = 3;output;
store_id = 102; source_id = 2;dest_id = 4;output;
store_id = 103; source_id = 5;dest_id = 6;output;
store_id = 103; source_id = 8;dest_id = 10;output;
run;
Haikuo
Onyx | Level 15

Hash comes in handy:

data want;
if _n_=1 then do;

dcl hash h();
h.definekey('id');
h.definedata('id');
h.definedone();
call missing (id);
end;

set have;
if h.check(key:source_id) and h.check(key:dest_id) then output;
rc=h.ref(key:source_id,data:source_id);
rc=h.ref(key:dest_id,data:dest_id);
drop id rc;
run;
 
Reeza
Super User

@Haikuo I think the forum garbled your code a bit?  

nkm123
Calcite | Level 5

Thanks Haikuo. I just modified little bit and it's working. Thanks for solution. will implement to real data and let you know reult. 

data want;
if _n_=1 then do;

dcl hash h();
h.definekey('id');
h.definedata('id');
h.definedone();
call missing (id);
end;

set have;
if h.check(key:source_id) and h.check(key:dest_id) then do ;output;
rc=h.ref(key:source_id,data:source_id);
rc=h.ref(key:dest_id,data:dest_id);
drop id rc;
end;
run;
Haikuo
Onyx | Level 15

Don't think your version is working. Noticing the 'And' in the following code, so if you have a 'Or', you will be skipping those 'id's that need to be uploaded into Hash.

if h.check(key:source_id) and h.check(key:dest_id) 

 

Edit:

Maybe I have mistaken your intention, so do please do ignore my comments above if it is the case. 

Haikuo
Onyx | Level 15

Yes, @Reeza. It has been doing this to all of the hash code from day one. I have no idea when and if it can be fixed :(. 

nkm123
Calcite | Level 5

Yes, I need to skip  id's to load into hash if match is not found so added do loop condition to your code so hash object only updated when record is written to output. It's what I wanted. 

 

Even I implemented it to real data and it's working fine. Didn't get any unexpected error or result.

 

Thanks Haikuo for your support. 

 

Thanks,

vkmanchi
Calcite | Level 5

This one seems to be working too!

 

data have_not (drop=one_source_id one_dest_id);

retain one_source_id one_dest_id;

set have;

 

if _N_=1 then

do;

one_source_id=source_id;

one_dest_id=dest_id;

end;

else

do;

if source_id = one_source_id then delete;

if dest_id=one_dest_id then delete;

if source_id ne lag(source_id) then one_source_id=source_id;

if dest_id ne lag(dest_id) then one_dest_id=dest_id;

end;

run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1514 views
  • 0 likes
  • 4 in conversation