DATA Step, Macro, Functions and more

How to select non duplicate values?

Reply
Frequent Contributor
Posts: 122

How to select non duplicate values?

[ Edited ]

Hi All,

 

I have a file with customerid, code1 and code2 varialbes. 

 

Input file:

Customerid Code1 Code2
10001 101 201
10002 101 201
10002 101 301
10002 102 202
10003 101 201
10003 101 301
10003 102 201
10003 102 301
10004 101 201
10004 101 201

 

 Desired Output file:

 

10001 101 201
10002 101 201
10002 101 301
10002 102 202
10003 101 201
10003 102 301
10004 101 201

 

Could you please let me know how can get the desired output file?

 

Thanks in advance!

Super User
Posts: 17,750

Re: How to select non duplicate values?

PROC SORT -> UNIQUEOUT option

Super User
Posts: 5,071

Re: How to select non duplicate values?

You don't have any patterns here that are consistent from one set of observations to the next.  Either fix your example, or explain what the rules are that make an observation a "duplicate".

Super User
Posts: 10,466

Re: How to select non duplicate values?

You need to provide some rules.

Why for customerid 10002 do you keep both of the code2 when the code1 is 101 but for customerid 10003 you keep the code2 value of 303 but not 201 when code1 is 102?

 

If those are types and both of the 10003 should have been kept it might be as simple as;

 

proc sort data= have out=want nodupkey;

   by customerid code1 code2;

run;

 

Though the values of any other variables in the output dataset could be problematic as to which record gets selected.

Frequent Contributor
Posts: 122

Re: How to select non duplicate values?

Hi,

Thanks for the response. 

 

Apologies for not being clear. 

In the case of 10002, I only want to keep code2 when code2 values are different.

 

10002 101 201
10002 101 301
10002 102 202

 

In the case of 10003, I only want to keep one row for code1 and code2 combo, because there is duplicate values in code1 and code 2. So I only want to keep one combination of code1/code2 which could be either (101/201 and 102/301) or (101/301 and 102/201).


10003 101 201
10003 101 301
10003 102 201
10003 102 301

Super User
Posts: 9,662

Re: How to select non duplicate values?

Yeah. Use proc sort . If your data arranged like what you showed.

 

data have;
input Customerid Code1 Code2;
cards;
10001 101 201
10002 101 201
10002 101 301
10002 102 202
10003 101 201
10003 101 301
10003 102 201
10003 102 301
10004 101 201
10004 101 201
;
run;
proc sort data=have out=want nodupkey;
 by Customerid Code2;
run;
Frequent Contributor
Posts: 122

Re: How to select non duplicate values?

[ Edited ]

Thank you. 

 

I am looking for the below output:

10001 101 201

10002 101 201

10002 101 301

10002 102 202

10003 101 201

10003 102 301

10004 101 201

 

Thanks.

Super User
Posts: 9,662

Re: How to select non duplicate values?

[ Edited ]

I can't follow what you are saying . It seems it was very difficult problem.

You need more words to get your question more clear.

Assuming I understood what you mean.

 

 

 

data x;
input Customerid Code1 Code2;
cards;
10001 101 201
10002 101 201
10002 101 301
10002 102 202
10003 101 201
10003 101 301
10003 102 201
10003 102 301
10004 101 201
10004 101 201
;
run;
proc sort data=x out=have nodupkey;
by Customerid Code1 Code2;
run;
data want;
if _n_ eq 1 then do;
if 0 then set have;
declare hash h(dataset:'have',multidata : 'y');
h.definekey('Customerid','Code2');
h.definedata('Customerid','Code1','Code2');
h.definedone();

declare hash c();
c.definekey('Customerid','Code2');
c.definedone();

declare hash cc();
cc.definekey('Customerid','Code1');
cc.definedone();
end;
set have;
if c.check() ne 0 then do;

_Code1=Code1;
rc=h.find();
do while(rc=0);
if _Code1=Code1 then do;c.add();h.removedup();leave;end;
rc=h.find_next();
end;

if h.check() ne 0 then output;
else do;
if cc.check() ne 0 then do;cc.add();output;end;
else do;h.find();output;end;
end;

end;
drop rc _Code1 ;
run;
Frequent Contributor
Posts: 122

Re: How to select non duplicate values?

Thank you! Apologies for not being clear.

 

I ran your code but it did not run successfully. I got the below error:

 

WARNING: Apparent symbolic reference COLON not resolved.
ERROR: DATA STEP Component Object failure. Aborted during the COMPILATION phase.
ERROR 79-322: Expecting a ).

 

Could you please advise?

Super User
Posts: 9,662

Re: How to select non duplicate values?

Here.

 declare hash h(dataset:'have',multidata : 'y');

-->

declare hash h(dataset:'have',multidata  :  'y'); 

 

Apparently . This forum transform : into colon.

Frequent Contributor
Posts: 122

Re: How to select non duplicate values?

Thanks a lot!!

Aprreciate your help!

Ask a Question
Discussion stats
  • 10 replies
  • 377 views
  • 0 likes
  • 5 in conversation