BookmarkSubscribeRSS Feed
renjithr
Quartz | Level 8

Hi All,

 

I have a file with customerid, code1 and code2 varialbes. 

 

Input file:

Customerid Code1 Code2
10001 101 201
10002 101 201
10002 101 301
10002 102 202
10003 101 201
10003 101 301
10003 102 201
10003 102 301
10004 101 201
10004 101 201

 

 Desired Output file:

 

10001 101 201
10002 101 201
10002 101 301
10002 102 202
10003 101 201
10003 102 301
10004 101 201

 

Could you please let me know how can get the desired output file?

 

Thanks in advance!

10 REPLIES 10
Reeza
Super User

PROC SORT -> UNIQUEOUT option

Astounding
PROC Star

You don't have any patterns here that are consistent from one set of observations to the next.  Either fix your example, or explain what the rules are that make an observation a "duplicate".

ballardw
Super User

You need to provide some rules.

Why for customerid 10002 do you keep both of the code2 when the code1 is 101 but for customerid 10003 you keep the code2 value of 303 but not 201 when code1 is 102?

 

If those are types and both of the 10003 should have been kept it might be as simple as;

 

proc sort data= have out=want nodupkey;

   by customerid code1 code2;

run;

 

Though the values of any other variables in the output dataset could be problematic as to which record gets selected.

renjithr
Quartz | Level 8

Hi,

Thanks for the response. 

 

Apologies for not being clear. 

In the case of 10002, I only want to keep code2 when code2 values are different.

 

10002 101 201
10002 101 301
10002 102 202

 

In the case of 10003, I only want to keep one row for code1 and code2 combo, because there is duplicate values in code1 and code 2. So I only want to keep one combination of code1/code2 which could be either (101/201 and 102/301) or (101/301 and 102/201).


10003 101 201
10003 101 301
10003 102 201
10003 102 301

Ksharp
Super User

Yeah. Use proc sort . If your data arranged like what you showed.

 

data have;
input Customerid Code1 Code2;
cards;
10001 101 201
10002 101 201
10002 101 301
10002 102 202
10003 101 201
10003 101 301
10003 102 201
10003 102 301
10004 101 201
10004 101 201
;
run;
proc sort data=have out=want nodupkey;
 by Customerid Code2;
run;
renjithr
Quartz | Level 8

Thank you. 

 

I am looking for the below output:

10001 101 201

10002 101 201

10002 101 301

10002 102 202

10003 101 201

10003 102 301

10004 101 201

 

Thanks.

Ksharp
Super User

I can't follow what you are saying . It seems it was very difficult problem.

You need more words to get your question more clear.

Assuming I understood what you mean.

 

 

 

data x;
input Customerid Code1 Code2;
cards;
10001 101 201
10002 101 201
10002 101 301
10002 102 202
10003 101 201
10003 101 301
10003 102 201
10003 102 301
10004 101 201
10004 101 201
;
run;
proc sort data=x out=have nodupkey;
by Customerid Code1 Code2;
run;
data want;
if _n_ eq 1 then do;
if 0 then set have;
declare hash h(dataset:'have',multidata : 'y');
h.definekey('Customerid','Code2');
h.definedata('Customerid','Code1','Code2');
h.definedone();

declare hash c();
c.definekey('Customerid','Code2');
c.definedone();

declare hash cc();
cc.definekey('Customerid','Code1');
cc.definedone();
end;
set have;
if c.check() ne 0 then do;

_Code1=Code1;
rc=h.find();
do while(rc=0);
if _Code1=Code1 then do;c.add();h.removedup();leave;end;
rc=h.find_next();
end;

if h.check() ne 0 then output;
else do;
if cc.check() ne 0 then do;cc.add();output;end;
else do;h.find();output;end;
end;

end;
drop rc _Code1 ;
run;
renjithr
Quartz | Level 8

Thank you! Apologies for not being clear.

 

I ran your code but it did not run successfully. I got the below error:

 

WARNING: Apparent symbolic reference COLON not resolved.
ERROR: DATA STEP Component Object failure. Aborted during the COMPILATION phase.
ERROR 79-322: Expecting a ).

 

Could you please advise?

Ksharp
Super User

Here.

 declare hash h(dataset:'have',multidata : 'y');

-->

declare hash h(dataset:'have',multidata  :  'y'); 

 

Apparently . This forum transform : into colon.

renjithr
Quartz | Level 8

Thanks a lot!!

Aprreciate your help!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 2013 views
  • 0 likes
  • 5 in conversation