Solved: Re: Eliminate duplicate record

hhchenfx · Posted 03-21-2018 05:54 PM

Hi Everyone,

I have the following data where due to the combination setting, record 1 and 3 are the same (a=1 and b=2) vs (b=2 and a=1)

Is there any way to eliminate 1 of them?

Thank you.

HHCFX

data w; 
input f1 $ f1_value f2 $ f2_value;
datalines;
a 1 b 2
a 2 b 2
b 2 a 1
run;

Solution and More

This one is easiest to follow:

*create string and then sortc;

data have; 
input f1 $ f1_value f2 $ f2_value f3 $ f3_value;
datalines;
a 1 b 2 c 0
a 2 b 2 c 0
b 2 a 1 c 0
b 9 a 1 c 0
c 0 a 1 b 2
;Run;

Data Temp;
Set Have;
Col1 = Catt(f1, f1_value);
Col2 = Catt(f2, f2_value);
Col3 = Catt(f3, f3_value);
Call Sortc(Col1, Col2, Col3);
Run;

Proc Sort Data = Temp Out = Want (Drop = Col1 Col2 Col3) Nodupkey;
By Col1 Col2 Col3;
Run;

http://support.sas.com/resources/papers/proceedings13/376-2013.pdf

data have;
input ID Product1 $ Quantity1 Product2 $ Quantity2 Product3 $ Quantity3;
datalines;
1 Printer 10 Com 25 Games 3
3 Games 8 Printer 50 Com 5
9 Printer 1 Games 3 Com 10
7 Printer 1  Com 10 Games 3
2 Games 3 Printer 1 Com 10
;run;
*the last 3 records are the same;

data report_By_product(drop=I J temp_P temp_Q ) ;
set have;
array P(3) $30 Product1- Product3;
array Q(3) Quantity1-Quantity3;
do I=1 to 3;
do J=1 to 3-I;
if not missing(P(J+1)) and P(J) > P(J+1) then do;
*sort product name in ascending order.;
Temp_P = P(J);
P(J)= P(J+1);
P(J+1) = Temp_P; 
Temp_Q=Q(J);
Q(J)= Q(J+1);
Q(J+1) = Temp_Q;
end; end; end;
run; 

proc sort data=have; by id; run;
data concatenation(drop=I Product1- Product3 Quantity1-Quantity3);
set have;
by id;
array P(3) $30 Product1- Product3;
array Q(3) Quantity1-Quantity3;
array sort_By_prod(3) $30 P1-P3;
*array sort_By_Qty(3) $30 Q1-Q3;

do I=1 to 3;
if not missing(P(I)) then do;
sort_By_prod(I)= strip(P(I))||"/"||put(Q(I), Z4.);
*sort_By_Qty(I) = put(Q(I), Z4.)||"/"|| strip(P(I));
end; end;

call sortC(of P1-P3); *sort by product name in ascending order;
*call sortC(of Q3-Q1); *sort by purchase quantity in descending order;
run; 

proc sort data=concatenation out=want nodupkey;
by P1 P2 P3;run;

data want; set want; keep id;

proc sql; create table want
as select * from want as a left join have as b
on a.ID=b.ID; quit;

data have;
length f1 $32 f2 $32; 
input f1 $ f1_value f2 $ f2_value;
datalines;
Q_TS_WDQ_3stage_trd2 1 Q_hroc_breakrg_close20 2
a 2 b 2
Q_hroc_breakrg_close20 2 Q_TS_WDQ_3stage_trd2 1
run;

data _null_;
if _n_=1 then do;
if 0 then set have;
dcl hash H () ;
   h.definekey  ('f1','f1_value','f2','f2_value') ;
   h.definedata ('f1','f1_value','f2','f2_value') ;
   h.definedone () ;
   end;
set have end=last;
rc=h.check(key:f2,key:f2_value,key:f1,key:f1_value);
if rc ne 0 then h.add(key:f1,key:f1_value,key:f2,key:f2_value,data:f1,data:f1_value,data:f2,data:f2_value);
else h.replace(key:f2,key:f2_value,key:f1,key:f1_value,data:f1,data:f1_value,data:f2,data:f2_value);
if last then h.output(dataset:'want');
run;

data have; 
input f1 $ f1_value f2 $ f2_value f3 $ f3_value;
datalines;
a 1 b 2 c 0
a 2 b 2 c 0
b 2 a 1 c 0
b 9 a 1 c 0
c 0 a 1 b 2
run;

*note: record: 1 3 5 are the same;

data _null_;
if _n_=1 then do;
if 0 then set have;
dcl hash H () ;
 h.definekey ("f1",'f1_value','f2','f2_value','f3','f3_value') ;
 h.definedata ("f1",'f1_value','f2','f2_value','f3','f3_value') ;
 h.definedone () ;
 end;
set have end=last;
h.replace(key:f2,key:f2_value,key:f1,key:fl_value,key:f3,key:f3_value, data:f1,data:f1_value,data:f2,data:f2_value,data:f3,data:f3_value);
if last then h.output(dataset:'want');
run;

novinosrin · Posted 03-21-2018 07:16 PM

data have; 
input f1 $ f1_value f2 $ f2_value;
datalines;
a 1 b 2
a 2 b 2
b 2 a 1
run;

data _null_;
if _n_=1 then do;
if 0 then set have;
dcl hash H () ;
   h.definekey  ("f1",'f1_value','f2','f2_value') ;
   h.definedata ("f1",'f1_value','f2','f2_value') ;
   h.definedone () ;
   end;
set have end=last;
h.replace(key:f2,key:f2_value,key:f1,key:fl_value,data:f1,data:f1_value,data:f2,data:f2_value);
if last then h.output(dataset:'want');
run;

View solution in original post

novinosrin · Posted 03-21-2018 07:16 PM

data have; 
input f1 $ f1_value f2 $ f2_value;
datalines;
a 1 b 2
a 2 b 2
b 2 a 1
run;

data _null_;
if _n_=1 then do;
if 0 then set have;
dcl hash H () ;
   h.definekey  ("f1",'f1_value','f2','f2_value') ;
   h.definedata ("f1",'f1_value','f2','f2_value') ;
   h.definedone () ;
   end;
set have end=last;
h.replace(key:f2,key:f2_value,key:f1,key:fl_value,data:f1,data:f1_value,data:f2,data:f2_value);
if last then h.output(dataset:'want');
run;

hhchenfx · Posted 03-22-2018 12:07 AM

There is something to do with the value of variable that make your code not working.

My sample data with long value in f1 and f2, the code doesn't help to eliminate the duplicate.

Can you please help me with that?

HHCFX

/*** This have file with long value  --> the code does not work*/
data have;
length f1 $32 f2 $32; 
input f1 $ f1_value f2 $ f2_value;
datalines;
Q_TS_WDQ_3stage_trd2 1 Q_hroc_breakrg_close20 2
a 2 b 2
Q_hroc_breakrg_close20 2 Q_TS_WDQ_3stage_trd2 1
run;


data have; 
input f1 $ f1_value f2 $ f2_value;
datalines;
a 1 b 2
a 2 b 2
b 2 a 1
run;


data _null_;
if _n_=1 then do;
if 0 then set have;
dcl hash H () ;
   h.definekey  ("f1",'f1_value','f2','f2_value') ;
   h.definedata ("f1",'f1_value','f2','f2_value') ;
   h.definedone () ;
   end;
set have end=last;
h.replace(key:f2,key:f2_value,key:f1,key:fl_value,data:f1,data:f1_value,data:f2,data:f2_value);
if last then h.output(dataset:'want');
run;

novinosrin · Posted 03-22-2018 01:21 AM

@hhchenfx It's just past midnight in Chicago and I don't have sas software at home. Let me get back to my college in the morning, test comprehensively and respond at my earliest convenience.

novinosrin · Posted 03-22-2018 02:22 PM

@hhchenfx Apologies for the delay as I was tired and couldn't wake up 😞

data have;
length f1 $32 f2 $32; 
input f1 $ f1_value f2 $ f2_value;
datalines;
Q_TS_WDQ_3stage_trd2 1 Q_hroc_breakrg_close20 2
a 2 b 2
Q_hroc_breakrg_close20 2 Q_TS_WDQ_3stage_trd2 1
run;

data _null_;
if _n_=1 then do;
if 0 then set have;
dcl hash H () ;
   h.definekey  ('f1','f1_value','f2','f2_value') ;
   h.definedata ('f1','f1_value','f2','f2_value') ;
   h.definedone () ;
   end;
set have end=last;
rc=h.check(key:f2,key:f2_value,key:f1,key:f1_value);
if rc ne 0 then h.add(key:f1,key:f1_value,key:f2,key:f2_value,data:f1,data:f1_value,data:f2,data:f2_value);
else h.replace(key:f2,key:f2_value,key:f1,key:f1_value,data:f1,data:f1_value,data:f2,data:f2_value);
if last then h.output(dataset:'want');
run;

hhchenfx · Posted 03-22-2018 10:12 PM

Thanks a lot, Novinosrin!

Astounding · Posted 03-21-2018 08:24 PM

OK, I can see I will need to learn hashing some day. Until that day, I would use:

data want;

set have;

if f1 > f2 then do;

chardum = f2;

numdum = f2_value;

f2 = f1;

f2_value = f1_value;

f1 = chardum;

f1_value = numdum;

end;

drop chardum numdum;

run;

proc sort data=want nodupkey;

by f1 f1_value f2 f2_value;

run;

hhchenfx · Posted 03-21-2018 10:28 PM

I am speechless!!!

This..."whatever it is" is so so amazing!

and it is easy to manipulate!!!

Thank you so much, it is an open eye experience.

HHCFX

Ksharp · Posted 03-22-2018 08:45 AM

data w; 
input f1 $ f1_value f2 $ f2_value;
datalines;
a 1 b 2
a 2 b 2
b 2 a 1
;
run;
data temp;
 set w; 
 call sortc(f1,f2);
 call sortn(f1_value,f2_value);
run;
proc sort data=temp out=want nodupkey;
by _all_;
run;

hhchenfx · Posted 03-22-2018 10:31 AM

Thanks a lot, Ksharp,

I try to modify your code to apply to 3 factors but it doesn't work.

in this sample data, record 1,3,5 are the same.

Any tips, please?


data w; 
input f1 $ f1_value f2 $ f2_value f3 $ f3_value;
datalines;
a 1 b 2 c 0
a 2 b 2 c 0
b 2 a 1 c 0
b 9 a 1 c 0
c 0 a 1 b 2
;run;

data temp;
 set w; 
 call sortc(f1,f2,f3);
 call sortn(f1_value,f2_value,f3_value);
run;
proc sort data=temp out=want nodupkey;
by _all_;
run;

Ksharp · Posted 03-23-2018 09:17 AM

OK. I overlooked the difficulty of this question. Try this one.

data w; 
input f1 $ f1_value f2 $ f2_value f3 $ f3_value;
datalines;
a 1 b 2 c 0
a 2 b 2 c 0
b 2 a 1 c 0
b 9 a 1 c 0
c 0 a 1 b 2
;run;

data temp;
 set w; 
 array x{*} $ f1 f2 f3;
 array _x{*} f1_value f2_value f3_value;
 do i=1 to dim(x)-1;
   do j=i+1 to dim(x);
    if x{i}>x{j} then do;
      temp=x{i};x{i}=x{j};x{j}=temp;
      temp=_x{i};_x{i}=_x{j};_x{j}=temp;
    end;
   end;
 end;
 drop i j temp;
run;
proc sort data=temp out=want nodupkey;
by _all_;
run;
proc print;run;

hhchenfx · Posted 03-23-2018 11:45 AM

Thank you all a lot for helping me with that problem

HHCFX

jdwaterman91 · Posted 03-23-2018 06:46 PM

Piggybacking off of what KSharp posted, the following method should work for you:

data have; 
input f1 $ f1_value f2 $ f2_value f3 $ f3_value;
datalines;
a 1 b 2 c 0
a 2 b 2 c 0
b 2 a 1 c 0
b 9 a 1 c 0
c 0 a 1 b 2
;
Run;

Data Temp;
Set Have;
Col1 = Catt(f1, f1_value);
Col2 = Catt(f2, f2_value);
Col3 = Catt(f3, f3_value);
Call Sortc(Col1, Col2, Col3);
Run;

Proc Sort Data = Temp Out = Want (Drop = Col1 Col2 Col3) Nodupkey;
By Col1 Col2 Col3;
Run;

Eliminate duplicate record when factor and value mixed up

Re: Eliminate duplicate record

Re: Eliminate duplicate record

Re: Eliminate duplicate record

Re: Eliminate duplicate record

Re: Eliminate duplicate record

Re: Eliminate duplicate record

Re: Eliminate duplicate record

Re: Eliminate duplicate record

Re: Eliminate duplicate record when factor and value mixed up

Re: Eliminate duplicate record when factor and value mixed up

Re: Eliminate duplicate record when factor and value mixed up

Re: Eliminate duplicate record when factor and value mixed up

Re: Eliminate duplicate record when factor and value mixed up

Registration is open

SAS Training: Just a Click Away