DATA Step, Macro, Functions and more

Eliminate duplicate record when factor and value mixed up

Accepted Solution Solved
Reply
Super Contributor
Posts: 503
Accepted Solution

Eliminate duplicate record when factor and value mixed up

[ Edited ]

Hi Everyone,

I have the following data where due to the combination setting, record 1 and 3 are the same (a=1 and b=2) vs (b=2 and a=1)

Is there any way to eliminate 1 of them?

Thank you.

HHCFX

 

data w; 
input f1 $ f1_value f2 $ f2_value;
datalines;
a 1 b 2
a 2 b 2
b 2 a 1
run;

 

Solution and More

 

This one is easiest to follow:

*create string and then sortc;

data have; 
input f1 $ f1_value f2 $ f2_value f3 $ f3_value;
datalines;
a 1 b 2 c 0
a 2 b 2 c 0
b 2 a 1 c 0
b 9 a 1 c 0
c 0 a 1 b 2
;Run;

Data Temp;
Set Have;
Col1 = Catt(f1, f1_value);
Col2 = Catt(f2, f2_value);
Col3 = Catt(f3, f3_value);
Call Sortc(Col1, Col2, Col3);
Run;

Proc Sort Data = Temp Out = Want (Drop = Col1 Col2 Col3) Nodupkey;
By Col1 Col2 Col3;
Run;

 

http://support.sas.com/resources/papers/proceedings13/376-2013.pdf 

data have;
input ID Product1 $ Quantity1 Product2 $ Quantity2 Product3 $ Quantity3;
datalines;
1 Printer 10 Com 25 Games 3
3 Games 8 Printer 50 Com 5
9 Printer 1 Games 3 Com 10
7 Printer 1  Com 10 Games 3
2 Games 3 Printer 1 Com 10
;run;
*the last 3 records are the same;

data report_By_product(drop=I J temp_P temp_Q ) ;
set have;
array P(3) $30 Product1- Product3;
array Q(3) Quantity1-Quantity3;
do I=1 to 3;
do J=1 to 3-I;
if not missing(P(J+1)) and P(J) > P(J+1) then do;
*sort product name in ascending order.;
Temp_P = P(J);
P(J)= P(J+1);
P(J+1) = Temp_P; 
Temp_Q=Q(J);
Q(J)= Q(J+1);
Q(J+1) = Temp_Q;
end; end; end;
run; 

proc sort data=have; by id; run;
data concatenation(drop=I Product1- Product3 Quantity1-Quantity3);
set have;
by id;
array P(3) $30 Product1- Product3;
array Q(3) Quantity1-Quantity3;
array sort_By_prod(3) $30 P1-P3;
*array sort_By_Qty(3) $30 Q1-Q3;

do I=1 to 3;
if not missing(P(I)) then do;
sort_By_prod(I)= strip(P(I))||"/"||put(Q(I), Z4.);
*sort_By_Qty(I) = put(Q(I), Z4.)||"/"|| strip(P(I));
end; end;

call sortC(of P1-P3); *sort by product name in ascending order;
*call sortC(of Q3-Q1); *sort by purchase quantity in descending order;
run; 

proc sort data=concatenation out=want nodupkey;
by P1 P2 P3;run;

data want; set want; keep id;

proc sql; create table want
as select * from want as a left join have as b
on a.ID=b.ID; quit;

 

data have;
length f1 $32 f2 $32; 
input f1 $ f1_value f2 $ f2_value;
datalines;
Q_TS_WDQ_3stage_trd2 1 Q_hroc_breakrg_close20 2
a 2 b 2
Q_hroc_breakrg_close20 2 Q_TS_WDQ_3stage_trd2 1
run;

data _null_;
if _n_=1 then do;
if 0 then set have;
dcl hash H () ;
   h.definekey  ('f1','f1_value','f2','f2_value') ;
   h.definedata ('f1','f1_value','f2','f2_value') ;
   h.definedone () ;
   end;
set have end=last;
rc=h.check(key:f2,key:f2_value,key:f1,key:f1_value);
if rc ne 0 then h.add(key:f1,key:f1_value,key:f2,key:f2_value,data:f1,data:f1_value,data:f2,data:f2_value);
else h.replace(key:f2,key:f2_value,key:f1,key:f1_value,data:f1,data:f1_value,data:f2,data:f2_value);
if last then h.output(dataset:'want');
run;

 

data have; 
input f1 $ f1_value f2 $ f2_value f3 $ f3_value;
datalines;
a 1 b 2 c 0
a 2 b 2 c 0
b 2 a 1 c 0
b 9 a 1 c 0
c 0 a 1 b 2
run;

*note: record: 1 3 5 are the same;

data _null_;
if _n_=1 then do;
if 0 then set have;
dcl hash H () ;
 h.definekey ("f1",'f1_value','f2','f2_value','f3','f3_value') ;
 h.definedata ("f1",'f1_value','f2','f2_value','f3','f3_value') ;
 h.definedone () ;
 end;
set have end=last;
h.replace(key:f2,key:f2_value,key:f1,key:fl_value,key:f3,key:f3_value, data:f1,data:f1_value,data:f2,data:f2_value,data:f3,data:f3_value);
if last then h.output(dataset:'want');
run;

 

 


Accepted Solutions
Solution
‎03-21-2018 10:25 PM
PROC Star
Posts: 1,334

Re: Eliminate duplicate record

[ Edited ]
data have; 
input f1 $ f1_value f2 $ f2_value;
datalines;
a 1 b 2
a 2 b 2
b 2 a 1
run;

data _null_;
if _n_=1 then do;
if 0 then set have;
dcl hash H () ;
   h.definekey  ("f1",'f1_value','f2','f2_value') ;
   h.definedata ("f1",'f1_value','f2','f2_value') ;
   h.definedone () ;
   end;
set have end=last;
h.replace(key:f2,key:f2_value,key:f1,key:fl_value,data:f1,data:f1_value,data:f2,data:f2_value);
if last then h.output(dataset:'want');
run;

View solution in original post


All Replies
Solution
‎03-21-2018 10:25 PM
PROC Star
Posts: 1,334

Re: Eliminate duplicate record

[ Edited ]
data have; 
input f1 $ f1_value f2 $ f2_value;
datalines;
a 1 b 2
a 2 b 2
b 2 a 1
run;

data _null_;
if _n_=1 then do;
if 0 then set have;
dcl hash H () ;
   h.definekey  ("f1",'f1_value','f2','f2_value') ;
   h.definedata ("f1",'f1_value','f2','f2_value') ;
   h.definedone () ;
   end;
set have end=last;
h.replace(key:f2,key:f2_value,key:f1,key:fl_value,data:f1,data:f1_value,data:f2,data:f2_value);
if last then h.output(dataset:'want');
run;
Super Contributor
Posts: 503

Re: Eliminate duplicate record

[ Edited ]
Posted in reply to novinosrin

There is something to do with the value of variable that make your code not working.

My sample data with long value in f1 and f2, the code doesn't help to eliminate the duplicate.

Can you please help me with that?

HHCFX

 

/*** This have file with long value  --> the code does not work*/
data have; length f1 $32 f2 $32; input f1 $ f1_value f2 $ f2_value; datalines; Q_TS_WDQ_3stage_trd2 1 Q_hroc_breakrg_close20 2 a 2 b 2 Q_hroc_breakrg_close20 2 Q_TS_WDQ_3stage_trd2 1 run;
 

 


data have; 
input f1 $ f1_value f2 $ f2_value;
datalines;
a 1 b 2
a 2 b 2
b 2 a 1
run;


data _null_;
if _n_=1 then do;
if 0 then set have;
dcl hash H () ;
   h.definekey  ("f1",'f1_value','f2','f2_value') ;
   h.definedata ("f1",'f1_value','f2','f2_value') ;
   h.definedone () ;
   end;
set have end=last;
h.replace(key:f2,key:f2_value,key:f1,key:fl_value,data:f1,data:f1_value,data:f2,data:f2_value);
if last then h.output(dataset:'want');
run;
PROC Star
Posts: 1,334

Re: Eliminate duplicate record

@hhchenfx  It's just past midnight in Chicago and I don't have sas software at home. Let me get back to my college in the morning, test comprehensively and respond at my earliest convenience. 

PROC Star
Posts: 1,334

Re: Eliminate duplicate record

Posted in reply to novinosrin

@hhchenfx Apologies for the delay as I was tired and couldn't wake up Smiley Sad  

 

data have;
length f1 $32 f2 $32; 
input f1 $ f1_value f2 $ f2_value;
datalines;
Q_TS_WDQ_3stage_trd2 1 Q_hroc_breakrg_close20 2
a 2 b 2
Q_hroc_breakrg_close20 2 Q_TS_WDQ_3stage_trd2 1
run;

data _null_;
if _n_=1 then do;
if 0 then set have;
dcl hash H () ;
   h.definekey  ('f1','f1_value','f2','f2_value') ;
   h.definedata ('f1','f1_value','f2','f2_value') ;
   h.definedone () ;
   end;
set have end=last;
rc=h.check(key:f2,key:f2_value,key:f1,key:f1_value);
if rc ne 0 then h.add(key:f1,key:f1_value,key:f2,key:f2_value,data:f1,data:f1_value,data:f2,data:f2_value);
else h.replace(key:f2,key:f2_value,key:f1,key:f1_value,data:f1,data:f1_value,data:f2,data:f2_value);
if last then h.output(dataset:'want');
run;
Super Contributor
Posts: 503

Re: Eliminate duplicate record

Posted in reply to novinosrin

 

Super User
Posts: 6,541

Re: Eliminate duplicate record

OK, I can see I will need to learn hashing some day.  Until that day, I would use:

 

data want;

set have;

if f1 > f2 then do;

   chardum = f2;

   numdum = f2_value;

   f2 = f1;

   f2_value = f1_value;

   f1 = chardum;

   f1_value = numdum;

end;

drop chardum numdum;

run;

 

proc sort data=want nodupkey;

   by f1 f1_value f2 f2_value;

run;

Super Contributor
Posts: 503

Re: Eliminate duplicate record

Posted in reply to Astounding

I am speechless!!!

 

This..."whatever it is" is so so amazing!

 

and it is easy to manipulate!!!

 

Thank you so much, it is an open eye experience. 

 

HHCFX

 

 

 

Super User
Posts: 10,618

Re: Eliminate duplicate record when factor and value mixed up

data w; 
input f1 $ f1_value f2 $ f2_value;
datalines;
a 1 b 2
a 2 b 2
b 2 a 1
;
run;
data temp;
 set w; 
 call sortc(f1,f2);
 call sortn(f1_value,f2_value);
run;
proc sort data=temp out=want nodupkey;
by _all_;
run;
Super Contributor
Posts: 503

Re: Eliminate duplicate record when factor and value mixed up

[ Edited ]

Thanks a lot, Ksharp,

I try to modify your code to apply to 3 factors but it doesn't work.

in this sample data, record 1,3,5 are the same.

Any tips, please?

 


data w; 
input f1 $ f1_value f2 $ f2_value f3 $ f3_value;
datalines;
a 1 b 2 c 0
a 2 b 2 c 0
b 2 a 1 c 0
b 9 a 1 c 0
c 0 a 1 b 2
;run;

data temp;
 set w; 
 call sortc(f1,f2,f3);
 call sortn(f1_value,f2_value,f3_value);
run;
proc sort data=temp out=want nodupkey;
by _all_;
run;

 

Super User
Posts: 10,618

Re: Eliminate duplicate record when factor and value mixed up

OK. I overlooked the difficulty of this question. Try this one.

 

data w; 
input f1 $ f1_value f2 $ f2_value f3 $ f3_value;
datalines;
a 1 b 2 c 0
a 2 b 2 c 0
b 2 a 1 c 0
b 9 a 1 c 0
c 0 a 1 b 2
;run;

data temp;
 set w; 
 array x{*} $ f1 f2 f3;
 array _x{*} f1_value f2_value f3_value;
 do i=1 to dim(x)-1;
   do j=i+1 to dim(x);
    if x{i}>x{j} then do;
      temp=x{i};x{i}=x{j};x{j}=temp;
      temp=_x{i};_x{i}=_x{j};_x{j}=temp;
    end;
   end;
 end;
 drop i j temp;
run;
proc sort data=temp out=want nodupkey;
by _all_;
run;
proc print;run;
Super Contributor
Posts: 503

Re: Eliminate duplicate record when factor and value mixed up

Thank you all a lot for helping me with that problem

HHCFX

Contributor
Posts: 23

Re: Eliminate duplicate record when factor and value mixed up

Piggybacking off of what KSharp posted, the following method should work for you: 

 

 

data have; 
input f1 $ f1_value f2 $ f2_value f3 $ f3_value;
datalines;
a 1 b 2 c 0
a 2 b 2 c 0
b 2 a 1 c 0
b 9 a 1 c 0
c 0 a 1 b 2
;
Run; Data Temp; Set Have; Col1 = Catt(f1, f1_value); Col2 = Catt(f2, f2_value); Col3 = Catt(f3, f3_value); Call Sortc(Col1, Col2, Col3); Run; Proc Sort Data = Temp Out = Want (Drop = Col1 Col2 Col3) Nodupkey; By Col1 Col2 Col3; Run;

 

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 12 replies
  • 310 views
  • 6 likes
  • 5 in conversation