Hi Everyone,
I have the following data where due to the combination setting, record 1 and 3 are the same (a=1 and b=2) vs (b=2 and a=1)
Is there any way to eliminate 1 of them?
Thank you.
HHCFX
data w;
input f1 $ f1_value f2 $ f2_value;
datalines;
a 1 b 2
a 2 b 2
b 2 a 1
run;
Solution and More
This one is easiest to follow:
*create string and then sortc;
data have;
input f1 $ f1_value f2 $ f2_value f3 $ f3_value;
datalines;
a 1 b 2 c 0
a 2 b 2 c 0
b 2 a 1 c 0
b 9 a 1 c 0
c 0 a 1 b 2
;Run;
Data Temp;
Set Have;
Col1 = Catt(f1, f1_value);
Col2 = Catt(f2, f2_value);
Col3 = Catt(f3, f3_value);
Call Sortc(Col1, Col2, Col3);
Run;
Proc Sort Data = Temp Out = Want (Drop = Col1 Col2 Col3) Nodupkey;
By Col1 Col2 Col3;
Run;
http://support.sas.com/resources/papers/proceedings13/376-2013.pdf
data have;
input ID Product1 $ Quantity1 Product2 $ Quantity2 Product3 $ Quantity3;
datalines;
1 Printer 10 Com 25 Games 3
3 Games 8 Printer 50 Com 5
9 Printer 1 Games 3 Com 10
7 Printer 1 Com 10 Games 3
2 Games 3 Printer 1 Com 10
;run;
*the last 3 records are the same;
data report_By_product(drop=I J temp_P temp_Q ) ;
set have;
array P(3) $30 Product1- Product3;
array Q(3) Quantity1-Quantity3;
do I=1 to 3;
do J=1 to 3-I;
if not missing(P(J+1)) and P(J) > P(J+1) then do;
*sort product name in ascending order.;
Temp_P = P(J);
P(J)= P(J+1);
P(J+1) = Temp_P;
Temp_Q=Q(J);
Q(J)= Q(J+1);
Q(J+1) = Temp_Q;
end; end; end;
run;
proc sort data=have; by id; run;
data concatenation(drop=I Product1- Product3 Quantity1-Quantity3);
set have;
by id;
array P(3) $30 Product1- Product3;
array Q(3) Quantity1-Quantity3;
array sort_By_prod(3) $30 P1-P3;
*array sort_By_Qty(3) $30 Q1-Q3;
do I=1 to 3;
if not missing(P(I)) then do;
sort_By_prod(I)= strip(P(I))||"/"||put(Q(I), Z4.);
*sort_By_Qty(I) = put(Q(I), Z4.)||"/"|| strip(P(I));
end; end;
call sortC(of P1-P3); *sort by product name in ascending order;
*call sortC(of Q3-Q1); *sort by purchase quantity in descending order;
run;
proc sort data=concatenation out=want nodupkey;
by P1 P2 P3;run;
data want; set want; keep id;
proc sql; create table want
as select * from want as a left join have as b
on a.ID=b.ID; quit;
data have;
length f1 $32 f2 $32;
input f1 $ f1_value f2 $ f2_value;
datalines;
Q_TS_WDQ_3stage_trd2 1 Q_hroc_breakrg_close20 2
a 2 b 2
Q_hroc_breakrg_close20 2 Q_TS_WDQ_3stage_trd2 1
run;
data _null_;
if _n_=1 then do;
if 0 then set have;
dcl hash H () ;
h.definekey ('f1','f1_value','f2','f2_value') ;
h.definedata ('f1','f1_value','f2','f2_value') ;
h.definedone () ;
end;
set have end=last;
rc=h.check(key:f2,key:f2_value,key:f1,key:f1_value);
if rc ne 0 then h.add(key:f1,key:f1_value,key:f2,key:f2_value,data:f1,data:f1_value,data:f2,data:f2_value);
else h.replace(key:f2,key:f2_value,key:f1,key:f1_value,data:f1,data:f1_value,data:f2,data:f2_value);
if last then h.output(dataset:'want');
run;
data have;
input f1 $ f1_value f2 $ f2_value f3 $ f3_value;
datalines;
a 1 b 2 c 0
a 2 b 2 c 0
b 2 a 1 c 0
b 9 a 1 c 0
c 0 a 1 b 2
run;
*note: record: 1 3 5 are the same;
data _null_;
if _n_=1 then do;
if 0 then set have;
dcl hash H () ;
h.definekey ("f1",'f1_value','f2','f2_value','f3','f3_value') ;
h.definedata ("f1",'f1_value','f2','f2_value','f3','f3_value') ;
h.definedone () ;
end;
set have end=last;
h.replace(key:f2,key:f2_value,key:f1,key:fl_value,key:f3,key:f3_value, data:f1,data:f1_value,data:f2,data:f2_value,data:f3,data:f3_value);
if last then h.output(dataset:'want');
run;
data have;
input f1 $ f1_value f2 $ f2_value;
datalines;
a 1 b 2
a 2 b 2
b 2 a 1
run;
data _null_;
if _n_=1 then do;
if 0 then set have;
dcl hash H () ;
h.definekey ("f1",'f1_value','f2','f2_value') ;
h.definedata ("f1",'f1_value','f2','f2_value') ;
h.definedone () ;
end;
set have end=last;
h.replace(key:f2,key:f2_value,key:f1,key:fl_value,data:f1,data:f1_value,data:f2,data:f2_value);
if last then h.output(dataset:'want');
run;
data have;
input f1 $ f1_value f2 $ f2_value;
datalines;
a 1 b 2
a 2 b 2
b 2 a 1
run;
data _null_;
if _n_=1 then do;
if 0 then set have;
dcl hash H () ;
h.definekey ("f1",'f1_value','f2','f2_value') ;
h.definedata ("f1",'f1_value','f2','f2_value') ;
h.definedone () ;
end;
set have end=last;
h.replace(key:f2,key:f2_value,key:f1,key:fl_value,data:f1,data:f1_value,data:f2,data:f2_value);
if last then h.output(dataset:'want');
run;
There is something to do with the value of variable that make your code not working.
My sample data with long value in f1 and f2, the code doesn't help to eliminate the duplicate.
Can you please help me with that?
HHCFX
/*** This have file with long value --> the code does not work*/
data have;
length f1 $32 f2 $32;
input f1 $ f1_value f2 $ f2_value;
datalines;
Q_TS_WDQ_3stage_trd2 1 Q_hroc_breakrg_close20 2
a 2 b 2
Q_hroc_breakrg_close20 2 Q_TS_WDQ_3stage_trd2 1
run;
data have;
input f1 $ f1_value f2 $ f2_value;
datalines;
a 1 b 2
a 2 b 2
b 2 a 1
run;
data _null_;
if _n_=1 then do;
if 0 then set have;
dcl hash H () ;
h.definekey ("f1",'f1_value','f2','f2_value') ;
h.definedata ("f1",'f1_value','f2','f2_value') ;
h.definedone () ;
end;
set have end=last;
h.replace(key:f2,key:f2_value,key:f1,key:fl_value,data:f1,data:f1_value,data:f2,data:f2_value);
if last then h.output(dataset:'want');
run;
@hhchenfx It's just past midnight in Chicago and I don't have sas software at home. Let me get back to my college in the morning, test comprehensively and respond at my earliest convenience.
@hhchenfx Apologies for the delay as I was tired and couldn't wake up 😞
data have;
length f1 $32 f2 $32;
input f1 $ f1_value f2 $ f2_value;
datalines;
Q_TS_WDQ_3stage_trd2 1 Q_hroc_breakrg_close20 2
a 2 b 2
Q_hroc_breakrg_close20 2 Q_TS_WDQ_3stage_trd2 1
run;
data _null_;
if _n_=1 then do;
if 0 then set have;
dcl hash H () ;
h.definekey ('f1','f1_value','f2','f2_value') ;
h.definedata ('f1','f1_value','f2','f2_value') ;
h.definedone () ;
end;
set have end=last;
rc=h.check(key:f2,key:f2_value,key:f1,key:f1_value);
if rc ne 0 then h.add(key:f1,key:f1_value,key:f2,key:f2_value,data:f1,data:f1_value,data:f2,data:f2_value);
else h.replace(key:f2,key:f2_value,key:f1,key:f1_value,data:f1,data:f1_value,data:f2,data:f2_value);
if last then h.output(dataset:'want');
run;
Thanks a lot, Novinosrin!
OK, I can see I will need to learn hashing some day. Until that day, I would use:
data want;
set have;
if f1 > f2 then do;
chardum = f2;
numdum = f2_value;
f2 = f1;
f2_value = f1_value;
f1 = chardum;
f1_value = numdum;
end;
drop chardum numdum;
run;
proc sort data=want nodupkey;
by f1 f1_value f2 f2_value;
run;
I am speechless!!!
This..."whatever it is" is so so amazing!
and it is easy to manipulate!!!
Thank you so much, it is an open eye experience.
HHCFX
data w;
input f1 $ f1_value f2 $ f2_value;
datalines;
a 1 b 2
a 2 b 2
b 2 a 1
;
run;
data temp;
set w;
call sortc(f1,f2);
call sortn(f1_value,f2_value);
run;
proc sort data=temp out=want nodupkey;
by _all_;
run;
Thanks a lot, Ksharp,
I try to modify your code to apply to 3 factors but it doesn't work.
in this sample data, record 1,3,5 are the same.
Any tips, please?
data w;
input f1 $ f1_value f2 $ f2_value f3 $ f3_value;
datalines;
a 1 b 2 c 0
a 2 b 2 c 0
b 2 a 1 c 0
b 9 a 1 c 0
c 0 a 1 b 2
;run;
data temp;
set w;
call sortc(f1,f2,f3);
call sortn(f1_value,f2_value,f3_value);
run;
proc sort data=temp out=want nodupkey;
by _all_;
run;
OK. I overlooked the difficulty of this question. Try this one.
data w;
input f1 $ f1_value f2 $ f2_value f3 $ f3_value;
datalines;
a 1 b 2 c 0
a 2 b 2 c 0
b 2 a 1 c 0
b 9 a 1 c 0
c 0 a 1 b 2
;run;
data temp;
set w;
array x{*} $ f1 f2 f3;
array _x{*} f1_value f2_value f3_value;
do i=1 to dim(x)-1;
do j=i+1 to dim(x);
if x{i}>x{j} then do;
temp=x{i};x{i}=x{j};x{j}=temp;
temp=_x{i};_x{i}=_x{j};_x{j}=temp;
end;
end;
end;
drop i j temp;
run;
proc sort data=temp out=want nodupkey;
by _all_;
run;
proc print;run;
Thank you all a lot for helping me with that problem
HHCFX
Piggybacking off of what KSharp posted, the following method should work for you:
data have;
input f1 $ f1_value f2 $ f2_value f3 $ f3_value;
datalines;
a 1 b 2 c 0
a 2 b 2 c 0
b 2 a 1 c 0
b 9 a 1 c 0
c 0 a 1 b 2
;
Run;
Data Temp;
Set Have;
Col1 = Catt(f1, f1_value);
Col2 = Catt(f2, f2_value);
Col3 = Catt(f3, f3_value);
Call Sortc(Col1, Col2, Col3);
Run;
Proc Sort Data = Temp Out = Want (Drop = Col1 Col2 Col3) Nodupkey;
By Col1 Col2 Col3;
Run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.