BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
hhchenfx
Barite | Level 11

Hi Everyone,

I have the following data where due to the combination setting, record 1 and 3 are the same (a=1 and b=2) vs (b=2 and a=1)

Is there any way to eliminate 1 of them?

Thank you.

HHCFX

 

data w; 
input f1 $ f1_value f2 $ f2_value;
datalines;
a 1 b 2
a 2 b 2
b 2 a 1
run;

 

Solution and More

 

This one is easiest to follow:

*create string and then sortc;

data have; 
input f1 $ f1_value f2 $ f2_value f3 $ f3_value;
datalines;
a 1 b 2 c 0
a 2 b 2 c 0
b 2 a 1 c 0
b 9 a 1 c 0
c 0 a 1 b 2
;Run;

Data Temp;
Set Have;
Col1 = Catt(f1, f1_value);
Col2 = Catt(f2, f2_value);
Col3 = Catt(f3, f3_value);
Call Sortc(Col1, Col2, Col3);
Run;

Proc Sort Data = Temp Out = Want (Drop = Col1 Col2 Col3) Nodupkey;
By Col1 Col2 Col3;
Run;

 

http://support.sas.com/resources/papers/proceedings13/376-2013.pdf 

data have;
input ID Product1 $ Quantity1 Product2 $ Quantity2 Product3 $ Quantity3;
datalines;
1 Printer 10 Com 25 Games 3
3 Games 8 Printer 50 Com 5
9 Printer 1 Games 3 Com 10
7 Printer 1  Com 10 Games 3
2 Games 3 Printer 1 Com 10
;run;
*the last 3 records are the same;

data report_By_product(drop=I J temp_P temp_Q ) ;
set have;
array P(3) $30 Product1- Product3;
array Q(3) Quantity1-Quantity3;
do I=1 to 3;
do J=1 to 3-I;
if not missing(P(J+1)) and P(J) > P(J+1) then do;
*sort product name in ascending order.;
Temp_P = P(J);
P(J)= P(J+1);
P(J+1) = Temp_P; 
Temp_Q=Q(J);
Q(J)= Q(J+1);
Q(J+1) = Temp_Q;
end; end; end;
run; 

proc sort data=have; by id; run;
data concatenation(drop=I Product1- Product3 Quantity1-Quantity3);
set have;
by id;
array P(3) $30 Product1- Product3;
array Q(3) Quantity1-Quantity3;
array sort_By_prod(3) $30 P1-P3;
*array sort_By_Qty(3) $30 Q1-Q3;

do I=1 to 3;
if not missing(P(I)) then do;
sort_By_prod(I)= strip(P(I))||"/"||put(Q(I), Z4.);
*sort_By_Qty(I) = put(Q(I), Z4.)||"/"|| strip(P(I));
end; end;

call sortC(of P1-P3); *sort by product name in ascending order;
*call sortC(of Q3-Q1); *sort by purchase quantity in descending order;
run; 

proc sort data=concatenation out=want nodupkey;
by P1 P2 P3;run;

data want; set want; keep id;

proc sql; create table want
as select * from want as a left join have as b
on a.ID=b.ID; quit;

 

data have;
length f1 $32 f2 $32; 
input f1 $ f1_value f2 $ f2_value;
datalines;
Q_TS_WDQ_3stage_trd2 1 Q_hroc_breakrg_close20 2
a 2 b 2
Q_hroc_breakrg_close20 2 Q_TS_WDQ_3stage_trd2 1
run;

data _null_;
if _n_=1 then do;
if 0 then set have;
dcl hash H () ;
   h.definekey  ('f1','f1_value','f2','f2_value') ;
   h.definedata ('f1','f1_value','f2','f2_value') ;
   h.definedone () ;
   end;
set have end=last;
rc=h.check(key:f2,key:f2_value,key:f1,key:f1_value);
if rc ne 0 then h.add(key:f1,key:f1_value,key:f2,key:f2_value,data:f1,data:f1_value,data:f2,data:f2_value);
else h.replace(key:f2,key:f2_value,key:f1,key:f1_value,data:f1,data:f1_value,data:f2,data:f2_value);
if last then h.output(dataset:'want');
run;

 

data have; 
input f1 $ f1_value f2 $ f2_value f3 $ f3_value;
datalines;
a 1 b 2 c 0
a 2 b 2 c 0
b 2 a 1 c 0
b 9 a 1 c 0
c 0 a 1 b 2
run;

*note: record: 1 3 5 are the same;

data _null_;
if _n_=1 then do;
if 0 then set have;
dcl hash H () ;
 h.definekey ("f1",'f1_value','f2','f2_value','f3','f3_value') ;
 h.definedata ("f1",'f1_value','f2','f2_value','f3','f3_value') ;
 h.definedone () ;
 end;
set have end=last;
h.replace(key:f2,key:f2_value,key:f1,key:fl_value,key:f3,key:f3_value, data:f1,data:f1_value,data:f2,data:f2_value,data:f3,data:f3_value);
if last then h.output(dataset:'want');
run;

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20
data have; 
input f1 $ f1_value f2 $ f2_value;
datalines;
a 1 b 2
a 2 b 2
b 2 a 1
run;

data _null_;
if _n_=1 then do;
if 0 then set have;
dcl hash H () ;
   h.definekey  ("f1",'f1_value','f2','f2_value') ;
   h.definedata ("f1",'f1_value','f2','f2_value') ;
   h.definedone () ;
   end;
set have end=last;
h.replace(key:f2,key:f2_value,key:f1,key:fl_value,data:f1,data:f1_value,data:f2,data:f2_value);
if last then h.output(dataset:'want');
run;

View solution in original post

12 REPLIES 12
novinosrin
Tourmaline | Level 20
data have; 
input f1 $ f1_value f2 $ f2_value;
datalines;
a 1 b 2
a 2 b 2
b 2 a 1
run;

data _null_;
if _n_=1 then do;
if 0 then set have;
dcl hash H () ;
   h.definekey  ("f1",'f1_value','f2','f2_value') ;
   h.definedata ("f1",'f1_value','f2','f2_value') ;
   h.definedone () ;
   end;
set have end=last;
h.replace(key:f2,key:f2_value,key:f1,key:fl_value,data:f1,data:f1_value,data:f2,data:f2_value);
if last then h.output(dataset:'want');
run;
hhchenfx
Barite | Level 11

There is something to do with the value of variable that make your code not working.

My sample data with long value in f1 and f2, the code doesn't help to eliminate the duplicate.

Can you please help me with that?

HHCFX

 

/*** This have file with long value  --> the code does not work*/
data have; length f1 $32 f2 $32; input f1 $ f1_value f2 $ f2_value; datalines; Q_TS_WDQ_3stage_trd2 1 Q_hroc_breakrg_close20 2 a 2 b 2 Q_hroc_breakrg_close20 2 Q_TS_WDQ_3stage_trd2 1 run;
 

 


data have; 
input f1 $ f1_value f2 $ f2_value;
datalines;
a 1 b 2
a 2 b 2
b 2 a 1
run;


data _null_;
if _n_=1 then do;
if 0 then set have;
dcl hash H () ;
   h.definekey  ("f1",'f1_value','f2','f2_value') ;
   h.definedata ("f1",'f1_value','f2','f2_value') ;
   h.definedone () ;
   end;
set have end=last;
h.replace(key:f2,key:f2_value,key:f1,key:fl_value,data:f1,data:f1_value,data:f2,data:f2_value);
if last then h.output(dataset:'want');
run;
novinosrin
Tourmaline | Level 20

@hhchenfx  It's just past midnight in Chicago and I don't have sas software at home. Let me get back to my college in the morning, test comprehensively and respond at my earliest convenience. 

novinosrin
Tourmaline | Level 20

@hhchenfx Apologies for the delay as I was tired and couldn't wake up 😞  

 

data have;
length f1 $32 f2 $32; 
input f1 $ f1_value f2 $ f2_value;
datalines;
Q_TS_WDQ_3stage_trd2 1 Q_hroc_breakrg_close20 2
a 2 b 2
Q_hroc_breakrg_close20 2 Q_TS_WDQ_3stage_trd2 1
run;

data _null_;
if _n_=1 then do;
if 0 then set have;
dcl hash H () ;
   h.definekey  ('f1','f1_value','f2','f2_value') ;
   h.definedata ('f1','f1_value','f2','f2_value') ;
   h.definedone () ;
   end;
set have end=last;
rc=h.check(key:f2,key:f2_value,key:f1,key:f1_value);
if rc ne 0 then h.add(key:f1,key:f1_value,key:f2,key:f2_value,data:f1,data:f1_value,data:f2,data:f2_value);
else h.replace(key:f2,key:f2_value,key:f1,key:f1_value,data:f1,data:f1_value,data:f2,data:f2_value);
if last then h.output(dataset:'want');
run;
hhchenfx
Barite | Level 11

 

Astounding
PROC Star

OK, I can see I will need to learn hashing some day.  Until that day, I would use:

 

data want;

set have;

if f1 > f2 then do;

   chardum = f2;

   numdum = f2_value;

   f2 = f1;

   f2_value = f1_value;

   f1 = chardum;

   f1_value = numdum;

end;

drop chardum numdum;

run;

 

proc sort data=want nodupkey;

   by f1 f1_value f2 f2_value;

run;

hhchenfx
Barite | Level 11

I am speechless!!!

 

This..."whatever it is" is so so amazing!

 

and it is easy to manipulate!!!

 

Thank you so much, it is an open eye experience. 

 

HHCFX

 

 

 

Ksharp
Super User
data w; 
input f1 $ f1_value f2 $ f2_value;
datalines;
a 1 b 2
a 2 b 2
b 2 a 1
;
run;
data temp;
 set w; 
 call sortc(f1,f2);
 call sortn(f1_value,f2_value);
run;
proc sort data=temp out=want nodupkey;
by _all_;
run;
hhchenfx
Barite | Level 11

Thanks a lot, Ksharp,

I try to modify your code to apply to 3 factors but it doesn't work.

in this sample data, record 1,3,5 are the same.

Any tips, please?

 


data w; 
input f1 $ f1_value f2 $ f2_value f3 $ f3_value;
datalines;
a 1 b 2 c 0
a 2 b 2 c 0
b 2 a 1 c 0
b 9 a 1 c 0
c 0 a 1 b 2
;run;

data temp;
 set w; 
 call sortc(f1,f2,f3);
 call sortn(f1_value,f2_value,f3_value);
run;
proc sort data=temp out=want nodupkey;
by _all_;
run;

 

Ksharp
Super User

OK. I overlooked the difficulty of this question. Try this one.

 

data w; 
input f1 $ f1_value f2 $ f2_value f3 $ f3_value;
datalines;
a 1 b 2 c 0
a 2 b 2 c 0
b 2 a 1 c 0
b 9 a 1 c 0
c 0 a 1 b 2
;run;

data temp;
 set w; 
 array x{*} $ f1 f2 f3;
 array _x{*} f1_value f2_value f3_value;
 do i=1 to dim(x)-1;
   do j=i+1 to dim(x);
    if x{i}>x{j} then do;
      temp=x{i};x{i}=x{j};x{j}=temp;
      temp=_x{i};_x{i}=_x{j};_x{j}=temp;
    end;
   end;
 end;
 drop i j temp;
run;
proc sort data=temp out=want nodupkey;
by _all_;
run;
proc print;run;
hhchenfx
Barite | Level 11

Thank you all a lot for helping me with that problem

HHCFX

jdwaterman91
Obsidian | Level 7

Piggybacking off of what KSharp posted, the following method should work for you: 

 

 

data have; 
input f1 $ f1_value f2 $ f2_value f3 $ f3_value;
datalines;
a 1 b 2 c 0
a 2 b 2 c 0
b 2 a 1 c 0
b 9 a 1 c 0
c 0 a 1 b 2
;
Run; Data Temp; Set Have; Col1 = Catt(f1, f1_value); Col2 = Catt(f2, f2_value); Col3 = Catt(f3, f3_value); Call Sortc(Col1, Col2, Col3); Run; Proc Sort Data = Temp Out = Want (Drop = Col1 Col2 Col3) Nodupkey; By Col1 Col2 Col3; Run;

 

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 2079 views
  • 6 likes
  • 5 in conversation