Hi guys,
What I’m looking to do is to removes all duplicates of variable A so each variable A is unique to the output. However, a second variable (let’s call variable B) is a classification which has a desired hierarchy of the output.
If the hierarchy goes Class A, Class B, Class C then Class D then the below table
Variable A | Variable B |
Company 1 | Class A |
Company 2 | Class A |
Company 1 | Class B |
Company 3 | Class B |
Company 2 | Class C |
Company 4 | Class D |
Would go into
Variable A | Variable B |
Company 1 | Class A |
Company 2 | Class A |
Company 3 | Class B |
Company 4 | Class D |
Any idea?
Thanks
Assuming that the results mistakenly omitted the combination of Company 2 with Class C, you could use:
proc sort data=have out=want nodupkey;
by variable_a variable_b;
run;
If the posted results are actually correct, however, you might want:
proc sort data=have;
by variable_a variable_b;
run;
data want;
set have;
by variable_a;
if first.variable_a;
run;
The more I think about it, the more I think the bottom program is the right one and the question you posted got it right.
Hi Astounding,
No that was omitted on purpose. Variable A has to be unique and as Class A is higher up in the hierarchy than Class C, the line Company 2 Class C is removed.
OK, then go with the bottom program that I posted. There is a requirement, however. The sorted order must match the hierarchy. In other words, within a company, the program selects the "smallest" value of variable_b. So the "highest" priority in the hierarchy has to be the "smallest" value when sorting.
data have;
input VariableA $ &15. VariableB $ &15.;
cards;
Company 1 Class A
Company 2 Class A
Company 1 Class B
Company 3 Class B
Company 2 Class C
Company 4 Class D
;
data _null_;
if 0 then set have;
dcl hash H (dataset:'have',ordered: "A") ;
h.definekey ("variablea") ;
h.definedata ("variablea", "variableb") ;
h.definedone () ;
h.output(dataset:'want');
stop;
run;
You just have to have 2 sort procedures.
data have;
length var_A $9;
input var_A $ var_B $;
datalines;
Company_1 Class_A
Company_2 Class_A
Company_1 Class_B
Company_3 Class_B
Company_2 Class_C
Company_4 Class_D
;
proc print data=have;
run;
proc sort data=have;
by var_A var_B;
run;
proc sort nodupkey data=have out=want;
by var_A;
run;
proc print data=want;
run;
Output
1 | Company_1 | Class_A |
2 | Company_2 | Class_A |
3 | Company_3 | Class_B |
4 | Company_4 | Class_D |
Hi EENG,
do you know how I would do that but using a custom sort for variable B? In reality the order is not alphabetical.
But there is an order to B, right? Can't you create a new field like, If var_B = 'Tree' then var_B_id=1; else if var_B='Plant' then var_B_id=2; Then sort by var_A var_B_id; Then nodupkey on var_A. This way you do create an order to the field.
that was one of the first things I've tried. The issue with using the order like that is that I want to order first by variable A then A but remove duplicates for A not B. Sorry, I'm probably not explaining myself well. Might help if I switch variables around. If you see table below, the order is by Var_b Var_a but then removing duplicates is Var_A from top to bottom in order Var_b Var_a. So lines in red would be removed.
Variable B | Variable A |
1_VB | Company 2 |
1_VB | Company 1 |
2_VB | Company 2 |
2_VB | Company 3 |
3_VB | Company 3 |
3_VB | Company 1 |
this makes using proc sort hard because the order has to be the same for removing duplicates. I've also tried labelling using First.var_A but it comes up with the same issue that the order have to be just Var_a.
Posted alternative questions asking for a counter that cannot be sorted. here: https://communities.sas.com/t5/SAS-Procedures/Counter-Variable-without-sorting/m-p/479082#M71438
now ill just limit for count =1 and that's done.
Thanks for the help.
Yeah, maybe I was not following. But looking at the table you had, maybe I need to see the data. I thought I was getting what you needed.
data have;
length var_A $9;
input var_B $ var_A $;
datalines;
1_VB Company_2
1_VB Company_1
2_VB Company_2
2_VB Company_3
3_VB Company_3
3_VB Company_1
;
proc print data=have noobs;
var var_B var_A;
run;
proc sort data=have;
by var_B var_A;
run;
proc print data=have noobs;
var var_B var_A;
run;
proc sort nodupkey data=have out=want;
by var_A;
run;
proc print data=want noobs;
var var_B var_A;
run;
The input data...sorted...results
| |||||||||||||||
| |||||||||||||||
|
Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.