Hi guys,
What I’m looking to do is to removes all duplicates of variable A so each variable A is unique to the output. However, a second variable (let’s call variable B) is a classification which has a desired hierarchy of the output.
If the hierarchy goes Class A, Class B, Class C then Class D then the below table
Variable A | Variable B |
Company 1 | Class A |
Company 2 | Class A |
Company 1 | Class B |
Company 3 | Class B |
Company 2 | Class C |
Company 4 | Class D |
Would go into
Variable A | Variable B |
Company 1 | Class A |
Company 2 | Class A |
Company 3 | Class B |
Company 4 | Class D |
Any idea?
Thanks
Assuming that the results mistakenly omitted the combination of Company 2 with Class C, you could use:
proc sort data=have out=want nodupkey;
by variable_a variable_b;
run;
If the posted results are actually correct, however, you might want:
proc sort data=have;
by variable_a variable_b;
run;
data want;
set have;
by variable_a;
if first.variable_a;
run;
The more I think about it, the more I think the bottom program is the right one and the question you posted got it right.
Hi Astounding,
No that was omitted on purpose. Variable A has to be unique and as Class A is higher up in the hierarchy than Class C, the line Company 2 Class C is removed.
OK, then go with the bottom program that I posted. There is a requirement, however. The sorted order must match the hierarchy. In other words, within a company, the program selects the "smallest" value of variable_b. So the "highest" priority in the hierarchy has to be the "smallest" value when sorting.
data have;
input VariableA $ &15. VariableB $ &15.;
cards;
Company 1 Class A
Company 2 Class A
Company 1 Class B
Company 3 Class B
Company 2 Class C
Company 4 Class D
;
data _null_;
if 0 then set have;
dcl hash H (dataset:'have',ordered: "A") ;
h.definekey ("variablea") ;
h.definedata ("variablea", "variableb") ;
h.definedone () ;
h.output(dataset:'want');
stop;
run;
You just have to have 2 sort procedures.
data have;
length var_A $9;
input var_A $ var_B $;
datalines;
Company_1 Class_A
Company_2 Class_A
Company_1 Class_B
Company_3 Class_B
Company_2 Class_C
Company_4 Class_D
;
proc print data=have;
run;
proc sort data=have;
by var_A var_B;
run;
proc sort nodupkey data=have out=want;
by var_A;
run;
proc print data=want;
run;
Output
1 | Company_1 | Class_A |
2 | Company_2 | Class_A |
3 | Company_3 | Class_B |
4 | Company_4 | Class_D |
Hi EENG,
do you know how I would do that but using a custom sort for variable B? In reality the order is not alphabetical.
But there is an order to B, right? Can't you create a new field like, If var_B = 'Tree' then var_B_id=1; else if var_B='Plant' then var_B_id=2; Then sort by var_A var_B_id; Then nodupkey on var_A. This way you do create an order to the field.
that was one of the first things I've tried. The issue with using the order like that is that I want to order first by variable A then A but remove duplicates for A not B. Sorry, I'm probably not explaining myself well. Might help if I switch variables around. If you see table below, the order is by Var_b Var_a but then removing duplicates is Var_A from top to bottom in order Var_b Var_a. So lines in red would be removed.
Variable B | Variable A |
1_VB | Company 2 |
1_VB | Company 1 |
2_VB | Company 2 |
2_VB | Company 3 |
3_VB | Company 3 |
3_VB | Company 1 |
this makes using proc sort hard because the order has to be the same for removing duplicates. I've also tried labelling using First.var_A but it comes up with the same issue that the order have to be just Var_a.
Posted alternative questions asking for a counter that cannot be sorted. here: https://communities.sas.com/t5/SAS-Procedures/Counter-Variable-without-sorting/m-p/479082#M71438
now ill just limit for count =1 and that's done.
Thanks for the help.
Yeah, maybe I was not following. But looking at the table you had, maybe I need to see the data. I thought I was getting what you needed.
data have;
length var_A $9;
input var_B $ var_A $;
datalines;
1_VB Company_2
1_VB Company_1
2_VB Company_2
2_VB Company_3
3_VB Company_3
3_VB Company_1
;
proc print data=have noobs;
var var_B var_A;
run;
proc sort data=have;
by var_B var_A;
run;
proc print data=have noobs;
var var_B var_A;
run;
proc sort nodupkey data=have out=want;
by var_A;
run;
proc print data=want noobs;
var var_B var_A;
run;
The input data...sorted...results
| |||||||||||||||
| |||||||||||||||
|
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.