BookmarkSubscribeRSS Feed
CharlesFowler72
Obsidian | Level 7

Hi guys,

What I’m looking to do is to removes all duplicates of variable A so each variable A is unique to the output. However, a second variable (let’s call variable B) is a classification which has a desired hierarchy of the output.

If the hierarchy goes Class A, Class B, Class C then Class D then the below table

Variable A

Variable B

Company 1

Class A

Company 2

Class A

Company 1

Class B

Company 3

Class B

Company 2

Class C

Company 4

Class D

 

Would go into

Variable A

Variable B

Company 1

Class A

Company 2

Class A

Company 3

Class B

Company 4

Class D

 

Any idea?

Thanks

11 REPLIES 11
Astounding
PROC Star

Assuming that the results mistakenly omitted the combination of Company 2 with Class C, you could use:

 

proc sort data=have out=want nodupkey;

  by variable_a variable_b;

run;

 

If the posted results are actually correct, however, you might want:

 

proc sort data=have;

   by variable_a variable_b;

run;

 

data want;

set have;

by variable_a;

if first.variable_a;

run;

 

The more I think about it, the more I think the bottom program is the right one and the question you posted got it right.

CharlesFowler72
Obsidian | Level 7

Hi Astounding,

No that was omitted on purpose. Variable A has to be unique and as Class A is higher up in the hierarchy than Class C, the line Company 2 Class C is removed.

Astounding
PROC Star

OK, then go with the bottom program that I posted.  There is a requirement, however.  The sorted order must match the hierarchy.  In other words, within a company, the program selects the "smallest" value of variable_b.  So the "highest" priority in the hierarchy has to be the "smallest" value when sorting.

CharlesFowler72
Obsidian | Level 7
That's close but the next issue is that Variable B is not alphabetical. It is a list. Sorry that's my fault for not thinking and just putting A, B and C. If I use your second solution the order is not correct for the data step.
novinosrin
Tourmaline | Level 20
data have;
input VariableA	 $ &15. VariableB $ &15.;
cards;
Company 1	Class A
Company 2	Class A
Company 1	Class B
Company 3	Class B
Company 2	Class C
Company 4	Class D
;
data _null_;
if 0 then set have;
dcl hash H (dataset:'have',ordered: "A") ;
   h.definekey  ("variablea") ;
   h.definedata ("variablea", "variableb") ;
   h.definedone () ;
h.output(dataset:'want');
stop;
run;
EEng
Obsidian | Level 7

You just have to have 2 sort procedures.

 

data have;
	 length var_A $9;
	 input var_A $ var_B $;
	 datalines;
Company_1 Class_A
Company_2 Class_A
Company_1 Class_B
Company_3 Class_B
Company_2 Class_C
Company_4 Class_D
;
proc print data=have;
run;
proc sort data=have;
	 by var_A var_B;
run;
proc sort nodupkey data=have out=want;
	 by var_A;
run;
proc print data=want;
run;

Output

 

Obs var_A var_B
1Company_1Class_A
2Company_2Class_A
3Company_3Class_B
4Company_4Class_D
CharlesFowler72
Obsidian | Level 7

Hi EENG,

 

do you know how I would do that but using a custom sort for variable B? In reality the order is not alphabetical.

EEng
Obsidian | Level 7

But there is an order to B, right? Can't you create a new field like, If var_B = 'Tree' then var_B_id=1; else if var_B='Plant' then var_B_id=2; Then sort by var_A var_B_id; Then nodupkey on var_A. This way you do create an order to the field.

CharlesFowler72
Obsidian | Level 7

that was one of the first things I've tried. The issue with using the order like that is that I want to order first by variable A then A but remove duplicates for A not B. Sorry, I'm probably not explaining myself well. Might help if I switch variables around. If you see table below, the order is by Var_b Var_a but then removing duplicates is Var_A from top to bottom in order Var_b Var_a. So lines in red would be removed.

 

Variable B

Variable A

1_VB

Company 2

1_VB

Company 1

2_VB

Company 2

2_VB

Company 3

3_VB

Company 3

3_VB

Company 1

 

this makes using proc sort hard because the order has to be the same for removing duplicates. I've also tried labelling using First.var_A but it comes up with the same issue that the order have to be just Var_a.

CharlesFowler72
Obsidian | Level 7

Posted alternative questions asking for a counter that cannot be sorted. here: https://communities.sas.com/t5/SAS-Procedures/Counter-Variable-without-sorting/m-p/479082#M71438

now ill just limit for count =1 and that's done.

 

Thanks for the help.

EEng
Obsidian | Level 7

Yeah, maybe I was not following. But looking at the table you had, maybe I need to see the data. I thought I was getting what you needed.

 

data have;
	 length var_A $9;
	 input var_B $ var_A $;
	 datalines;
1_VB Company_2
1_VB Company_1
2_VB Company_2
2_VB Company_3
3_VB Company_3
3_VB Company_1
;
proc print data=have noobs;
	 var var_B var_A;
run;
proc sort data=have;
	 by var_B var_A;
run;
proc print data=have noobs;
	 var var_B var_A;
run;
proc sort nodupkey data=have out=want;
	 by var_A;
run;
proc print data=want noobs;
	 var var_B var_A;
run;

 

The input data...sorted...results

var_B var_A
1_VBCompany_2
1_VBCompany_1
2_VBCompany_2
2_VBCompany_3
3_VBCompany_3
3_VBCompany_1
var_B var_A
1_VBCompany_1
1_VBCompany_2
2_VBCompany_2
2_VBCompany_3
3_VBCompany_1
3_VBCompany_3
var_B var_A
1_VBCompany_1
1_VBCompany_2
2_VBCompany_3

sas-innovate-white.png

Missed SAS Innovate in Orlando?

Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.

 

Register now

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 2356 views
  • 0 likes
  • 4 in conversation