BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sas-inquirer
Quartz | Level 8

Hi there,

    This seems like a very simple problem, but I can't seem to resolve it on my own.  I need to delete a group when the groups values are duplicates of another.

 

data have;
input var1 var2 var3 $;
cards;
1 1 A
2 1 B
3 1 C
4 2 A
5 2 M
6 2 T
7 3 A
8 3 B
9 3 C
10 4 S
11 4 V
12 4 Z
;
run;
Want:
Var 1 Var2 Var3
1 1 A
2 1 B
3 1 C
4 2 A
5 2 M
6 2 T
10 4 S
11 4 V
12 4 Z
 
How can I delete an entire group if the values between two groups match?  (If all values in var3 = all values in var3 by var2 then delete last.var2).
 
 
I’ve tried sql:
proc sql;
create table want as
select * from have
group by var3
having var2 = min(var2)
order by var1;
quit;
 
And it deletes the duplicate var2 group fine, but it also deletes any duplicate value of var3.
 
I’ve also tried proc sort with a data step including nodups and first and last, but I could never get it to come close to my desired output.
 
Where am I going wrong?
1 ACCEPTED SOLUTION

Accepted Solutions
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Ah, I see.  The way I do it below, is to transpose up the data within each group, sort nodupkey the cat'd columns to get a disintct list, and then remove observations not having a group in the nodupkeylist.  Well, you will see the logic if you run it step by step.  Will have a think if there is an easier syntax for it.

data have;
input var1 var2 var3 $;
cards;
1 1 A
2 1 B
3 1 C
4 2 A
5 2 M
6 2 T
7 3 A
8 3 B
9 3 C
10 4 S
11 4 V
12 4 Z
;
run;

proc sort data=have out=have2;
by var2 var3;
run;

proc transpose data=have2 out=t_have;
by var2;
var var3;
run;

data t_have;
set t_have (drop=_name_);
all=catx(',',of col:);
run;

proc sort data=t_have nodupkey;
by all;
run;

proc sql;
delete from HAVE
where VAR2 not in (select VAR2 from T_HAVE);
quit;

One good thing about this approach is it removes data from the original table, leaving sort as is.

View solution in original post

12 REPLIES 12
sas-inquirer
Quartz | Level 8
I forgot to mention, I am using SAS 9.2.
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Sort the data, then drop not first records, then sort back into original order - note I set a variable N to keep original order:

data have;
  input var1 var2 var3 $;
  n=_n_;
cards;
1 1 A
2 1 B
3 1 C
4 2 A
5 2 M
6 2 T
7 3 A
8 3 B
9 3 C
10 4 S
11 4 V
12 4 Z
;
run;

proc sort data=have out=want;
  by var3 var2 var1;
run;

data want;
  set want;
  by var3 var2 var1;
  if not(first.var3) then delete;
run;

proc sort data=want;
  by n;
run;
sas-inquirer
Quartz | Level 8
Thanks the quick response RW9! I applied your solution and am getting results that not only remove the duplicate group from var2 but also any duplicate var3 values. I will have var3 values that duplicate a lot so need to keep those. Is there a way to restrict the deletion to just the duplicated group?
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Sorry, I can only work with what you post here, show the problem in data.  What defines the term group in your statement "Is there a way to restrict the deletion to just the duplicated group"?  I had thought by your original post that var3 indicated the group, and thus any subsequent appearances of that variable would be considered "duplicates".

sas-inquirer
Quartz | Level 8

Sorry I wasn't clear.  Var2 is the variable that contains the groups (i.e. 1, 2, 3 and 4).  Var3 are the values for each group (i.e. Group 1 contains the values A, B and C).  I am hoping to find a solution that identifies that groups 1 and 3 both have the same set of values (A, B, C) and to then delete group 3. 

 

I hope that helps.  Does "show the problem in data" mean you would prefer to work with a table that has more relevant data instead of example data?  I can put something together if that's the case.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Ah, I see.  The way I do it below, is to transpose up the data within each group, sort nodupkey the cat'd columns to get a disintct list, and then remove observations not having a group in the nodupkeylist.  Well, you will see the logic if you run it step by step.  Will have a think if there is an easier syntax for it.

data have;
input var1 var2 var3 $;
cards;
1 1 A
2 1 B
3 1 C
4 2 A
5 2 M
6 2 T
7 3 A
8 3 B
9 3 C
10 4 S
11 4 V
12 4 Z
;
run;

proc sort data=have out=have2;
by var2 var3;
run;

proc transpose data=have2 out=t_have;
by var2;
var var3;
run;

data t_have;
set t_have (drop=_name_);
all=catx(',',of col:);
run;

proc sort data=t_have nodupkey;
by all;
run;

proc sql;
delete from HAVE
where VAR2 not in (select VAR2 from T_HAVE);
quit;

One good thing about this approach is it removes data from the original table, leaving sort as is.

sas-inquirer
Quartz | Level 8

Wow thank you RW9!  That is a very clever solution!  I applied it to my data and it works like a charm.  

 

If you have time to explain, how did "of col:" in the following code segment remove the duplicated group from var2?  It's fascinating.

data t_have;
set t_have (drop=_name_);
all=catx(',',of col:);
run;

 

 

 

 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

You will normally see the of used in conjunction with array statements, so thats probably a good place to start.  Say we have:
array col{5} abc def asd iop yui;

What this is doing is creating a reference to those variables: abc def asd iop yui.

This reference is col and has 5 elements.

Now if I do:

catx(',',of col{*});

This wil concatenate all values in the array col with commas.

 

Much the same in the way I have used it in my code.  There is a special syntax for specifying a range of variables with the same prefix, so in my example each variable I am interested in is prefixed with COL, so I can refer to all of them as: COL:

I can refer to this in the same way as if it was an array, so we end up with:

all=catx(',',of COL:);

Or to use long notation:

all=catx(',',COL1,COL2,COL3...);

 

sas-inquirer
Quartz | Level 8

Thanks for the explanation RW9.

FreelanceReinh
Jade | Level 19

Maybe I've overlooked something, but it seems to me that the data step creating variable ALL could be omitted. Variable ALL could be replaced by variable list COL: in the subsequent PROC SORT step.

TomKari
Onyx | Level 15

I'm pondering it, but I DO NOT consider this a simple problem! Hopefully, more soon.

 

Tom

sas-inquirer
Quartz | Level 8
That's actually very validating to hear Tom!

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 6547 views
  • 3 likes
  • 4 in conversation