BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sas-inquirer
Quartz | Level 8

Hi there,

    This seems like a very simple problem, but I can't seem to resolve it on my own.  I need to delete a group when the groups values are duplicates of another.

 

data have;
input var1 var2 var3 $;
cards;
1 1 A
2 1 B
3 1 C
4 2 A
5 2 M
6 2 T
7 3 A
8 3 B
9 3 C
10 4 S
11 4 V
12 4 Z
;
run;
Want:
Var 1 Var2 Var3
1 1 A
2 1 B
3 1 C
4 2 A
5 2 M
6 2 T
10 4 S
11 4 V
12 4 Z
 
How can I delete an entire group if the values between two groups match?  (If all values in var3 = all values in var3 by var2 then delete last.var2).
 
 
I’ve tried sql:
proc sql;
create table want as
select * from have
group by var3
having var2 = min(var2)
order by var1;
quit;
 
And it deletes the duplicate var2 group fine, but it also deletes any duplicate value of var3.
 
I’ve also tried proc sort with a data step including nodups and first and last, but I could never get it to come close to my desired output.
 
Where am I going wrong?
1 ACCEPTED SOLUTION

Accepted Solutions
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Ah, I see.  The way I do it below, is to transpose up the data within each group, sort nodupkey the cat'd columns to get a disintct list, and then remove observations not having a group in the nodupkeylist.  Well, you will see the logic if you run it step by step.  Will have a think if there is an easier syntax for it.

data have;
input var1 var2 var3 $;
cards;
1 1 A
2 1 B
3 1 C
4 2 A
5 2 M
6 2 T
7 3 A
8 3 B
9 3 C
10 4 S
11 4 V
12 4 Z
;
run;

proc sort data=have out=have2;
by var2 var3;
run;

proc transpose data=have2 out=t_have;
by var2;
var var3;
run;

data t_have;
set t_have (drop=_name_);
all=catx(',',of col:);
run;

proc sort data=t_have nodupkey;
by all;
run;

proc sql;
delete from HAVE
where VAR2 not in (select VAR2 from T_HAVE);
quit;

One good thing about this approach is it removes data from the original table, leaving sort as is.

View solution in original post

12 REPLIES 12
sas-inquirer
Quartz | Level 8
I forgot to mention, I am using SAS 9.2.
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Sort the data, then drop not first records, then sort back into original order - note I set a variable N to keep original order:

data have;
  input var1 var2 var3 $;
  n=_n_;
cards;
1 1 A
2 1 B
3 1 C
4 2 A
5 2 M
6 2 T
7 3 A
8 3 B
9 3 C
10 4 S
11 4 V
12 4 Z
;
run;

proc sort data=have out=want;
  by var3 var2 var1;
run;

data want;
  set want;
  by var3 var2 var1;
  if not(first.var3) then delete;
run;

proc sort data=want;
  by n;
run;
sas-inquirer
Quartz | Level 8
Thanks the quick response RW9! I applied your solution and am getting results that not only remove the duplicate group from var2 but also any duplicate var3 values. I will have var3 values that duplicate a lot so need to keep those. Is there a way to restrict the deletion to just the duplicated group?
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Sorry, I can only work with what you post here, show the problem in data.  What defines the term group in your statement "Is there a way to restrict the deletion to just the duplicated group"?  I had thought by your original post that var3 indicated the group, and thus any subsequent appearances of that variable would be considered "duplicates".

sas-inquirer
Quartz | Level 8

Sorry I wasn't clear.  Var2 is the variable that contains the groups (i.e. 1, 2, 3 and 4).  Var3 are the values for each group (i.e. Group 1 contains the values A, B and C).  I am hoping to find a solution that identifies that groups 1 and 3 both have the same set of values (A, B, C) and to then delete group 3. 

 

I hope that helps.  Does "show the problem in data" mean you would prefer to work with a table that has more relevant data instead of example data?  I can put something together if that's the case.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Ah, I see.  The way I do it below, is to transpose up the data within each group, sort nodupkey the cat'd columns to get a disintct list, and then remove observations not having a group in the nodupkeylist.  Well, you will see the logic if you run it step by step.  Will have a think if there is an easier syntax for it.

data have;
input var1 var2 var3 $;
cards;
1 1 A
2 1 B
3 1 C
4 2 A
5 2 M
6 2 T
7 3 A
8 3 B
9 3 C
10 4 S
11 4 V
12 4 Z
;
run;

proc sort data=have out=have2;
by var2 var3;
run;

proc transpose data=have2 out=t_have;
by var2;
var var3;
run;

data t_have;
set t_have (drop=_name_);
all=catx(',',of col:);
run;

proc sort data=t_have nodupkey;
by all;
run;

proc sql;
delete from HAVE
where VAR2 not in (select VAR2 from T_HAVE);
quit;

One good thing about this approach is it removes data from the original table, leaving sort as is.

sas-inquirer
Quartz | Level 8

Wow thank you RW9!  That is a very clever solution!  I applied it to my data and it works like a charm.  

 

If you have time to explain, how did "of col:" in the following code segment remove the duplicated group from var2?  It's fascinating.

data t_have;
set t_have (drop=_name_);
all=catx(',',of col:);
run;

 

 

 

 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

You will normally see the of used in conjunction with array statements, so thats probably a good place to start.  Say we have:
array col{5} abc def asd iop yui;

What this is doing is creating a reference to those variables: abc def asd iop yui.

This reference is col and has 5 elements.

Now if I do:

catx(',',of col{*});

This wil concatenate all values in the array col with commas.

 

Much the same in the way I have used it in my code.  There is a special syntax for specifying a range of variables with the same prefix, so in my example each variable I am interested in is prefixed with COL, so I can refer to all of them as: COL:

I can refer to this in the same way as if it was an array, so we end up with:

all=catx(',',of COL:);

Or to use long notation:

all=catx(',',COL1,COL2,COL3...);

 

sas-inquirer
Quartz | Level 8

Thanks for the explanation RW9.

FreelanceReinh
Jade | Level 19

Maybe I've overlooked something, but it seems to me that the data step creating variable ALL could be omitted. Variable ALL could be replaced by variable list COL: in the subsequent PROC SORT step.

TomKari
Onyx | Level 15

I'm pondering it, but I DO NOT consider this a simple problem! Hopefully, more soon.

 

Tom

sas-inquirer
Quartz | Level 8
That's actually very validating to hear Tom!

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 4530 views
  • 3 likes
  • 4 in conversation