BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jlajla
Obsidian | Level 7

I'd like to keep only observations that include all values of a category.

 

I have this:

 

ID    Season

1      Winter

1      Winter

1      Spring

1      Summer

1      Summer

1      Fall

2      Winter

2       Winter

2      Spring

3      Winter

3      Fall

3       Fall

 

And I'd like to keep only ID 1 b/c it includes all categories for the season var, and I'd like to delete IDs 2 & 3 because they do not include all 4 categories for the season var.  Also, some IDs have have repeat observation for the same season category.

 

Does anyone have a suggestion for this?  I'm using SAS Studio. Thank you in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
jlajla
Obsidian | Level 7

Thank you everyone for the suggestions. I forgot to include in my original question that i needed to keep one obs per year as well. So I needed one observation per year per month and needed to keep only the IDs that had values for months 6-9.  This is the solution I found, after spending too much time on it!

 

 

data have;
input site year month;
cards;
1   97   6
1   97  6
1   97  7
1   98   8
1   99  8
1    97  9
2    97  7
2    97  7
2    97  8
3   98 8
3   99 9
3   99 9
.....;
run;

 

*keeping only 1 obs per month/yr;
proc sort data1=test nodupkey;
by site year month;
run;

*transpose by site and year, narrow to wide, w/ id month and var year;
proc transpose data=test1 out=test2 prefix=monthyr;
by site year;
id month;
var year;
run;

 

*make all_month var for only obs w/ months 6-9;
data test3;
set test2;
if monthyr6 ne '.' and monthyr7 ne '.' and monthyr8 ne '.' and monthyr9 ne '.' then all_month=1; else all_month=0;
run;

 

*merge into narrow dataset;
proc sort data=test; by site; run;
proc sort data=test3; by site year; run;
data test4;
MERGE test test3;
BY site year;
drop _name_;
drop monthyr6-monthyr10;
RUN;

 

*delete obs w/o months 6-9;
proc sort data=test4; by site year; run;
data test5;
set test4;
if all_month=0 then delete;
run;

View solution in original post

6 REPLIES 6
Ksharp
Super User
data have;
input ID    Season $;
cards;
1      Winter
1      Winter
1      Spring
1      Summer
1      Summer
1      Fall
2      Winter
2       Winter
2      Spring
3      Winter
3      Fall
3       Fall
;
run;
proc sql;
 create table want as 
  select *
   from have
    group by id
     having count(distinct Season)=
      (select count(distinct Season) from have);
quit;
jlajla
Obsidian | Level 7

great, thank you very much.

Kurt_Bremser
Super User

data step method (more stuff to write, might be faster with big data):

data int (keep=id);
set have;
by id;
retain Spring Summer Fall Winter;
if first.id then do;
  Spring = 0; Summer = 0; Fall = 0; Winter = 0;
end;
select (Season);
  when ('Spring') Spring = 1;
  when ('Summer') Summer = 1;
  when ('Fall') Fall = 1;
  when ('Winter') Winter = 1;
end;
if last.id and sum(Spring,Summer,Fall,Winter) = 4 then output;
run;

data want;
merge
  int (in=a)
  have
;
by id;
if a;
run;

LinusH
Tourmaline | Level 20

The SQL approach is neat, but requires that you really have clean data. A single non expected category value can affect the total result.

 

The data step approach includes hard coding values, which in most real world scenarios is not preferable/possible.

 

If you use a separate look-up table (with category master data) you could be sure to count only valid categories (one way is to us in a sub-query to a where in-clause).

Data never sleeps
jlajla
Obsidian | Level 7

Thank you everyone for the suggestions. I forgot to include in my original question that i needed to keep one obs per year as well. So I needed one observation per year per month and needed to keep only the IDs that had values for months 6-9.  This is the solution I found, after spending too much time on it!

 

 

data have;
input site year month;
cards;
1   97   6
1   97  6
1   97  7
1   98   8
1   99  8
1    97  9
2    97  7
2    97  7
2    97  8
3   98 8
3   99 9
3   99 9
.....;
run;

 

*keeping only 1 obs per month/yr;
proc sort data1=test nodupkey;
by site year month;
run;

*transpose by site and year, narrow to wide, w/ id month and var year;
proc transpose data=test1 out=test2 prefix=monthyr;
by site year;
id month;
var year;
run;

 

*make all_month var for only obs w/ months 6-9;
data test3;
set test2;
if monthyr6 ne '.' and monthyr7 ne '.' and monthyr8 ne '.' and monthyr9 ne '.' then all_month=1; else all_month=0;
run;

 

*merge into narrow dataset;
proc sort data=test; by site; run;
proc sort data=test3; by site year; run;
data test4;
MERGE test test3;
BY site year;
drop _name_;
drop monthyr6-monthyr10;
RUN;

 

*delete obs w/o months 6-9;
proc sort data=test4; by site year; run;
data test5;
set test4;
if all_month=0 then delete;
run;

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 1404 views
  • 3 likes
  • 4 in conversation