I'd like to keep only observations that include all values of a category.
I have this:
ID Season
1 Winter
1 Winter
1 Spring
1 Summer
1 Summer
1 Fall
2 Winter
2 Winter
2 Spring
3 Winter
3 Fall
3 Fall
And I'd like to keep only ID 1 b/c it includes all categories for the season var, and I'd like to delete IDs 2 & 3 because they do not include all 4 categories for the season var. Also, some IDs have have repeat observation for the same season category.
Does anyone have a suggestion for this? I'm using SAS Studio. Thank you in advance.
Thank you everyone for the suggestions. I forgot to include in my original question that i needed to keep one obs per year as well. So I needed one observation per year per month and needed to keep only the IDs that had values for months 6-9. This is the solution I found, after spending too much time on it!
data have;
input site year month;
cards;
1 97 6
1 97 6
1 97 7
1 98 8
1 99 8
1 97 9
2 97 7
2 97 7
2 97 8
3 98 8
3 99 9
3 99 9
.....;
run;
*keeping only 1 obs per month/yr;
proc sort data1=test nodupkey;
by site year month;
run;
*transpose by site and year, narrow to wide, w/ id month and var year;
proc transpose data=test1 out=test2 prefix=monthyr;
by site year;
id month;
var year;
run;
*make all_month var for only obs w/ months 6-9;
data test3;
set test2;
if monthyr6 ne '.' and monthyr7 ne '.' and monthyr8 ne '.' and monthyr9 ne '.' then all_month=1; else all_month=0;
run;
*merge into narrow dataset;
proc sort data=test; by site; run;
proc sort data=test3; by site year; run;
data test4;
MERGE test test3;
BY site year;
drop _name_;
drop monthyr6-monthyr10;
RUN;
*delete obs w/o months 6-9;
proc sort data=test4; by site year; run;
data test5;
set test4;
if all_month=0 then delete;
run;
data have;
input ID Season $;
cards;
1 Winter
1 Winter
1 Spring
1 Summer
1 Summer
1 Fall
2 Winter
2 Winter
2 Spring
3 Winter
3 Fall
3 Fall
;
run;
proc sql;
create table want as
select *
from have
group by id
having count(distinct Season)=
(select count(distinct Season) from have);
quit;
great, thank you very much.
data step method (more stuff to write, might be faster with big data):
data int (keep=id);
set have;
by id;
retain Spring Summer Fall Winter;
if first.id then do;
Spring = 0; Summer = 0; Fall = 0; Winter = 0;
end;
select (Season);
when ('Spring') Spring = 1;
when ('Summer') Summer = 1;
when ('Fall') Fall = 1;
when ('Winter') Winter = 1;
end;
if last.id and sum(Spring,Summer,Fall,Winter) = 4 then output;
run;
data want;
merge
int (in=a)
have
;
by id;
if a;
run;
The SQL approach is neat, but requires that you really have clean data. A single non expected category value can affect the total result.
The data step approach includes hard coding values, which in most real world scenarios is not preferable/possible.
If you use a separate look-up table (with category master data) you could be sure to count only valid categories (one way is to us in a sub-query to a where in-clause).
Well, the hard-coding of values can be automated in a data step with call execute, so if that problem arises, it can be solved.
Thank you everyone for the suggestions. I forgot to include in my original question that i needed to keep one obs per year as well. So I needed one observation per year per month and needed to keep only the IDs that had values for months 6-9. This is the solution I found, after spending too much time on it!
data have;
input site year month;
cards;
1 97 6
1 97 6
1 97 7
1 98 8
1 99 8
1 97 9
2 97 7
2 97 7
2 97 8
3 98 8
3 99 9
3 99 9
.....;
run;
*keeping only 1 obs per month/yr;
proc sort data1=test nodupkey;
by site year month;
run;
*transpose by site and year, narrow to wide, w/ id month and var year;
proc transpose data=test1 out=test2 prefix=monthyr;
by site year;
id month;
var year;
run;
*make all_month var for only obs w/ months 6-9;
data test3;
set test2;
if monthyr6 ne '.' and monthyr7 ne '.' and monthyr8 ne '.' and monthyr9 ne '.' then all_month=1; else all_month=0;
run;
*merge into narrow dataset;
proc sort data=test; by site; run;
proc sort data=test3; by site year; run;
data test4;
MERGE test test3;
BY site year;
drop _name_;
drop monthyr6-monthyr10;
RUN;
*delete obs w/o months 6-9;
proc sort data=test4; by site year; run;
data test5;
set test4;
if all_month=0 then delete;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.