BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jlajla
Obsidian | Level 7

I'd like to keep only observations that include all values of a category.

 

I have this:

 

ID    Season

1      Winter

1      Winter

1      Spring

1      Summer

1      Summer

1      Fall

2      Winter

2       Winter

2      Spring

3      Winter

3      Fall

3       Fall

 

And I'd like to keep only ID 1 b/c it includes all categories for the season var, and I'd like to delete IDs 2 & 3 because they do not include all 4 categories for the season var.  Also, some IDs have have repeat observation for the same season category.

 

Does anyone have a suggestion for this?  I'm using SAS Studio. Thank you in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
jlajla
Obsidian | Level 7

Thank you everyone for the suggestions. I forgot to include in my original question that i needed to keep one obs per year as well. So I needed one observation per year per month and needed to keep only the IDs that had values for months 6-9.  This is the solution I found, after spending too much time on it!

 

 

data have;
input site year month;
cards;
1   97   6
1   97  6
1   97  7
1   98   8
1   99  8
1    97  9
2    97  7
2    97  7
2    97  8
3   98 8
3   99 9
3   99 9
.....;
run;

 

*keeping only 1 obs per month/yr;
proc sort data1=test nodupkey;
by site year month;
run;

*transpose by site and year, narrow to wide, w/ id month and var year;
proc transpose data=test1 out=test2 prefix=monthyr;
by site year;
id month;
var year;
run;

 

*make all_month var for only obs w/ months 6-9;
data test3;
set test2;
if monthyr6 ne '.' and monthyr7 ne '.' and monthyr8 ne '.' and monthyr9 ne '.' then all_month=1; else all_month=0;
run;

 

*merge into narrow dataset;
proc sort data=test; by site; run;
proc sort data=test3; by site year; run;
data test4;
MERGE test test3;
BY site year;
drop _name_;
drop monthyr6-monthyr10;
RUN;

 

*delete obs w/o months 6-9;
proc sort data=test4; by site year; run;
data test5;
set test4;
if all_month=0 then delete;
run;

View solution in original post

6 REPLIES 6
Ksharp
Super User
data have;
input ID    Season $;
cards;
1      Winter
1      Winter
1      Spring
1      Summer
1      Summer
1      Fall
2      Winter
2       Winter
2      Spring
3      Winter
3      Fall
3       Fall
;
run;
proc sql;
 create table want as 
  select *
   from have
    group by id
     having count(distinct Season)=
      (select count(distinct Season) from have);
quit;
jlajla
Obsidian | Level 7

great, thank you very much.

Kurt_Bremser
Super User

data step method (more stuff to write, might be faster with big data):

data int (keep=id);
set have;
by id;
retain Spring Summer Fall Winter;
if first.id then do;
  Spring = 0; Summer = 0; Fall = 0; Winter = 0;
end;
select (Season);
  when ('Spring') Spring = 1;
  when ('Summer') Summer = 1;
  when ('Fall') Fall = 1;
  when ('Winter') Winter = 1;
end;
if last.id and sum(Spring,Summer,Fall,Winter) = 4 then output;
run;

data want;
merge
  int (in=a)
  have
;
by id;
if a;
run;

LinusH
Tourmaline | Level 20

The SQL approach is neat, but requires that you really have clean data. A single non expected category value can affect the total result.

 

The data step approach includes hard coding values, which in most real world scenarios is not preferable/possible.

 

If you use a separate look-up table (with category master data) you could be sure to count only valid categories (one way is to us in a sub-query to a where in-clause).

Data never sleeps
jlajla
Obsidian | Level 7

Thank you everyone for the suggestions. I forgot to include in my original question that i needed to keep one obs per year as well. So I needed one observation per year per month and needed to keep only the IDs that had values for months 6-9.  This is the solution I found, after spending too much time on it!

 

 

data have;
input site year month;
cards;
1   97   6
1   97  6
1   97  7
1   98   8
1   99  8
1    97  9
2    97  7
2    97  7
2    97  8
3   98 8
3   99 9
3   99 9
.....;
run;

 

*keeping only 1 obs per month/yr;
proc sort data1=test nodupkey;
by site year month;
run;

*transpose by site and year, narrow to wide, w/ id month and var year;
proc transpose data=test1 out=test2 prefix=monthyr;
by site year;
id month;
var year;
run;

 

*make all_month var for only obs w/ months 6-9;
data test3;
set test2;
if monthyr6 ne '.' and monthyr7 ne '.' and monthyr8 ne '.' and monthyr9 ne '.' then all_month=1; else all_month=0;
run;

 

*merge into narrow dataset;
proc sort data=test; by site; run;
proc sort data=test3; by site year; run;
data test4;
MERGE test test3;
BY site year;
drop _name_;
drop monthyr6-monthyr10;
RUN;

 

*delete obs w/o months 6-9;
proc sort data=test4; by site year; run;
data test5;
set test4;
if all_month=0 then delete;
run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 1029 views
  • 3 likes
  • 4 in conversation