SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

How to keep only obs with all values of a category

Accepted Solution Solved
Reply
Contributor
Posts: 23
Accepted Solution

How to keep only obs with all values of a category

[ Edited ]

I'd like to keep only observations that include all values of a category.

 

I have this:

 

ID    Season

1      Winter

1      Winter

1      Spring

1      Summer

1      Summer

1      Fall

2      Winter

2       Winter

2      Spring

3      Winter

3      Fall

3       Fall

 

And I'd like to keep only ID 1 b/c it includes all categories for the season var, and I'd like to delete IDs 2 & 3 because they do not include all 4 categories for the season var.  Also, some IDs have have repeat observation for the same season category.

 

Does anyone have a suggestion for this?  I'm using SAS Studio. Thank you in advance.


Accepted Solutions
Solution
‎02-03-2016 06:48 PM
Contributor
Posts: 23

Re: How to keep only obs with all values of a category

Thank you everyone for the suggestions. I forgot to include in my original question that i needed to keep one obs per year as well. So I needed one observation per year per month and needed to keep only the IDs that had values for months 6-9.  This is the solution I found, after spending too much time on it!

 

 

data have;
input site year month;
cards;
1   97   6
1   97  6
1   97  7
1   98   8
1   99  8
1    97  9
2    97  7
2    97  7
2    97  8
3   98 8
3   99 9
3   99 9
.....;
run;

 

*keeping only 1 obs per month/yr;
proc sort data1=test nodupkey;
by site year month;
run;

*transpose by site and year, narrow to wide, w/ id month and var year;
proc transpose data=test1 out=test2 prefix=monthyr;
by site year;
id month;
var year;
run;

 

*make all_month var for only obs w/ months 6-9;
data test3;
set test2;
if monthyr6 ne '.' and monthyr7 ne '.' and monthyr8 ne '.' and monthyr9 ne '.' then all_month=1; else all_month=0;
run;

 

*merge into narrow dataset;
proc sort data=test; by site; run;
proc sort data=test3; by site year; run;
data test4;
MERGE test test3;
BY site year;
drop _name_;
drop monthyr6-monthyr10;
RUN;

 

*delete obs w/o months 6-9;
proc sort data=test4; by site year; run;
data test5;
set test4;
if all_month=0 then delete;
run;

View solution in original post


All Replies
Super User
Posts: 10,041

Re: How to keep only obs with all values of a category

data have;
input ID    Season $;
cards;
1      Winter
1      Winter
1      Spring
1      Summer
1      Summer
1      Fall
2      Winter
2       Winter
2      Spring
3      Winter
3      Fall
3       Fall
;
run;
proc sql;
 create table want as 
  select *
   from have
    group by id
     having count(distinct Season)=
      (select count(distinct Season) from have);
quit;
Contributor
Posts: 23

Re: How to keep only obs with all values of a category

great, thank you very much.

Super User
Posts: 7,831

Re: How to keep only obs with all values of a category

data step method (more stuff to write, might be faster with big data):

data int (keep=id);
set have;
by id;
retain Spring Summer Fall Winter;
if first.id then do;
  Spring = 0; Summer = 0; Fall = 0; Winter = 0;
end;
select (Season);
  when ('Spring') Spring = 1;
  when ('Summer') Summer = 1;
  when ('Fall') Fall = 1;
  when ('Winter') Winter = 1;
end;
if last.id and sum(Spring,Summer,Fall,Winter) = 4 then output;
run;

data want;
merge
  int (in=a)
  have
;
by id;
if a;
run;

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Posts: 5,432

Re: How to keep only obs with all values of a category

Posted in reply to KurtBremser

The SQL approach is neat, but requires that you really have clean data. A single non expected category value can affect the total result.

 

The data step approach includes hard coding values, which in most real world scenarios is not preferable/possible.

 

If you use a separate look-up table (with category master data) you could be sure to count only valid categories (one way is to us in a sub-query to a where in-clause).

Data never sleeps
Super User
Posts: 7,831

Re: How to keep only obs with all values of a category

Well, the hard-coding of values can be automated in a data step with call execute, so if that problem arises, it can be solved.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Solution
‎02-03-2016 06:48 PM
Contributor
Posts: 23

Re: How to keep only obs with all values of a category

Thank you everyone for the suggestions. I forgot to include in my original question that i needed to keep one obs per year as well. So I needed one observation per year per month and needed to keep only the IDs that had values for months 6-9.  This is the solution I found, after spending too much time on it!

 

 

data have;
input site year month;
cards;
1   97   6
1   97  6
1   97  7
1   98   8
1   99  8
1    97  9
2    97  7
2    97  7
2    97  8
3   98 8
3   99 9
3   99 9
.....;
run;

 

*keeping only 1 obs per month/yr;
proc sort data1=test nodupkey;
by site year month;
run;

*transpose by site and year, narrow to wide, w/ id month and var year;
proc transpose data=test1 out=test2 prefix=monthyr;
by site year;
id month;
var year;
run;

 

*make all_month var for only obs w/ months 6-9;
data test3;
set test2;
if monthyr6 ne '.' and monthyr7 ne '.' and monthyr8 ne '.' and monthyr9 ne '.' then all_month=1; else all_month=0;
run;

 

*merge into narrow dataset;
proc sort data=test; by site; run;
proc sort data=test3; by site year; run;
data test4;
MERGE test test3;
BY site year;
drop _name_;
drop monthyr6-monthyr10;
RUN;

 

*delete obs w/o months 6-9;
proc sort data=test4; by site year; run;
data test5;
set test4;
if all_month=0 then delete;
run;

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 391 views
  • 3 likes
  • 4 in conversation