Solved: Re: Selecting observations that appear in all years

RickyS · Posted 01-09-2019 09:56 AM

Good Morning did some google searching and searching through SAS Communities and did not see a solution for what I would like to do, I'm sure its out there just can't find it. Continuing my search but thought I'd post.

I have a data set with observations from 1999 - 2017, trying to figure out a simple way to select observations that appear in each and every year.

Cannot think of a way to translate a sub-setting statement ......id must appear in every year OR Id cannot be missing for any year.

Observations are unique by id and year but may not repeat for each year.

Sorry should ay that I know I could subset the file into 1 data set for every year and then rejoin based on year 1999 using id, but there must be a better way.

Thanks

andreas_lds · Posted 01-09-2019 10:14 AM

Maybe something like

proc sql;
   create table work.want as
   select *
      from work.have
         group by Id
            having count(*) = 2017-1999+1
   ;
quit;

View solution in original post

PeterClemmensen · Posted 01-09-2019 09:58 AM

There are several ways to do this.

Can you post some example data for us to provide a usable code answer?

The DATA to DATA Step Macro
Blog: SASnrd

RickyS · Posted 01-09-2019 10:11 AM

Id Year Name Score
1 1999 A 85
1 2000 A 85
1 2001 A 87
1 2002 A 85
1 2003 A 85
1 2004 A 85
1 2005 A 85
1 2006 A 85
1 2007 A 85
1 2008 A 90
1 2009 A 85
1 2010 A 85
1 2011 A 82
1 2012 A 85
1 2013 A 85
1 2014 A 85
1 2015 A 85
1 2016 A 85
1 2017 A 85
2 1999 B 85
2 2000 B 85
2 2002 B 85
2 2003 B 85
2 2004 B 85
2 2005 B 85
2 2006 B 85
2 2007 B 85
2 2008 B 90
2 2009 B 85
2 2010 B 85
2 2011 B 82
2 2012 B 85
2 2013 B 85
2 2014 B 85
2 2015 B 85
2 2016 B 85
2 2017 B 85

So we do not Id 2 because it had no score in 2001, if there is no score and no observation for a given year want to drop the observation

novinosrin · Posted 01-09-2019 10:19 AM

data have;
input Id Year Name $ Score;
cards;
1 1999 A 85
1 2000 A 85
1 2001 A 87
1 2002 A 85
1 2003 A 85
1 2004 A 85
1 2005 A 85
1 2006 A 85
1 2007 A 85
1 2008 A 90
1 2009 A 85
1 2010 A 85
1 2011 A 82
1 2012 A 85
1 2013 A 85
1 2014 A 85
1 2015 A 85
1 2016 A 85
1 2017 A 85
2 1999 B 85
2 2000 B 85
2 2002 B 85
2 2003 B 85
2 2004 B 85
2 2005 B 85
2 2006 B 85
2 2007 B 85
2 2008 B 90
2 2009 B 85
2 2010 B 85
2 2011 B 82
2 2012 B 85
2 2013 B 85
2 2014 B 85
2 2015 B 85
2 2016 B 85
2 2017 B 85
;

proc sql noprint;
select max(c) into :m 
from
(select count(id) as c from have group by id );
create table want as
select *
from have
group by id
having count(id)=&m;
quit;

PaigeMiller · Posted 01-09-2019 10:20 AM

UNTESTED CODE

proc summary data=have;
    class id;
    var year;
    output out=yearstats min=minyear max=maxyear n=nyear;
run;
data want;
    if _n_=1 then set yearstats(where=(_type_=0) keep=minyear maxyear _type_);
    merge have yearstats(where=(_type_=1) keep=nyear _type_);
    by id;
    if nyear<(maxyear-minyear+1) then delete;
    drop _type_;
run;

Hashtag: #PROCSUMMARYRULEZ

--
Paige Miller

andreas_lds · Posted 01-09-2019 10:14 AM

Maybe something like

proc sql;
   create table work.want as
   select *
      from work.have
         group by Id
            having count(*) = 2017-1999+1
   ;
quit;

RickyS · Posted 01-09-2019 10:39 AM

Thank you everyone as I looked at the solutions I think they would all work but this one seemed the most simplistic, and easiest to validate. Did not think o using a having statement this way.

cheers

PaigeMiller · Posted 01-09-2019 10:45 AM

Just remember to change the solution from @andreas_lds if newer data contains 2018 or later records, or if at some point data from 1999 is removed from the analysis, etc. Or use a more general approach that figures out the max and min years.

--
Paige Miller

PeterClemmensen · Posted 01-09-2019 10:58 AM

data have;
input Id $ Year Name $ Score;
datalines;
1 1999 A 85
1 2000 A 85
1 2001 A 87
1 2002 A 85
1 2003 A 85
1 2004 A 85
1 2005 A 85
1 2006 A 85
1 2007 A 85
1 2008 A 90
1 2009 A 85
1 2010 A 85
1 2011 A 82
1 2012 A 85
1 2013 A 85
1 2014 A 85
1 2015 A 85
1 2016 A 85
1 2017 A 85
2 1999 B 85
2 2000 B 85
2 2002 B 85
2 2003 B 85
2 2004 B 85
2 2005 B 85
2 2006 B 85
2 2007 B 85
2 2008 B 90
2 2009 B 85
2 2010 B 85
2 2011 B 82
2 2012 B 85
2 2013 B 85
2 2014 B 85
2 2015 B 85
2 2016 B 85
2 2017 B 85
;

data _null_;
   if _N_ = 1 then do;
      declare hash h(dataset:'have');
      h.defineKey('Id', 'Year');
      h.defineDone();    

      declare hash out(dataset:'have', multidata:'Y');
      out.defineKey('Id');
      out.definedata(all:'Y');
      out.defineDone();
   end;

   set have end=eof;
   by ID;

   if h.check()=0 then c+1;

   if last.id then do;
      if c ne (2017-1999+1) then out.remove(key:Id);
      c=0;
   end;

   if eof then out.output(dataset:'want');
run;

The DATA to DATA Step Macro
Blog: SASnrd

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!