Good Morning did some google searching and searching through SAS Communities and did not see a solution for what I would like to do, I'm sure its out there just can't find it. Continuing my search but thought I'd post.
I have a data set with observations from 1999 - 2017, trying to figure out a simple way to select observations that appear in each and every year.
Cannot think of a way to translate a sub-setting statement ......id must appear in every year OR Id cannot be missing for any year.
Observations are unique by id and year but may not repeat for each year.
Sorry should ay that I know I could subset the file into 1 data set for every year and then rejoin based on year 1999 using id, but there must be a better way.
Thanks
Maybe something like
proc sql;
create table work.want as
select *
from work.have
group by Id
having count(*) = 2017-1999+1
;
quit;
There are several ways to do this.
Can you post some example data for us to provide a usable code answer?
data have;
input Id Year Name $ Score;
cards;
1 1999 A 85
1 2000 A 85
1 2001 A 87
1 2002 A 85
1 2003 A 85
1 2004 A 85
1 2005 A 85
1 2006 A 85
1 2007 A 85
1 2008 A 90
1 2009 A 85
1 2010 A 85
1 2011 A 82
1 2012 A 85
1 2013 A 85
1 2014 A 85
1 2015 A 85
1 2016 A 85
1 2017 A 85
2 1999 B 85
2 2000 B 85
2 2002 B 85
2 2003 B 85
2 2004 B 85
2 2005 B 85
2 2006 B 85
2 2007 B 85
2 2008 B 90
2 2009 B 85
2 2010 B 85
2 2011 B 82
2 2012 B 85
2 2013 B 85
2 2014 B 85
2 2015 B 85
2 2016 B 85
2 2017 B 85
;
proc sql noprint;
select max(c) into :m
from
(select count(id) as c from have group by id );
create table want as
select *
from have
group by id
having count(id)=&m;
quit;
UNTESTED CODE
proc summary data=have;
class id;
var year;
output out=yearstats min=minyear max=maxyear n=nyear;
run;
data want;
if _n_=1 then set yearstats(where=(_type_=0) keep=minyear maxyear _type_);
merge have yearstats(where=(_type_=1) keep=nyear _type_);
by id;
if nyear<(maxyear-minyear+1) then delete;
drop _type_;
run;
Hashtag: #PROCSUMMARYRULEZ
Maybe something like
proc sql;
create table work.want as
select *
from work.have
group by Id
having count(*) = 2017-1999+1
;
quit;
Thank you everyone as I looked at the solutions I think they would all work but this one seemed the most simplistic, and easiest to validate. Did not think o using a having statement this way.
cheers
Just remember to change the solution from @andreas_lds if newer data contains 2018 or later records, or if at some point data from 1999 is removed from the analysis, etc. Or use a more general approach that figures out the max and min years.
data have;
input Id $ Year Name $ Score;
datalines;
1 1999 A 85
1 2000 A 85
1 2001 A 87
1 2002 A 85
1 2003 A 85
1 2004 A 85
1 2005 A 85
1 2006 A 85
1 2007 A 85
1 2008 A 90
1 2009 A 85
1 2010 A 85
1 2011 A 82
1 2012 A 85
1 2013 A 85
1 2014 A 85
1 2015 A 85
1 2016 A 85
1 2017 A 85
2 1999 B 85
2 2000 B 85
2 2002 B 85
2 2003 B 85
2 2004 B 85
2 2005 B 85
2 2006 B 85
2 2007 B 85
2 2008 B 90
2 2009 B 85
2 2010 B 85
2 2011 B 82
2 2012 B 85
2 2013 B 85
2 2014 B 85
2 2015 B 85
2 2016 B 85
2 2017 B 85
;
data _null_;
if _N_ = 1 then do;
declare hash h(dataset:'have');
h.defineKey('Id', 'Year');
h.defineDone();
declare hash out(dataset:'have', multidata:'Y');
out.defineKey('Id');
out.definedata(all:'Y');
out.defineDone();
end;
set have end=eof;
by ID;
if h.check()=0 then c+1;
if last.id then do;
if c ne (2017-1999+1) then out.remove(key:Id);
c=0;
end;
if eof then out.output(dataset:'want');
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.