I have data below
data have;
input id type visit_date :yymmdd10.;
format visit_date yymmdd10.;
datalines;
1 1 2007-01-01
1 2 2008-01-02
1 2 2009-01-03
1 3 2010-01-04
1 5 2011-01-05
2 1 2008-01-01
2 2 2008-01-02
2 2 2009-01-03
2 3 2010-01-04
2 5 2011-01-05
3 2 2007-01-09
3 . 2008-01-10
3 2 2009-01-11
3 . 2010-01-12
4 . 2007-01-10
4 3 2008-01-11
4 1 2009-01-12
5 2 2007-01-11
5 2 2008-01-12
5 2 2009-01-10
5 2 2010-01-11
6 2 2008-01-12
6 1 2009-01-10
6 . 2010-01-11
;
proc sort data=have; by id visit_date ;run;
I want to select rows for the id, whose id have non-missing "type" for each "visit_year" of 2007-2011.
So, for the dataset above, I would expect I will get row 1-5.
To re-formulate your requirement:
I want to select rows for the id, whose id have NEVER a missing "type" for "visit_year" between 2007 to 2011 AND for which there is at least one row per year.
I assume there could also be multiple visit dates (rows) for an ID in a single year.
Below code basically using what @Kurt_Bremser proposed with some tweaks to check the type variable and ensuring that there can't be an out of array subscript condition. I've also amended your sample data slightly to make it a bit more representative for how I assume your real data looks like (=multiple rows per year for an id - of which one row could have a missing type).
data have;
infile datalines dsd truncover;
input id type visit_date :yymmdd10.;
format visit_date yymmdd10.;
datalines;
1,1,2007-01-01
1,1,2007-05-01
1,2,2008-01-02
1,2,2009-01-03
1,3,2010-01-04
1,5,2011-01-05
9,1,2007-01-01
9,1,2007-05-01
9,.,2007-06-01
9,2,2008-01-02
9,2,2009-01-03
9,3,2010-01-04
9,5,2011-01-05
2,1,2008-01-01
2,2,2008-01-02
2,2,2009-01-03
2,3,2010-01-04
2,5,2011-01-05
2,5,2012-01-05
3,2,2007-01-09
3,.,2008-01-10
3,2,2009-01-11
3,.,2010-01-12
3,.,2011-01-12
4,.,2007-01-10
4,3,2008-01-11
4,1,2009-01-12
5,2,2007-01-11
5,2,2008-01-12
5,2,2009-01-10
5,2,2010-01-11
6,2,2008-01-12
6,1,2009-01-10
6,.,2010-01-11
;
proc sort data=have;
by id;
run;
data want;
array _y{2007:2011};
call missing(of _y[*]);
do until (last.id);
set have;
by id;
if 2007<=year(visit_date)<=2011 then
do;
if missing(type) then _y{year(visit_date)} = 0;
else _y{year(visit_date)} = _y{year(visit_date)} ne 0;
end;
end;
do until (last.id);
set have;
by id;
if sum(of _y:) = 5 then output;
end;
drop _y:;
run;
proc print data=want;
run;
If you need to see only rows 1-5 in the output, please explain further showing the expected output data as well.
What I see from your data is each visit (visit_date) is unique per id. If you select only non missing TYPE (where TYPE ^=. ) then you get more observation than 5.
Assuming that your data, when 2007 and 2011 years present, will have all years of 2007 -2011 per ID:
proc sql;
create table want as
select*
from have
where type ne . and id in (select id from have
group by id having min(year(visit_date)) eq 2007 and max(year(visit_date)) eq 2011);
quit;
data want;
array _y{2007:2011};
do until (last.id);
set have;
by id;
_y{year(visit_date)} = 1;
end;
do until (last.id);
set have;
by id;
if sum(of _y:) = 5 then output;
end;
drop _y:;
run;
data have;
input id type visit_date :yymmdd10.;
format visit_date yymmdd10.;
datalines;
1 1 2007-01-01
1 2 2008-01-02
1 2 2009-01-03
1 3 2010-01-04
1 5 2011-01-05
2 1 2008-01-01
2 2 2008-01-02
2 2 2009-01-03
2 3 2010-01-04
2 5 2011-01-05
3 2 2007-01-09
3 . 2008-01-10
3 2 2009-01-11
3 . 2010-01-12
4 . 2007-01-10
4 3 2008-01-11
4 1 2009-01-12
5 2 2007-01-11
5 2 2008-01-12
5 2 2009-01-10
5 2 2010-01-11
6 2 2008-01-12
6 1 2009-01-10
6 . 2010-01-11
;
proc sql;
create table want as
select *
from have
where type is not missing and year(visit_date) between 2007 and 2011
group by id
having count(distinct year(visit_date))=5;
quit;
This code reads each ID twice, as in @Kurt_Bremser 's example. The first pass reads only non-missing TYPE values and builds a profile of years encountered. But the subsetting IF allows output only during second pass, if the profile is all 1's.
data have;
input id type visit_date :yymmdd10.;
format visit_date yymmdd10.;
datalines;
1 1 2007-01-01
1 2 2008-01-02
1 2 2009-01-03
1 3 2010-01-04
1 5 2011-01-05
2 1 2008-01-01
2 2 2008-01-02
2 2 2009-01-03
2 3 2010-01-04
2 5 2011-01-05
3 2 2007-01-09
3 . 2008-01-10
3 2 2009-01-11
3 . 2010-01-12
4 . 2007-01-10
4 3 2008-01-11
4 1 2009-01-12
5 2 2007-01-11
5 2 2008-01-12
5 2 2009-01-10
5 2 2010-01-11
6 2 2008-01-12
6 1 2009-01-10
6 . 2010-01-11
6 9 2013-01-11
run;
data want;
set have (where=(missing(type)=0) in=firstpass)
have (in=secondpass);
by id;
array years {2007:2011} _temporary_;
if first.id then call missing(of years{*});
if firstpass then years{year(visit_date)}=1;
if secondpass=1 and sum(of years{*})=5;
run;
The code assumes that all the data are in the desired years (2007-2011). If earlier or later visit_dates are possible then change the "where" option for the firstpass as in:
set have (where=(missing(type)=0 and '01jan2007'd<=visit_date<='31dec2011'd) in=firstpass)
have (in=secondpass);
To re-formulate your requirement:
I want to select rows for the id, whose id have NEVER a missing "type" for "visit_year" between 2007 to 2011 AND for which there is at least one row per year.
I assume there could also be multiple visit dates (rows) for an ID in a single year.
Below code basically using what @Kurt_Bremser proposed with some tweaks to check the type variable and ensuring that there can't be an out of array subscript condition. I've also amended your sample data slightly to make it a bit more representative for how I assume your real data looks like (=multiple rows per year for an id - of which one row could have a missing type).
data have;
infile datalines dsd truncover;
input id type visit_date :yymmdd10.;
format visit_date yymmdd10.;
datalines;
1,1,2007-01-01
1,1,2007-05-01
1,2,2008-01-02
1,2,2009-01-03
1,3,2010-01-04
1,5,2011-01-05
9,1,2007-01-01
9,1,2007-05-01
9,.,2007-06-01
9,2,2008-01-02
9,2,2009-01-03
9,3,2010-01-04
9,5,2011-01-05
2,1,2008-01-01
2,2,2008-01-02
2,2,2009-01-03
2,3,2010-01-04
2,5,2011-01-05
2,5,2012-01-05
3,2,2007-01-09
3,.,2008-01-10
3,2,2009-01-11
3,.,2010-01-12
3,.,2011-01-12
4,.,2007-01-10
4,3,2008-01-11
4,1,2009-01-12
5,2,2007-01-11
5,2,2008-01-12
5,2,2009-01-10
5,2,2010-01-11
6,2,2008-01-12
6,1,2009-01-10
6,.,2010-01-11
;
proc sort data=have;
by id;
run;
data want;
array _y{2007:2011};
call missing(of _y[*]);
do until (last.id);
set have;
by id;
if 2007<=year(visit_date)<=2011 then
do;
if missing(type) then _y{year(visit_date)} = 0;
else _y{year(visit_date)} = _y{year(visit_date)} ne 0;
end;
end;
do until (last.id);
set have;
by id;
if sum(of _y:) = 5 then output;
end;
drop _y:;
run;
proc print data=want;
run;
Hi Patrick,
Indeed, you have a valid point! I didn't explain myself clearly before I asked this question, but you made me rethink what kind of data I really have.
May I ask a follow-up question? What if I want the year range to be the first year of visit to 4 years after the first year of visit, instead of a fixed range 2007-2011? For example, for id1 the year range I want to look at should be 2007-2011, id2 2008-2012, id3 2007-2011, id4 2007-2011 etc.
Code like below should work. I've now also added parameter &n_years (%let n_years=5
) which lets you select in a single place how many years you want to include into the selection window.
proc sort data=have;
by id visit_date;
run;
%let n_years=5;
data want;
array _y{&n_years};
call missing(of _y[*]);
do until (last.id);
set have;
by id visit_date;
visit_year=year(visit_date);
if first.id then _first_year=visit_year;
_ind=visit_year - _first_year +1;
if 1<=_ind<=&n_years then
do;
if missing(type) then _y{_ind} = 0;
else _y{_ind} = _y{_ind} ne 0;
end;
end;
do until (last.id);
set have;
by id visit_date;
if sum(of _y:) = &n_years then output;
end;
drop _:;
run;
proc print data=want;
run;
thanks! this is magic 🙂
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.