I want to create a data set with subjects who have records for all years between START_YR and END_YR and create a flag (1=Yes 0 = No).
Data set:
Subject start_yr end_yr yr
A 2015 2017 2015
A 2015 2017 2016
A 2015 2017 2017
B 2011 2016 2012
B 2011 2016 2014
B 2011 2016 2015
B 2011 2016 2016
Want:
Subject start_yr end_yr flag
A 2015 2017 1
B 2011 2016 0
Subject B is flag = 0 because they do not have a record for 2011 and 2013.
For given data and output, this code works. And might need modification depending on unspecified requirements;
data have;
input Subject $ start_yr end_yr yr;
cards;
A 2015 2017 2015
A 2015 2017 2016
A 2015 2017 2017
B 2011 2016 2012
B 2011 2016 2014
B 2011 2016 2015
B 2011 2016 2016
;
proc print; run;
data have1;
set have (drop=yr);
by subject;
do year=start_yr to end_yr;
output;
end;
run;
proc sort data=have1 noduprecs; by _all_ ; run;
data want;
merge have have1;
by subject start_yr end_yr;
retain flag;
if yr ne year then flag=0;
if flag eq . then flag=1;
if last.subject;
drop year;
proc print; run;
data have;
input Subject $ start_yr end_yr yr;
cards;
A 2015 2017 2015
A 2015 2017 2016
A 2015 2017 2017
B 2011 2016 2012
B 2011 2016 2014
B 2011 2016 2015
B 2011 2016 2016
;
proc sql;
create table want as
select Subject ,start_yr,end_yr,(start_yr=min(yr) and max(yr)=end_yr and count(distinct yr)=end_yr-start_yr+1) as flag
from have
group by Subject ,start_yr,end_yr;
quit;
You can do it with COUNT(DISTINCT YR).
data have;
input Subject $ start_yr end_yr yr;
cards;
A 2015 2017 2015
A 2015 2017 2016
A 2015 2017 2017
B 2011 2016 2012
B 2011 2016 2014
B 2011 2016 2015
B 2011 2016 2016
;
proc sql;
create table want as
select subject,start_yr,end_yr
, end_yr - start_yr + 1 as years
, count(*) as nobs
, count(distinct yr) as nyears
, count(distinct yr) = (end_yr-start_yr+1) as flag
from have
group by 1,2,3
;
quit;
Result
Obs Subject start_yr end_yr years nobs nyears flag 1 A 2015 2017 3 3 3 1 2 B 2011 2016 6 4 4 0
@Ksharp wrote:
Tom,
You didn't take int account of max and min of YR for start_yr and end_yr .
That was not part of the problem. The start and end years are part of the INPUT, not something that needed calculation.
data want;
set have;
by subject start_yr end_yr;
array f {2000:2099} _temporary_;
if first.end_yr
then do flag = 2000 to 2099;
f{flag} = 0;
end;
f{yr} = 1;
if last.end_yr;
flag = 1;
do yr = start_yr to end_yr;
if f{yr} = 0 then flag = 0;
end;
drop yr;
run;
The BY will also take care if there are more than one start_yr/end_yr combinations per subject.
Code is untested, posted from my tablet.
Some aspects need clarification:
- Are start_yr and end_yr constant for a subject?
- Is the combination of Subject and yr unique?
If the answer to both questions is yes, try:
data want;
set have(keep= Subject start_yr end_yr);
by Subject;
length n_start 8;
retain n_start;
if first.Subject then do;
n_start = _n_;
end;
if last.Subject then do;
flag = (_n_ - n_start) = end_yr - start_yr;
output;
end;
drop n_start;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.