I want to create a data set with subjects who have records for all years between START_YR and END_YR and create a flag (1=Yes 0 = No).
Data set:
Subject start_yr end_yr yr
A 2015 2017 2015
A 2015 2017 2016
A 2015 2017 2017
B 2011 2016 2012
B 2011 2016 2014
B 2011 2016 2015
B 2011 2016 2016
Want:
Subject start_yr end_yr flag
A 2015 2017 1
B 2011 2016 0
Subject B is flag = 0 because they do not have a record for 2011 and 2013.
For given data and output, this code works. And might need modification depending on unspecified requirements;
data have;
input Subject $ start_yr end_yr yr;
cards;
A 2015 2017 2015
A 2015 2017 2016
A 2015 2017 2017
B 2011 2016 2012
B 2011 2016 2014
B 2011 2016 2015
B 2011 2016 2016
;
proc print; run;
data have1;
set have (drop=yr);
by subject;
do year=start_yr to end_yr;
output;
end;
run;
proc sort data=have1 noduprecs; by _all_ ; run;
data want;
merge have have1;
by subject start_yr end_yr;
retain flag;
if yr ne year then flag=0;
if flag eq . then flag=1;
if last.subject;
drop year;
proc print; run;
data have;
input Subject $ start_yr end_yr yr;
cards;
A 2015 2017 2015
A 2015 2017 2016
A 2015 2017 2017
B 2011 2016 2012
B 2011 2016 2014
B 2011 2016 2015
B 2011 2016 2016
;
proc sql;
create table want as
select Subject ,start_yr,end_yr,(start_yr=min(yr) and max(yr)=end_yr and count(distinct yr)=end_yr-start_yr+1) as flag
from have
group by Subject ,start_yr,end_yr;
quit;
You can do it with COUNT(DISTINCT YR).
data have;
input Subject $ start_yr end_yr yr;
cards;
A 2015 2017 2015
A 2015 2017 2016
A 2015 2017 2017
B 2011 2016 2012
B 2011 2016 2014
B 2011 2016 2015
B 2011 2016 2016
;
proc sql;
create table want as
select subject,start_yr,end_yr
, end_yr - start_yr + 1 as years
, count(*) as nobs
, count(distinct yr) as nyears
, count(distinct yr) = (end_yr-start_yr+1) as flag
from have
group by 1,2,3
;
quit;
Result
Obs Subject start_yr end_yr years nobs nyears flag 1 A 2015 2017 3 3 3 1 2 B 2011 2016 6 4 4 0
@Ksharp wrote:
Tom,
You didn't take int account of max and min of YR for start_yr and end_yr .
That was not part of the problem. The start and end years are part of the INPUT, not something that needed calculation.
data want;
set have;
by subject start_yr end_yr;
array f {2000:2099} _temporary_;
if first.end_yr
then do flag = 2000 to 2099;
f{flag} = 0;
end;
f{yr} = 1;
if last.end_yr;
flag = 1;
do yr = start_yr to end_yr;
if f{yr} = 0 then flag = 0;
end;
drop yr;
run;
The BY will also take care if there are more than one start_yr/end_yr combinations per subject.
Code is untested, posted from my tablet.
Some aspects need clarification:
- Are start_yr and end_yr constant for a subject?
- Is the combination of Subject and yr unique?
If the answer to both questions is yes, try:
data want;
set have(keep= Subject start_yr end_yr);
by Subject;
length n_start 8;
retain n_start;
if first.Subject then do;
n_start = _n_;
end;
if last.Subject then do;
flag = (_n_ - n_start) = end_yr - start_yr;
output;
end;
drop n_start;
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.