BookmarkSubscribeRSS Feed
joreoh
Calcite | Level 5

I want to create a data set with subjects who have records for all years between START_YR and END_YR and create a flag (1=Yes 0 = No).

Data set:

Subject     start_yr            end_yr    yr

A                 2015                 2017        2015

A                 2015                 2017        2016   

A                 2015                 2017        2017

B                 2011                  2016        2012 

B                 2011                  2016        2014 

B                 2011                  2016        2015 

B                 2011                  2016        2016

 

 

Want:

Subject     start_yr            end_yr   flag

A                 2015                 2017       1

B                 2011                  2016       0

 

Subject B is flag = 0 because they do not have a record for 2011 and 2013.

 

 

 

7 REPLIES 7
A_Kh
Barite | Level 11

For given data and output, this code works. And might need modification depending on unspecified requirements; 

data have;
input Subject $ start_yr end_yr  yr;
cards; 
A  2015  2017  2015
A  2015  2017  2016
A  2015  2017  2017
B  2011  2016  2012
B  2011  2016  2014
B  2011  2016  2015
B  2011  2016  2016
;
proc print; run; 

data have1;
	set have (drop=yr);
	by subject;
	do year=start_yr to end_yr;
	output;
	end; 
run;  

proc sort data=have1 noduprecs; by _all_ ; run; 

data want;
	merge have have1;
	by subject start_yr end_yr;
	retain flag;
	if yr ne year then flag=0;
	if flag eq . then flag=1; 
	if last.subject;
drop year; proc print; run;



Ksharp
Super User
data have;
input Subject  $   start_yr            end_yr    yr;
cards;
A                 2015                 2017        2015
A                 2015                 2017        2016   
A                 2015                 2017        2017
B                 2011                  2016        2012 
B                 2011                  2016        2014 
B                 2011                  2016        2015 
B                 2011                  2016        2016
;

proc sql;
create table want as
select Subject ,start_yr,end_yr,(start_yr=min(yr) and max(yr)=end_yr and count(distinct yr)=end_yr-start_yr+1) as flag
 from have
  group by Subject ,start_yr,end_yr;
quit;
Tom
Super User Tom
Super User

You can do it with COUNT(DISTINCT YR).

data have;
  input Subject $ start_yr end_yr yr;
cards; 
A  2015  2017  2015
A  2015  2017  2016
A  2015  2017  2017
B  2011  2016  2012
B  2011  2016  2014
B  2011  2016  2015
B  2011  2016  2016
;

proc sql;
  create table want as
  select subject,start_yr,end_yr
       , end_yr - start_yr + 1 as years
       , count(*) as nobs
       , count(distinct yr) as nyears
       , count(distinct yr) = (end_yr-start_yr+1) as flag
  from have
  group by 1,2,3
  ;
quit;

Result

Obs    Subject    start_yr    end_yr    years    nobs    nyears    flag

 1        A         2015       2017       3        3        3        1
 2        B         2011       2016       6        4        4        0

Ksharp
Super User
Tom,
You didn't take int account of max and min of YR for start_yr and end_yr .
Tom
Super User Tom
Super User

@Ksharp wrote:
Tom,
You didn't take int account of max and min of YR for start_yr and end_yr .

That was not part of the problem.  The start and end years are part of the INPUT, not something that needed calculation.

 

Kurt_Bremser
Super User
data want;
set have;
by subject start_yr end_yr;
array f {2000:2099} _temporary_;
if first.end_yr
then do flag = 2000 to 2099;
  f{flag} = 0;
end;
f{yr} = 1;
if last.end_yr;
flag = 1;
do yr = start_yr to end_yr;
  if f{yr} = 0 then flag = 0;
end;
drop yr;
run;

The BY will also take care if there are more than one start_yr/end_yr combinations per subject.

Code is untested, posted from my tablet.

andreas_lds
Jade | Level 19

Some aspects need clarification:

- Are start_yr and end_yr constant for a subject?

- Is the combination of Subject and yr unique?

 

If the answer to both questions is yes, try:

data want;
   set have(keep= Subject start_yr end_yr);
   by Subject;

   length n_start 8;
   retain n_start;

   if first.Subject then do;
      n_start = _n_;
   end;

   if last.Subject then do;
      flag = (_n_ - n_start) = end_yr - start_yr;
      output;
   end;

   drop n_start;
run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 2802 views
  • 0 likes
  • 6 in conversation