BookmarkSubscribeRSS Feed
joreoh
Calcite | Level 5

I want to create a data set with subjects who have records for all years between START_YR and END_YR and create a flag (1=Yes 0 = No).

Data set:

Subject     start_yr            end_yr    yr

A                 2015                 2017        2015

A                 2015                 2017        2016   

A                 2015                 2017        2017

B                 2011                  2016        2012 

B                 2011                  2016        2014 

B                 2011                  2016        2015 

B                 2011                  2016        2016

 

 

Want:

Subject     start_yr            end_yr   flag

A                 2015                 2017       1

B                 2011                  2016       0

 

Subject B is flag = 0 because they do not have a record for 2011 and 2013.

 

 

 

7 REPLIES 7
A_Kh
Lapis Lazuli | Level 10

For given data and output, this code works. And might need modification depending on unspecified requirements; 

data have;
input Subject $ start_yr end_yr  yr;
cards; 
A  2015  2017  2015
A  2015  2017  2016
A  2015  2017  2017
B  2011  2016  2012
B  2011  2016  2014
B  2011  2016  2015
B  2011  2016  2016
;
proc print; run; 

data have1;
	set have (drop=yr);
	by subject;
	do year=start_yr to end_yr;
	output;
	end; 
run;  

proc sort data=have1 noduprecs; by _all_ ; run; 

data want;
	merge have have1;
	by subject start_yr end_yr;
	retain flag;
	if yr ne year then flag=0;
	if flag eq . then flag=1; 
	if last.subject;
drop year; proc print; run;



Ksharp
Super User
data have;
input Subject  $   start_yr            end_yr    yr;
cards;
A                 2015                 2017        2015
A                 2015                 2017        2016   
A                 2015                 2017        2017
B                 2011                  2016        2012 
B                 2011                  2016        2014 
B                 2011                  2016        2015 
B                 2011                  2016        2016
;

proc sql;
create table want as
select Subject ,start_yr,end_yr,(start_yr=min(yr) and max(yr)=end_yr and count(distinct yr)=end_yr-start_yr+1) as flag
 from have
  group by Subject ,start_yr,end_yr;
quit;
Tom
Super User Tom
Super User

You can do it with COUNT(DISTINCT YR).

data have;
  input Subject $ start_yr end_yr yr;
cards; 
A  2015  2017  2015
A  2015  2017  2016
A  2015  2017  2017
B  2011  2016  2012
B  2011  2016  2014
B  2011  2016  2015
B  2011  2016  2016
;

proc sql;
  create table want as
  select subject,start_yr,end_yr
       , end_yr - start_yr + 1 as years
       , count(*) as nobs
       , count(distinct yr) as nyears
       , count(distinct yr) = (end_yr-start_yr+1) as flag
  from have
  group by 1,2,3
  ;
quit;

Result

Obs    Subject    start_yr    end_yr    years    nobs    nyears    flag

 1        A         2015       2017       3        3        3        1
 2        B         2011       2016       6        4        4        0

Ksharp
Super User
Tom,
You didn't take int account of max and min of YR for start_yr and end_yr .
Tom
Super User Tom
Super User

@Ksharp wrote:
Tom,
You didn't take int account of max and min of YR for start_yr and end_yr .

That was not part of the problem.  The start and end years are part of the INPUT, not something that needed calculation.

 

Kurt_Bremser
Super User
data want;
set have;
by subject start_yr end_yr;
array f {2000:2099} _temporary_;
if first.end_yr
then do flag = 2000 to 2099;
  f{flag} = 0;
end;
f{yr} = 1;
if last.end_yr;
flag = 1;
do yr = start_yr to end_yr;
  if f{yr} = 0 then flag = 0;
end;
drop yr;
run;

The BY will also take care if there are more than one start_yr/end_yr combinations per subject.

Code is untested, posted from my tablet.

andreas_lds
Jade | Level 19

Some aspects need clarification:

- Are start_yr and end_yr constant for a subject?

- Is the combination of Subject and yr unique?

 

If the answer to both questions is yes, try:

data want;
   set have(keep= Subject start_yr end_yr);
   by Subject;

   length n_start 8;
   retain n_start;

   if first.Subject then do;
      n_start = _n_;
   end;

   if last.Subject then do;
      flag = (_n_ - n_start) = end_yr - start_yr;
      output;
   end;

   drop n_start;
run;

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 2244 views
  • 0 likes
  • 6 in conversation