BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
miguelito
Fluorite | Level 6

 

Hi Everyone,

 

As illustration purposes I have a dataset with the following two SAS year variables:

 

id     start_year     end_year              

1      1991            1992          

1      1992            1993          

1      1993            1993     

1      1993            1993

1      1993            1993     

1      1994            1998

1      1998            2015

2      1993            1994     

3      1985            1986      

3      1990            1991

3      1992            2000

 

id=1 starts in 1991 and ends in 2015 with no gaps; same for id=2. But id=3 has a gap in 1987, 1988, and 1989

 

First, how can I track and know if there is a gap or no by id?

 

Second, how can I add the years gap by id to my data? For example,

 

id     start_year     end_year              

1      1991            1992          

1      1992            1993          

1      1993            1993     

1      1993            1993

1      1993            1993     

1      1994            1998

1      1998            2015

2      1993            1994     

3      1985            1986

3      1987            1987     

3      1988            1988

3      1989            1989

3      1990            1991

3      1992            2000

 

Thank you very much

1 ACCEPTED SOLUTION

Accepted Solutions
DanielSantos
Barite | Level 11

Hi.

 

Assuming data is previously ordered by id and start_year and there are no overlaping periods, here's another way:

 

data want;
set have;
drop _:;

* auxiliary vars;
_end_year=lag1(end_year);
_id=lag1(id);
output; * output row;

* cycle and fill the gap, if needed;
if _id eq id and start_year-1 ne _end_year then
do end_year=_end_year+1 to start_year-1;
start_year=end_year;
output;
put 'created row ' id= start_year= end_year=;
end;
run;

* order back data;
proc sort;
by id start_year;
run;

 

The lag function will get you the previous value, which is what you need to work the gaps.

 

More on the lag function here: https://support.sas.com/documentation/cdl/en/lefunctionsref/69762/HTML/default/viewer.htm#n0l66p5oqe...

 

Hope it helps.

 

Daniel Santos @ www.cgd.pt

View solution in original post

6 REPLIES 6
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Messy, but just leaving so no time:

data have;
  input id     start_year     end_year;
datalines; 
1      1991            1992          
1      1992            1993          
1      1993            1993     
1      1993            1993
1      1993            1993     
1      1994            1998
1      1998            2015
2      1993            1994     
3      1985            1986      
3      1990            1991
3      1992            2000
;
run;

data want;
  set have;
  by id;
  retain lst_year;
  if first.id then do;
    syear=start_year;
    eyear=end_year;
    lst_year=end_year;
    output;
  end;
  else do;
    if start_year ne lst_year and start_year-1 ne lst_year then do;
      do i=lst_year to start_year;
        syear=i;
        eyear=i;
        output;
      end;
    end;
    else do;
      syear=start_year;
      eyear=end_year;
      output;
    end;
  end;
  lst_year=end_year;
run;
miguelito
Fluorite | Level 6
Hi RW9,

Thank you for your help!

Your code kind of works but not precisely.

If you see your want table your code assigns a value of 1990 to obs. 14 for id = 3 when actually there is no a gap issue for that row. So, it should be a period value.

On the other hand, the original data for id=3 has a gap for years 1987,1988 and 1989. Hence, your code should add three new rows for these three years. However, it adds five rows. In fact, what I would expect based on my question below is that row 10 is 1987, row 11 is 1988, and row 12 is 1989, with the remaining two rows for what it was.

Do you any any solutions for these two problems? Thank you
RW9
Diamond | Level 26 RW9
Diamond | Level 26

As noted, I have left for the day now.  Try:

      do i=lst_year to start_year;

Replace that with:

      do i=(lst_year+1) to (start_year-1);

 

miguelito
Fluorite | Level 6

This solution nicely identifies under column i the year(s) in which there is a gap, but it does not add the row in the data for the gap as I indicated in red color in my original rpost.

 

Thanks a lot! What I can do to add it the row(s) attached to the year gap(s)? Thanks again

AhmedAl_Attar
Ammonite | Level 13

Techniques in this paper may help you getting your desired outcome

http://support.sas.com/resources/papers/proceedings10/103-2010.pdf

 

Ahmed

DanielSantos
Barite | Level 11

Hi.

 

Assuming data is previously ordered by id and start_year and there are no overlaping periods, here's another way:

 

data want;
set have;
drop _:;

* auxiliary vars;
_end_year=lag1(end_year);
_id=lag1(id);
output; * output row;

* cycle and fill the gap, if needed;
if _id eq id and start_year-1 ne _end_year then
do end_year=_end_year+1 to start_year-1;
start_year=end_year;
output;
put 'created row ' id= start_year= end_year=;
end;
run;

* order back data;
proc sort;
by id start_year;
run;

 

The lag function will get you the previous value, which is what you need to work the gaps.

 

More on the lag function here: https://support.sas.com/documentation/cdl/en/lefunctionsref/69762/HTML/default/viewer.htm#n0l66p5oqe...

 

Hope it helps.

 

Daniel Santos @ www.cgd.pt

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 2156 views
  • 0 likes
  • 4 in conversation