DATA Step, Macro, Functions and more

tracking years across two columns by id

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 9
Accepted Solution

tracking years across two columns by id

[ Edited ]

 

Hi Everyone,

 

As illustration purposes I have a dataset with the following two SAS year variables:

 

id     start_year     end_year              

1      1991            1992          

1      1992            1993          

1      1993            1993     

1      1993            1993

1      1993            1993     

1      1994            1998

1      1998            2015

2      1993            1994     

3      1985            1986      

3      1990            1991

3      1992            2000

 

id=1 starts in 1991 and ends in 2015 with no gaps; same for id=2. But id=3 has a gap in 1987, 1988, and 1989

 

First, how can I track and know if there is a gap or no by id?

 

Second, how can I add the years gap by id to my data? For example,

 

id     start_year     end_year              

1      1991            1992          

1      1992            1993          

1      1993            1993     

1      1993            1993

1      1993            1993     

1      1994            1998

1      1998            2015

2      1993            1994     

3      1985            1986

3      1987            1987     

3      1988            1988

3      1989            1989

3      1990            1991

3      1992            2000

 

Thank you very much


Accepted Solutions
Solution
‎01-13-2017 01:16 PM
Super Contributor
Posts: 474

Re: tracking years across two columns by id

[ Edited ]

Hi.

 

Assuming data is previously ordered by id and start_year and there are no overlaping periods, here's another way:

 

data want;
set have;
drop _:;

* auxiliary vars;
_end_year=lag1(end_year);
_id=lag1(id);
output; * output row;

* cycle and fill the gap, if needed;
if _id eq id and start_year-1 ne _end_year then
do end_year=_end_year+1 to start_year-1;
start_year=end_year;
output;
put 'created row ' id= start_year= end_year=;
end;
run;

* order back data;
proc sort;
by id start_year;
run;

 

The lag function will get you the previous value, which is what you need to work the gaps.

 

More on the lag function here: https://support.sas.com/documentation/cdl/en/lefunctionsref/69762/HTML/default/viewer.htm#n0l66p5oqe...

 

Hope it helps.

 

Daniel Santos @ www.cgd.pt

View solution in original post


All Replies
Super User
Super User
Posts: 7,401

Re: tracking years across two columns by id

Messy, but just leaving so no time:

data have;
  input id     start_year     end_year;
datalines; 
1      1991            1992          
1      1992            1993          
1      1993            1993     
1      1993            1993
1      1993            1993     
1      1994            1998
1      1998            2015
2      1993            1994     
3      1985            1986      
3      1990            1991
3      1992            2000
;
run;

data want;
  set have;
  by id;
  retain lst_year;
  if first.id then do;
    syear=start_year;
    eyear=end_year;
    lst_year=end_year;
    output;
  end;
  else do;
    if start_year ne lst_year and start_year-1 ne lst_year then do;
      do i=lst_year to start_year;
        syear=i;
        eyear=i;
        output;
      end;
    end;
    else do;
      syear=start_year;
      eyear=end_year;
      output;
    end;
  end;
  lst_year=end_year;
run;
Occasional Contributor
Posts: 9

Re: tracking years across two columns by id

Hi RW9,

Thank you for your help!

Your code kind of works but not precisely.

If you see your want table your code assigns a value of 1990 to obs. 14 for id = 3 when actually there is no a gap issue for that row. So, it should be a period value.

On the other hand, the original data for id=3 has a gap for years 1987,1988 and 1989. Hence, your code should add three new rows for these three years. However, it adds five rows. In fact, what I would expect based on my question below is that row 10 is 1987, row 11 is 1988, and row 12 is 1989, with the remaining two rows for what it was.

Do you any any solutions for these two problems? Thank you
Super User
Super User
Posts: 7,401

Re: tracking years across two columns by id

As noted, I have left for the day now.  Try:

      do i=lst_year to start_year;

Replace that with:

      do i=(lst_year+1) to (start_year-1);

 

Occasional Contributor
Posts: 9

Re: tracking years across two columns by id

[ Edited ]

This solution nicely identifies under column i the year(s) in which there is a gap, but it does not add the row in the data for the gap as I indicated in red color in my original rpost.

 

Thanks a lot! What I can do to add it the row(s) attached to the year gap(s)? Thanks again

Regular Contributor
Posts: 213

Re: tracking years across two columns by id

Techniques in this paper may help you getting your desired outcome

http://support.sas.com/resources/papers/proceedings10/103-2010.pdf

 

Ahmed

Solution
‎01-13-2017 01:16 PM
Super Contributor
Posts: 474

Re: tracking years across two columns by id

[ Edited ]

Hi.

 

Assuming data is previously ordered by id and start_year and there are no overlaping periods, here's another way:

 

data want;
set have;
drop _:;

* auxiliary vars;
_end_year=lag1(end_year);
_id=lag1(id);
output; * output row;

* cycle and fill the gap, if needed;
if _id eq id and start_year-1 ne _end_year then
do end_year=_end_year+1 to start_year-1;
start_year=end_year;
output;
put 'created row ' id= start_year= end_year=;
end;
run;

* order back data;
proc sort;
by id start_year;
run;

 

The lag function will get you the previous value, which is what you need to work the gaps.

 

More on the lag function here: https://support.sas.com/documentation/cdl/en/lefunctionsref/69762/HTML/default/viewer.htm#n0l66p5oqe...

 

Hope it helps.

 

Daniel Santos @ www.cgd.pt

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 196 views
  • 0 likes
  • 4 in conversation