Help using Base SAS procedures

Overlapping dates

Reply
N/A
Posts: 1

Overlapping dates

Hello everyone,

I have this initial table :

DATA temp;

input id trt $ start end;

cards;

1 A 2000 2004

1 B 2002 2006

run;

Schema :

          2000     2001     2002     2003     2004     2005     2006

A           |-----------|-----------|-----------|-----------|

B                                  |------------|-----------|------------|------------|

I was wondering how to recreate treatment sequences according to years.

More clearly i would like to obtain this database :

DATA result;

input id trt $ start end;

cards;

1 A 2000 2002

1 A-B 2002 2004

1 B 2004 2006

run;

Schema :

          2000     2001     2002     2003     2004     2005     2006

A           |-----------|-----------|

A-B                               |-----------|-----------|

B                                                          |------------|------------|

Does anyone have an idea to solve this case ?

Thanks in advance.

Respected Advisor
Posts: 3,156

Re: Overlapping dates

There are ways to solve this problem, but let me first ask you, why the overlap in your output dataset. For example, why 2002 is shared by A and A-B? It seems to me that the moment B is brought in, it is not A anymore. I am aware that a year is a range of time, but I also noticed that you used it as an UNIT, so it makes me wary upon your downstream process.

Here is an option:

DATA have;

     input id trt $ start end;

     cards;

1 A 2000 2004

1 B 2002 2006

1 C 2004 2009

1 D 2012 2015

;

run;

data _null_;

     if _n_=1 then

           do;

                declare hash h();

                h.definekey('id','year');

                h.definedata('id','year', 'treatment');

                h.definedone();

                length treatment $ 20;

           end;

     do until (last.id);

           set have end=last;

           by id notsorted;

          do year=start to end;

                if h.find() ne 0 then

                     do;

                           treatment=trt;

                           rc=h.replace();

                     end;

                else

                     do;

                           treatment=catx('-',treatment, trt);

                           rc=h.replace();

                     end;

           end;

     end;

     if last then

           rc=h.output(dataset:'long');

run;

proc sql;

     create table want_clean as

           select treatment, min(year) as _start, max(year) as _end

                from long

                     group by treatment;

quit;

/*will the following process be really necessary???*/

data want_messy;

     merge want_clean want_clean (firstobs=2 keep=treatment rename=treatment=_t);

     if find(_t,treatment, 't') then

           _end+1;

     if find(lag(treatment),treatment,'t') then

           _start=_start-1;

     drop _t;

run;

Ask a Question
Discussion stats
  • 1 reply
  • 203 views
  • 0 likes
  • 2 in conversation