Solved: Create start-stop dataset with time-varying covariates

Loobylou · Posted 08-14-2020 06:38 AM

I am hoping to find a way to automatically create a dataset in SAS that I can use for Extended Cox Proportional Hazard regression using time-varying covariates.

I have a dataset in the current format - albeit it this is simplified to aid understanding of my question!

ID Censored SurvT(yrs) Start_Yr End_Yr +static Xn

1 0 5 2010 2014

2 0 10 2005 2014

3 1 8 2005 2012

I have a separate dataset that contains GDP growth for each of the years 2005-2014.

I want to merge these two datasets, and create a new dataset in the following format (so effectively each ID has multiple observations, representing each year with a different value for GDP growth):

ID Censored Age(yr) Yr GDP_growth +static Xn

1 0 1 2010 -0.5%

1 0 2 2011 +0.25%

1 0 3 2012 +1%

1 0 4 2013 +0.25%

Can anyone advise how I should go about this in SAS and if it is even possible?! I have over 6000 IDs in my dataset, some with timeperiod of over 10 years, so I don't want to create this data manually!

Thank you

andreas_lds · Posted 08-17-2020 03:23 AM

@Loobylou wrote:

Thanks for replying.

I don't think I explained my issue correctly.

The first dataset I have is as described above.

The second one is in the following format:

Yr    GDP_growth

2005      1.5%

…

2010      -0.5%

2011     +0.25%

[...]

Thanks

To avoid such misunderstanding posting data in usable form and the expected result is highly recommended.

If the second dataset is not to large, try:

data want;
   if 0 then set have1 have2;
   set have1;
   
   if _n_ = 1 then do;
      declare hash h(dataset: 'work.have2');
      h.defineKey('yr');
      h.defineData('gdp_growth');
      h.defineDone();
   end;

   do yr = start_yr to end_yr;
      if h.find() = 0 then do;
         output;
      end;
      else do;
         put 'WARNING: No gdp_growth for year ' yr;
      end;
   end;
run;

View solution in original post

Jagadishkatam · Posted 08-14-2020 06:42 AM

You can merge the first dataset with second dataset on id and censored variables as these two variables are common between both datasets.

proc sort data=dataset1;
by id censored;
run;

proc sort data=dataset2;
by id censored;
run;

data want;
merge dataset1 dataset2;
by id censored;
run;

Thanks,
Jag

Loobylou · Posted 08-14-2020 08:04 AM

Thanks for replying.

I don't think I explained my issue correctly.

The first dataset I have is as described above.

The second one is in the following format:

Yr GDP_growth

2005 1.5%

…

2010 -0.5%

2011 +0.25%

So I want to combine these two datasets to create multiple observations for each firm ID where each observations contains a different value for GDP growth, dependent on the year. So, for example, if a firm started in 2005 I want the first obs to be GDP growth for 2005, 2nd obs for that firm to be GDP growth for 2006 etc.

I don;t think it's a simple case of merging the two datasets, as each firm starts at a different year.

Thanks

hhinohar · Posted 08-17-2020 02:24 AM

/* 1st initial dataset */
data have1;
infile datalines dlm='09'x;
input ID Censored SurvT_yr Start_Yr End_Yr;
datalines;
1	0	5	2010	2014
2	0	10	2005	2014
3	1	8	2005	2012
run;

/* 2nd initial dataset 
set index to yr variable */
data have2(index=(yr));
infile datalines dlm='09'x;
	input yr gdp_growth;
datalines;
2005	0.1
2006	1.1
2007	2.1
2008	3.1
2009	4.1
2010	-0.5
2011	0.25
2012	1
2013	0.25
2014	9.1
2015	10.1
run;

/* create another dataset that loops through each year for have1 */
data _have1;
	set have1;
	by id;
	do until(last.id);
		do i=start_yr to end_yr;
			yr=i;
			output;
		end;
	end;
	drop i;
run;

/* merge both datasets  */
data want(rename=(yr=age_yr) drop=Start_Yr End_Yr survt_yr);
retain ID Censored yr Start_Yr End_Yr;
	set _have1;
	set have2 key=yr;
run;

Loobylou · Posted 08-19-2020 12:31 PM

Thanks for all your answers which I have found very helpful.

The solution is good - on small dataset though as took a while to process, so agree would not be as effective if dataset is very large.

And thanks for the tip about posting in the community in future

andreas_lds · Posted 08-17-2020 03:23 AM

@Loobylou wrote:

Thanks for replying.

I don't think I explained my issue correctly.

The first dataset I have is as described above.

The second one is in the following format:

Yr    GDP_growth

2005      1.5%

…

2010      -0.5%

2011     +0.25%

[...]

Thanks

To avoid such misunderstanding posting data in usable form and the expected result is highly recommended.

If the second dataset is not to large, try:

data want;
   if 0 then set have1 have2;
   set have1;
   
   if _n_ = 1 then do;
      declare hash h(dataset: 'work.have2');
      h.defineKey('yr');
      h.defineData('gdp_growth');
      h.defineDone();
   end;

   do yr = start_yr to end_yr;
      if h.find() = 0 then do;
         output;
      end;
      else do;
         put 'WARNING: No gdp_growth for year ' yr;
      end;
   end;
run;

Create start-stop dataset with time-varying covariates

Re: Create start-stop dataset with time-varying covariates

Re: Create start-stop dataset with time-varying covariates

Re: Create start-stop dataset with time-varying covariates

Re: Create start-stop dataset with time-varying covariates

Re: Create start-stop dataset with time-varying covariates

Re: Create start-stop dataset with time-varying covariates

SAS Innovate 2025: Call for Content

Classroom Training Available!