I am hoping to find a way to automatically create a dataset in SAS that I can use for Extended Cox Proportional Hazard regression using time-varying covariates.
I have a dataset in the current format - albeit it this is simplified to aid understanding of my question!
ID Censored SurvT(yrs) Start_Yr End_Yr +static Xn
1 0 5 2010 2014
2 0 10 2005 2014
3 1 8 2005 2012
I have a separate dataset that contains GDP growth for each of the years 2005-2014.
I want to merge these two datasets, and create a new dataset in the following format (so effectively each ID has multiple observations, representing each year with a different value for GDP growth):
ID Censored Age(yr) Yr GDP_growth +static Xn
1 0 1 2010 -0.5%
1 0 2 2011 +0.25%
1 0 3 2012 +1%
1 0 4 2013 +0.25%
Can anyone advise how I should go about this in SAS and if it is even possible?! I have over 6000 IDs in my dataset, some with timeperiod of over 10 years, so I don't want to create this data manually!
Thank you
@Loobylou wrote:
Thanks for replying.
I don't think I explained my issue correctly.
The first dataset I have is as described above.
The second one is in the following format:
Yr GDP_growth
2005 1.5%
…
2010 -0.5%
2011 +0.25%
[...]
Thanks
To avoid such misunderstanding posting data in usable form and the expected result is highly recommended.
If the second dataset is not to large, try:
data want;
   if 0 then set have1 have2;
   set have1;
   
   if _n_ = 1 then do;
      declare hash h(dataset: 'work.have2');
      h.defineKey('yr');
      h.defineData('gdp_growth');
      h.defineDone();
   end;
   do yr = start_yr to end_yr;
      if h.find() = 0 then do;
         output;
      end;
      else do;
         put 'WARNING: No gdp_growth for year ' yr;
      end;
   end;
run;
You can merge the first dataset with second dataset on id and censored variables as these two variables are common between both datasets.
proc sort data=dataset1;
by id censored;
run;
proc sort data=dataset2;
by id censored;
run;
data want;
merge dataset1 dataset2;
by id censored;
run;
Thanks for replying.
I don't think I explained my issue correctly.
The first dataset I have is as described above.
The second one is in the following format:
Yr GDP_growth
2005 1.5%
…
2010 -0.5%
2011 +0.25%
So I want to combine these two datasets to create multiple observations for each firm ID where each observations contains a different value for GDP growth, dependent on the year. So, for example, if a firm started in 2005 I want the first obs to be GDP growth for 2005, 2nd obs for that firm to be GDP growth for 2006 etc.
I don;t think it's a simple case of merging the two datasets, as each firm starts at a different year.
Thanks
/* 1st initial dataset */
data have1;
infile datalines dlm='09'x;
input ID Censored SurvT_yr Start_Yr End_Yr;
datalines;
1	0	5	2010	2014
2	0	10	2005	2014
3	1	8	2005	2012
run;
/* 2nd initial dataset 
set index to yr variable */
data have2(index=(yr));
infile datalines dlm='09'x;
	input yr gdp_growth;
datalines;
2005	0.1
2006	1.1
2007	2.1
2008	3.1
2009	4.1
2010	-0.5
2011	0.25
2012	1
2013	0.25
2014	9.1
2015	10.1
run;
/* create another dataset that loops through each year for have1 */
data _have1;
	set have1;
	by id;
	do until(last.id);
		do i=start_yr to end_yr;
			yr=i;
			output;
		end;
	end;
	drop i;
run;
/* merge both datasets  */
data want(rename=(yr=age_yr) drop=Start_Yr End_Yr survt_yr);
retain ID Censored yr Start_Yr End_Yr;
	set _have1;
	set have2 key=yr;
run;Thanks for all your answers which I have found very helpful.
The solution is good - on small dataset though as took a while to process, so agree would not be as effective if dataset is very large.
And thanks for the tip about posting in the community in future
@Loobylou wrote:
Thanks for replying.
I don't think I explained my issue correctly.
The first dataset I have is as described above.
The second one is in the following format:
Yr GDP_growth
2005 1.5%
…
2010 -0.5%
2011 +0.25%
[...]
Thanks
To avoid such misunderstanding posting data in usable form and the expected result is highly recommended.
If the second dataset is not to large, try:
data want;
   if 0 then set have1 have2;
   set have1;
   
   if _n_ = 1 then do;
      declare hash h(dataset: 'work.have2');
      h.defineKey('yr');
      h.defineData('gdp_growth');
      h.defineDone();
   end;
   do yr = start_yr to end_yr;
      if h.find() = 0 then do;
         output;
      end;
      else do;
         put 'WARNING: No gdp_growth for year ' yr;
      end;
   end;
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
