I am hoping to find a way to automatically create a dataset in SAS that I can use for Extended Cox Proportional Hazard regression using time-varying covariates.
I have a dataset in the current format - albeit it this is simplified to aid understanding of my question!
ID Censored SurvT(yrs) Start_Yr End_Yr +static Xn
1 0 5 2010 2014
2 0 10 2005 2014
3 1 8 2005 2012
I have a separate dataset that contains GDP growth for each of the years 2005-2014.
I want to merge these two datasets, and create a new dataset in the following format (so effectively each ID has multiple observations, representing each year with a different value for GDP growth):
ID Censored Age(yr) Yr GDP_growth +static Xn
1 0 1 2010 -0.5%
1 0 2 2011 +0.25%
1 0 3 2012 +1%
1 0 4 2013 +0.25%
Can anyone advise how I should go about this in SAS and if it is even possible?! I have over 6000 IDs in my dataset, some with timeperiod of over 10 years, so I don't want to create this data manually!
Thank you
@Loobylou wrote:
Thanks for replying.
I don't think I explained my issue correctly.
The first dataset I have is as described above.
The second one is in the following format:
Yr GDP_growth
2005 1.5%
…
2010 -0.5%
2011 +0.25%
[...]
Thanks
To avoid such misunderstanding posting data in usable form and the expected result is highly recommended.
If the second dataset is not to large, try:
data want;
if 0 then set have1 have2;
set have1;
if _n_ = 1 then do;
declare hash h(dataset: 'work.have2');
h.defineKey('yr');
h.defineData('gdp_growth');
h.defineDone();
end;
do yr = start_yr to end_yr;
if h.find() = 0 then do;
output;
end;
else do;
put 'WARNING: No gdp_growth for year ' yr;
end;
end;
run;
You can merge the first dataset with second dataset on id and censored variables as these two variables are common between both datasets.
proc sort data=dataset1;
by id censored;
run;
proc sort data=dataset2;
by id censored;
run;
data want;
merge dataset1 dataset2;
by id censored;
run;
Thanks for replying.
I don't think I explained my issue correctly.
The first dataset I have is as described above.
The second one is in the following format:
Yr GDP_growth
2005 1.5%
…
2010 -0.5%
2011 +0.25%
So I want to combine these two datasets to create multiple observations for each firm ID where each observations contains a different value for GDP growth, dependent on the year. So, for example, if a firm started in 2005 I want the first obs to be GDP growth for 2005, 2nd obs for that firm to be GDP growth for 2006 etc.
I don;t think it's a simple case of merging the two datasets, as each firm starts at a different year.
Thanks
/* 1st initial dataset */
data have1;
infile datalines dlm='09'x;
input ID Censored SurvT_yr Start_Yr End_Yr;
datalines;
1 0 5 2010 2014
2 0 10 2005 2014
3 1 8 2005 2012
run;
/* 2nd initial dataset
set index to yr variable */
data have2(index=(yr));
infile datalines dlm='09'x;
input yr gdp_growth;
datalines;
2005 0.1
2006 1.1
2007 2.1
2008 3.1
2009 4.1
2010 -0.5
2011 0.25
2012 1
2013 0.25
2014 9.1
2015 10.1
run;
/* create another dataset that loops through each year for have1 */
data _have1;
set have1;
by id;
do until(last.id);
do i=start_yr to end_yr;
yr=i;
output;
end;
end;
drop i;
run;
/* merge both datasets */
data want(rename=(yr=age_yr) drop=Start_Yr End_Yr survt_yr);
retain ID Censored yr Start_Yr End_Yr;
set _have1;
set have2 key=yr;
run;
Thanks for all your answers which I have found very helpful.
The solution is good - on small dataset though as took a while to process, so agree would not be as effective if dataset is very large.
And thanks for the tip about posting in the community in future
@Loobylou wrote:
Thanks for replying.
I don't think I explained my issue correctly.
The first dataset I have is as described above.
The second one is in the following format:
Yr GDP_growth
2005 1.5%
…
2010 -0.5%
2011 +0.25%
[...]
Thanks
To avoid such misunderstanding posting data in usable form and the expected result is highly recommended.
If the second dataset is not to large, try:
data want;
if 0 then set have1 have2;
set have1;
if _n_ = 1 then do;
declare hash h(dataset: 'work.have2');
h.defineKey('yr');
h.defineData('gdp_growth');
h.defineDone();
end;
do yr = start_yr to end_yr;
if h.find() = 0 then do;
output;
end;
else do;
put 'WARNING: No gdp_growth for year ' yr;
end;
end;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.