BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Loobylou
Calcite | Level 5

I am hoping to find a way to automatically create a dataset in SAS that I can use for Extended Cox Proportional Hazard regression using time-varying covariates.

 

I have a dataset in the current format - albeit it this is simplified to aid understanding of my question!

ID  Censored  SurvT(yrs) Start_Yr  End_Yr  +static Xn

1       0                      5          2010       2014

2       0                     10         2005       2014

3       1                      8          2005       2012

 

I have a separate dataset that contains GDP growth for each of the years 2005-2014.

I want to merge these two datasets, and create a new dataset in the following format (so effectively each ID has multiple observations, representing each year with a different value for GDP growth):

ID    Censored    Age(yr)     Yr     GDP_growth   +static Xn

1         0                      1      2010       -0.5%

1         0                     2      2011       +0.25%

1         0                       3      2012        +1%

1         0                       4      2013       +0.25%

 

Can anyone advise how I should go about this in SAS and if it is even possible?!  I have over 6000 IDs in my dataset, some with timeperiod of over 10 years, so I don't want to create this data manually!

 

Thank you

 

1 ACCEPTED SOLUTION

Accepted Solutions
andreas_lds
Jade | Level 19

@Loobylou wrote:

Thanks for replying.

 

I don't think I explained my issue correctly.

 

The first dataset I have is as described above.

 

The second one is in the following format:

Yr    GDP_growth

2005      1.5%

2010      -0.5%

2011      +0.25%

[...]

Thanks


To avoid such misunderstanding posting data in usable form and the expected result is highly recommended.

If the second dataset is not to large, try:

data want;
   if 0 then set have1 have2;
   set have1;
   
   if _n_ = 1 then do;
      declare hash h(dataset: 'work.have2');
      h.defineKey('yr');
      h.defineData('gdp_growth');
      h.defineDone();
   end;

   do yr = start_yr to end_yr;
      if h.find() = 0 then do;
         output;
      end;
      else do;
         put 'WARNING: No gdp_growth for year ' yr;
      end;
   end;
run;

 

View solution in original post

5 REPLIES 5
Jagadishkatam
Amethyst | Level 16

You can merge the first dataset with second dataset on id and censored variables as these two variables are common between both datasets.

 

proc sort data=dataset1;
by id censored;
run;

proc sort data=dataset2;
by id censored;
run;

data want;
merge dataset1 dataset2;
by id censored;
run;

 

Thanks,
Jag
Loobylou
Calcite | Level 5

Thanks for replying.

 

I don't think I explained my issue correctly.

 

The first dataset I have is as described above.

 

The second one is in the following format:

Yr    GDP_growth

2005      1.5%

2010      -0.5%

2011      +0.25%

 

So I want to combine these two datasets to create multiple observations for each firm ID where each observations contains a different value for GDP growth, dependent on the year.  So, for example, if a firm started in 2005 I want the first obs to be GDP growth for 2005, 2nd obs for that firm to be GDP growth for 2006 etc. 

 

I don;t think it's a simple case of merging the two datasets, as each firm starts at a different year.

 

Thanks

hhinohar
Quartz | Level 8
/* 1st initial dataset */
data have1;
infile datalines dlm='09'x;
input ID Censored SurvT_yr Start_Yr End_Yr;
datalines;
1	0	5	2010	2014
2	0	10	2005	2014
3	1	8	2005	2012
run;

/* 2nd initial dataset 
set index to yr variable */
data have2(index=(yr));
infile datalines dlm='09'x;
	input yr gdp_growth;
datalines;
2005	0.1
2006	1.1
2007	2.1
2008	3.1
2009	4.1
2010	-0.5
2011	0.25
2012	1
2013	0.25
2014	9.1
2015	10.1
run;

/* create another dataset that loops through each year for have1 */
data _have1;
	set have1;
	by id;
	do until(last.id);
		do i=start_yr to end_yr;
			yr=i;
			output;
		end;
	end;
	drop i;
run;

/* merge both datasets  */
data want(rename=(yr=age_yr) drop=Start_Yr End_Yr survt_yr);
retain ID Censored yr Start_Yr End_Yr;
	set _have1;
	set have2 key=yr;
run;
Loobylou
Calcite | Level 5

Thanks for all your answers which I have found very helpful.

The solution is good - on small dataset though as took a while to process, so agree would not be as effective if dataset is very large. 

And thanks for the tip about posting in the community in future 

andreas_lds
Jade | Level 19

@Loobylou wrote:

Thanks for replying.

 

I don't think I explained my issue correctly.

 

The first dataset I have is as described above.

 

The second one is in the following format:

Yr    GDP_growth

2005      1.5%

2010      -0.5%

2011      +0.25%

[...]

Thanks


To avoid such misunderstanding posting data in usable form and the expected result is highly recommended.

If the second dataset is not to large, try:

data want;
   if 0 then set have1 have2;
   set have1;
   
   if _n_ = 1 then do;
      declare hash h(dataset: 'work.have2');
      h.defineKey('yr');
      h.defineData('gdp_growth');
      h.defineDone();
   end;

   do yr = start_yr to end_yr;
      if h.find() = 0 then do;
         output;
      end;
      else do;
         put 'WARNING: No gdp_growth for year ' yr;
      end;
   end;
run;

 

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1712 views
  • 1 like
  • 4 in conversation