BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Loobylou
Calcite | Level 5

I am hoping to find a way to automatically create a dataset in SAS that I can use for Extended Cox Proportional Hazard regression using time-varying covariates.

 

I have a dataset in the current format - albeit it this is simplified to aid understanding of my question!

ID  Censored  SurvT(yrs) Start_Yr  End_Yr  +static Xn

1       0                      5          2010       2014

2       0                     10         2005       2014

3       1                      8          2005       2012

 

I have a separate dataset that contains GDP growth for each of the years 2005-2014.

I want to merge these two datasets, and create a new dataset in the following format (so effectively each ID has multiple observations, representing each year with a different value for GDP growth):

ID    Censored    Age(yr)     Yr     GDP_growth   +static Xn

1         0                      1      2010       -0.5%

1         0                     2      2011       +0.25%

1         0                       3      2012        +1%

1         0                       4      2013       +0.25%

 

Can anyone advise how I should go about this in SAS and if it is even possible?!  I have over 6000 IDs in my dataset, some with timeperiod of over 10 years, so I don't want to create this data manually!

 

Thank you

 

1 ACCEPTED SOLUTION

Accepted Solutions
andreas_lds
Jade | Level 19

@Loobylou wrote:

Thanks for replying.

 

I don't think I explained my issue correctly.

 

The first dataset I have is as described above.

 

The second one is in the following format:

Yr    GDP_growth

2005      1.5%

2010      -0.5%

2011      +0.25%

[...]

Thanks


To avoid such misunderstanding posting data in usable form and the expected result is highly recommended.

If the second dataset is not to large, try:

data want;
   if 0 then set have1 have2;
   set have1;
   
   if _n_ = 1 then do;
      declare hash h(dataset: 'work.have2');
      h.defineKey('yr');
      h.defineData('gdp_growth');
      h.defineDone();
   end;

   do yr = start_yr to end_yr;
      if h.find() = 0 then do;
         output;
      end;
      else do;
         put 'WARNING: No gdp_growth for year ' yr;
      end;
   end;
run;

 

View solution in original post

5 REPLIES 5
Jagadishkatam
Amethyst | Level 16

You can merge the first dataset with second dataset on id and censored variables as these two variables are common between both datasets.

 

proc sort data=dataset1;
by id censored;
run;

proc sort data=dataset2;
by id censored;
run;

data want;
merge dataset1 dataset2;
by id censored;
run;

 

Thanks,
Jag
Loobylou
Calcite | Level 5

Thanks for replying.

 

I don't think I explained my issue correctly.

 

The first dataset I have is as described above.

 

The second one is in the following format:

Yr    GDP_growth

2005      1.5%

2010      -0.5%

2011      +0.25%

 

So I want to combine these two datasets to create multiple observations for each firm ID where each observations contains a different value for GDP growth, dependent on the year.  So, for example, if a firm started in 2005 I want the first obs to be GDP growth for 2005, 2nd obs for that firm to be GDP growth for 2006 etc. 

 

I don;t think it's a simple case of merging the two datasets, as each firm starts at a different year.

 

Thanks

hhinohar
Quartz | Level 8
/* 1st initial dataset */
data have1;
infile datalines dlm='09'x;
input ID Censored SurvT_yr Start_Yr End_Yr;
datalines;
1	0	5	2010	2014
2	0	10	2005	2014
3	1	8	2005	2012
run;

/* 2nd initial dataset 
set index to yr variable */
data have2(index=(yr));
infile datalines dlm='09'x;
	input yr gdp_growth;
datalines;
2005	0.1
2006	1.1
2007	2.1
2008	3.1
2009	4.1
2010	-0.5
2011	0.25
2012	1
2013	0.25
2014	9.1
2015	10.1
run;

/* create another dataset that loops through each year for have1 */
data _have1;
	set have1;
	by id;
	do until(last.id);
		do i=start_yr to end_yr;
			yr=i;
			output;
		end;
	end;
	drop i;
run;

/* merge both datasets  */
data want(rename=(yr=age_yr) drop=Start_Yr End_Yr survt_yr);
retain ID Censored yr Start_Yr End_Yr;
	set _have1;
	set have2 key=yr;
run;
Loobylou
Calcite | Level 5

Thanks for all your answers which I have found very helpful.

The solution is good - on small dataset though as took a while to process, so agree would not be as effective if dataset is very large. 

And thanks for the tip about posting in the community in future 

andreas_lds
Jade | Level 19

@Loobylou wrote:

Thanks for replying.

 

I don't think I explained my issue correctly.

 

The first dataset I have is as described above.

 

The second one is in the following format:

Yr    GDP_growth

2005      1.5%

2010      -0.5%

2011      +0.25%

[...]

Thanks


To avoid such misunderstanding posting data in usable form and the expected result is highly recommended.

If the second dataset is not to large, try:

data want;
   if 0 then set have1 have2;
   set have1;
   
   if _n_ = 1 then do;
      declare hash h(dataset: 'work.have2');
      h.defineKey('yr');
      h.defineData('gdp_growth');
      h.defineDone();
   end;

   do yr = start_yr to end_yr;
      if h.find() = 0 then do;
         output;
      end;
      else do;
         put 'WARNING: No gdp_growth for year ' yr;
      end;
   end;
run;

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 2200 views
  • 1 like
  • 4 in conversation