SriniRajagopalan
Calcite | Level 5

Hi All,

In one of my projects, I will not be getting patient-level data for survival analysis (the usual time to an event, treatment and baseline covariates) but summarized data instead. I have to suggest a proper way to summarize patient-level data for survival analysis. I am trying to figure out the best possible summary data that will be amenable to modeling.

One type of summary data I have in mind is attached. It is made-up data. The actual data will have more time points and covariates. All covariates, including age, are going to be categorical.

This data set has time points which you can take to be 6-month intervals (1 corresponds to 0-6 months, 2 corresponds to 6-12 months and 3 corresponds to > 12 months), the number of events during each interval, the number of patients at risk at the beginning of each interval and two categorical variables. One of the categorical variables has three levels and the other (let's say treatment) has two levels. Clearly, the number of patients at risk at the beginning of each interval (except the first) excludes patients who had an event or were censored in the previous interval. This table has one record for each level of each categorical variable and time period.
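To make the layout concrete, here is a minimal Python sketch of how patient-level records could be collapsed into that kind of table. Everything in it is an assumption for illustration: the function names `interval_of` and `summarize`, the tuple layout `(time_months, event, group, treatment)`, the made-up patients, and the convention that a time exactly on a cut point belongs to the earlier interval.

```python
from collections import defaultdict


def interval_of(time_months, cuts=(6, 12)):
    """Return the 1-based interval: 1 = 0-6, 2 = 6-12, 3 = > 12 months."""
    for index, cut in enumerate(cuts, start=1):
        if time_months <= cut:
            return index
    return len(cuts) + 1


def summarize(patients, cuts=(6, 12)):
    """Collapse patient-level rows into one row per (group, treatment, interval).

    Each patient is (time_months, event, group, treatment), where event is
    1 for an event and 0 for censoring at time_months.
    """
    counts = defaultdict(lambda: {"at_risk": 0, "events": 0})
    for time_months, event, group, treatment in patients:
        last = interval_of(time_months, cuts)
        # The patient is at risk at the start of every interval up to the
        # one in which the event or censoring happens.
        for k in range(1, last + 1):
            counts[(group, treatment, k)]["at_risk"] += 1
        if event:
            counts[(group, treatment, last)]["events"] += 1
    return [
        {"group": g, "treatment": t, "interval": k, **counts[(g, t, k)]}
        for (g, t, k) in sorted(counts)
    ]


# Hypothetical, made-up patients (not from the attached data).
patients = [
    (3, 1, "A", "trt"),
    (8, 0, "A", "trt"),
    (14, 1, "A", "trt"),
    (5, 0, "A", "ctl"),
    (11, 1, "A", "ctl"),
]
summary = summarize(patients)
for row in summary:
    print(row)
```

A patient censored in an interval simply drops out of the at-risk count of the next interval, which matches the description above. Covariate combinations with nobody at risk in an interval produce no row in this sketch.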

I have two questions. Is this the best way to organize the data? If so, what model would be suitable? The log-linear model is not, because it does not take into account the conditional nature of the data: the number at risk at the beginning of each interval is clearly determined by the events and censoring in the previous interval. I would like to know what you would suggest.
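The conditional structure can be illustrated with a small life-table calculation, sketched here in Python. This is not a suggested model, only a picture of the conditioning: each interval's hazard is events divided by the number at risk at the start of that interval, and survival is the running product of one minus the hazard. The function name `life_table` and the counts are hypothetical, and the sketch assumes censored patients are treated as at risk for the whole interval in which they are censored (no actuarial adjustment).

```python
def life_table(rows):
    """Return (interval, hazard, survival) for rows of (interval, at_risk, events).

    hazard   = events / at_risk, conditional on reaching the interval
    survival = product of (1 - hazard) over intervals so far
    """
    survival = 1.0
    table = []
    for interval, at_risk, events in rows:
        hazard = events / at_risk if at_risk else 0.0
        survival *= 1.0 - hazard
        table.append((interval, hazard, survival))
    return table


# Hypothetical counts for one covariate combination (made up for illustration).
rows = [(1, 100, 10), (2, 80, 8), (3, 60, 6)]
table = life_table(rows)
for interval, hazard, survival in table:
    print(interval, round(hazard, 3), round(survival, 3))
```

In this made-up example the at-risk count falls from 100 to 80 although only 10 events occurred in the first interval; the remaining 10 patients were censored, which is exactly the dependence between consecutive rows described above.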

Reeza
Super User

I think has posted a document on simplifying your data in this structure to perform analysis.

Not sure it's related, but hopefully it's helpful. If not, ignore it.

SriniRajagopalan
Calcite | Level 5

Reeza,

Thanks for the pointer. I am going through it and trying to understand the program, although many of the comments are in Norwegian (?). I think it will be helpful.

SriniRajagopalan
Calcite | Level 5

I split the program into two parts: one to create the summary table and the other to run the Cox model. They both work on the test examples provided by Jacob Simonsen. Although the number of records in the summary table may be larger than in the individual patient data table, I think that this is the only summary table that would work. Thanks for pointing it out.

Regards,

Srini

JacobSimonsen
Barite | Level 11

It is correct that the summary table can be larger than the original dataset. Using the original dataset will cause PHREG to create a temporary dataset that can be very large. By using the aggregated data you avoid the large temporary dataset, and thereby save computation time. Of course, this matters only if your dataset is large.
