Dear all,
I have a data set with blood glucose levels (Glu_Level) at different time points before the event date (PDAC_DATE) for all patients. These patients are divided into two groups (Group, coded as 1, 0). The outcome of interest is PDAC_DATE. I would like to compare the median or mean glucose levels before the event (PDAC_DATE) at different time points, i.e., 60 months before, 54 months before, 48 months before, 42 months before, 36 months before, etc., down to 0 months before PDAC_DATE (the event date). I want to get the mean values with 95% CIs and compare the mean levels at each time point between the two groups to see if there is any significant difference.
Variables in the data set:
Glu_Date: Date of glucose testing
Glu_Level: Glucose measurement
Group: Comparison group (coded 1, 0)
PDAC_Date: Event date
NOTE: The number of patients in each of these two groups who have a glucose level tested may vary across time points.
Any advice with this analysis would be of great help!
Thank you all in advance!
You can use some SAS data steps to construct the data depending on the dates, and then use PROC MEANS or some other procedure to calculate the means.
Please provide some sample data in the form of a data step to get more responses.
The structure of your current data will have a serious impact on the possible approaches. First, are your dates actual SAS date values, character values, or numerics like 05122018 that imitate dates but actually carry a format such as BEST8. or similar?
Is the PDAC_date variable on every record or only once per group?
There is also some consideration needed in defining "48 months before", such as whether an end date "exactly" 48 months before gets included or not. Is "48 months" a calendar month count or a specific number of days? Note that a 48-month interval can contain up to two leap years, and so two Februaries of 29 days.
One approach could involve creating an array of the desired interval values as indicator variables of 1 when in an interval and 0 otherwise but again, some starting data in the form of a data step would help.
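The indicator-array idea above could look roughly like the sketch below. It is only an illustration: the data set names, the win_00 to win_54 variable names, the dummy rows, and the choice of half-open 6-month windows counted with the 'C' method of INTCK are all assumptions, not something taken from your data.

```sas
/* Dummy long-format data: every ID, date, and value is invented. */
data glucose;
    input PT_ID $ Group Index_Date :mmddyy10. Glu_Date :mmddyy10. Glu_Level;
    format Index_Date Glu_Date mmddyy10.;
    datalines;
A001 1 06/15/2015 01/10/2011 98
A001 1 06/15/2015 07/22/2013 110
A001 1 06/15/2015 05/30/2015 142
B001 0 06/20/2015 02/14/2012 92
B001 0 06/20/2015 06/01/2015 95
;
run;

data glucose_flags;
    set glucose;
    /* One 0/1 indicator per assumed 6-month window before the index date */
    array win{10} win_00 win_06 win_12 win_18 win_24
                  win_30 win_36 win_42 win_48 win_54;
    /* Whole months elapsed from the glucose test to the index date */
    months_before = intck('month', Glu_Date, Index_Date, 'C');
    do i = 1 to dim(win);
        /* Window i covers months (i-1)*6 up to, but not including, i*6 */
        win{i} = ((i - 1) * 6 <= months_before < i * 6);
    end;
    drop i;
run;

proc print data=glucose_flags noobs;
run;
```

A record taken after the index date, or more than 60 months before it, ends up with all ten indicators equal to 0 under these rules.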
You don't show a patient identifier. That might make comparing things a little difficult depending on actual structure of your data set.
Thank you for the response. All my date variables are in MMDDYY10. format. As far as the data structure is concerned, I have to build the data set from each variable (which is extracted separately from the master list of patients using Pt_ID variable).
So I have a list of 26,000 cancer patients with the following variables:
1. PT_ID
2. PDAC_Date's (Cancer date in MMDDYY10.)
3. Group (Coded as 1 for all cancer cases)
I also have a pool of 9+ million other patients from which I am picking 1:3 controls, matched to the cancer patients on age and gender and with a hospital Visit_Date in the same month as the cancer patient's cancer date.
Variables:
1. PT_ID (note these patients are different from the cancer patients, so their PT_IDs do not overlap with those of the cancer patients)
2. Visit_Date (Date of hospital visit in MMDDYY10. )
3. Group (Coded 0 for the control patients).
Once I assemble this cohort, I have to merge it with Glucose data which will have PT_ID, Glu_Date and Glu_Level.
4. Glu (Coded as 1, if a patient has a glucose level measurement)
5. Glu_Date (date of glucose measurement)
6. Glu_Level (glucose value)
7. Index_Date (single final outcome date variable, will be either PDAC_Date for pts with cancer or Visit_Date for rest of the pts without cancer based on Group variable coding 1/0)
NOTE: Not all patients have a glucose level measured. Patients who do have glucose measured may have multiple glucose entries.
After excluding Glu = 0, I will end up with patients with Glu measurement only. Based on the Glu_Date and Index_Date, I will have datediff_glu for all patients.
Once I have this vertical data set, my question is how to handle multiple records for a single PT_ID and come up with a mean value based on the timing of Glu_Date with respect to Index_Date using 6-month time periods, and then compare these 6-month mean values between Group 1 and Group 0.
Sorry, I do not have a sample data set here, as I cannot extract the data from the server (confidentiality). However, I am attaching a PDF paper that analyzes glucose values using 6-month time periods (I am interested in Cohort A and Figure 1, highlighted in the PDF).
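For concreteness, the assembly described above might be sketched as follows. The data set names (cohort, glucose, glucose_long) and all values are invented; the sketch also assumes Index_Date is simply PDAC_Date when Group = 1 and Visit_Date otherwise, and that datediff_glu is a plain day count.

```sas
/* Invented cohort: one row per patient, cases carry PDAC_Date, */
/* controls carry Visit_Date.                                   */
data cohort;
    input PT_ID $ Group PDAC_Date :mmddyy10. Visit_Date :mmddyy10.;
    format PDAC_Date Visit_Date mmddyy10.;
    datalines;
A001 1 06/15/2015 .
B001 0 . 06/20/2015
B002 0 . 06/03/2015
;
run;

/* Invented glucose records: zero, one, or many rows per patient */
data glucose;
    input PT_ID $ Glu_Date :mmddyy10. Glu_Level;
    format Glu_Date mmddyy10.;
    datalines;
A001 01/10/2011 98
A001 05/30/2015 142
B001 02/14/2012 92
;
run;

proc sort data=cohort;  by PT_ID; run;
proc sort data=glucose; by PT_ID; run;

data glucose_long;
    merge cohort(in=in_cohort) glucose(in=in_glu);
    by PT_ID;
    /* Keep only cohort members with at least one glucose record */
    if in_cohort and in_glu;
    /* Single outcome date: cancer date for cases, visit date for controls */
    Index_Date   = ifn(Group = 1, PDAC_Date, Visit_Date);
    /* Days from the glucose test to the index date */
    datediff_glu = Index_Date - Glu_Date;
    format Index_Date mmddyy10.;
run;
```

The subsetting IF plays the role of excluding Glu = 0, since patients without any glucose row never reach the output.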
Dummy data is fine as long as variable types are consistent and values simulate, or at least come close to, the behavior of actual data. Any identifiers can be arbitrary values such as XXXXX YYYYY or similar as long as there are values that provide similar characteristics.
Best if your example data includes an example output data set with added variables and clear rules on how each of them is created.
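As an illustration of what such dummy data could look like, here is a minimal data step. The data set name glucose_long, the IDs, and every date and value are made up; only the variable names come from your description.

```sas
/* Dummy long-format data: one row per glucose measurement.        */
/* All IDs, dates, and values below are invented for illustration. */
data glucose_long;
    input PT_ID $ Group Index_Date :mmddyy10. Glu_Date :mmddyy10. Glu_Level;
    format Index_Date Glu_Date mmddyy10.;
    datalines;
A001 1 06/15/2015 01/10/2011 98
A001 1 06/15/2015 07/22/2013 110
A001 1 06/15/2015 05/30/2015 142
B001 0 06/20/2015 02/14/2012 92
B001 0 06/20/2015 06/01/2015 95
B002 0 06/03/2015 11/05/2010 101
;
run;
```

Posting something in this shape, plus the values you would expect in the output for these rows, lets others test code against the same input.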
It looks like the main portion of your problem may be the assignment of time category based on a measure date and some other date.
The INTCK function would start that process: monthsbetween = intck('month', startdate, enddate); The order of the date variables/values is important: if enddate is before startdate, the result is negative. If the days within the month are likely to be important, you likely want to add the 'C' (continuous) modifier as a fourth argument. Otherwise the default, discrete, counts month boundaries crossed, so the difference between 31Jan2013 and 1Feb2013 is one month.
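A small check of that behavior, using the two dates mentioned above (the variable names are placeholders):

```sas
data _null_;
    glu_date   = '31JAN2013'd;
    index_date = '01FEB2013'd;
    /* Discrete (default): counts month boundaries crossed -> 1 */
    discrete   = intck('month', glu_date, index_date);
    /* Continuous: counts whole months elapsed -> 0 */
    continuous = intck('month', glu_date, index_date, 'C');
    /* Arguments swapped: end date precedes start date -> -1 */
    reversed   = intck('month', index_date, glu_date);
    put discrete= continuous= reversed=;
run;
```

The log line written by PUT shows the three values side by side, which makes it easy to try other date pairs near your window edges.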
A custom format for the desired ranges to create the month interval values would probably be in order as formats can create groups used by any of the analysis or summary procedures.
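A rough sketch of the format idea, under several assumptions: the format name mwin, the 6-month window edges, the simulated data set glu, and the 'C' method for counting months are all illustrative choices. Note that this summarizes measurements, not patients, so a patient with several tests in one window contributes several times.

```sas
/* Assumed windows: whole months before the index date, in 6-month bins */
proc format;
    value mwin
        0  -< 6  = '00-05'
        6  -< 12 = '06-11'
        12 -< 18 = '12-17'
        18 -< 24 = '18-23'
        24 -< 30 = '24-29'
        30 -< 36 = '30-35'
        36 -< 42 = '36-41'
        42 -< 48 = '42-47'
        48 -< 54 = '48-53'
        54 -< 60 = '54-59'
        other    = 'Out of range';
run;

/* Simulated data, invented purely to make the example run */
data glu;
    call streaminit(2024);
    length PT_ID $8;
    do Group = 0 to 1;
        do pt = 1 to 20;
            PT_ID = cats('P', Group, '_', pt);
            Index_Date = '01JAN2015'd + floor(365 * rand('uniform'));
            do k = 1 to 8;
                Glu_Date  = Index_Date - floor(1800 * rand('uniform'));
                Glu_Level = round(100 + 10 * Group + 15 * rand('normal'));
                months_before = intck('month', Glu_Date, Index_Date, 'C');
                output;
            end;
        end;
    end;
    format Index_Date Glu_Date mmddyy10.;
    drop pt k;
run;

/* The format on the CLASS variable does the grouping into windows */
proc means data=glu n mean lclm uclm median;
    where months_before between 0 and 59;
    class Group months_before;
    format months_before mwin.;
    var Glu_Level;
run;
```

LCLM and UCLM give the 95% confidence limits for the mean by default; the same format can be reused in other procedures without creating a new variable.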
Your Figure 1 in the paper shows what appear to be p-values. P-values are associated with statistical tests, but from a brief scan of the paper I cannot tell which test was used. To generate that figure you will likely need to create some output data from one or more analysis procedures and combine it with the appropriate variables to overlay box plots, what I believe to be a regression curve, and some text for the p-values.
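Since the test behind the paper's p-values is not known, the sketch below swaps in a two-sample t-test per window purely as a stand-in. It first averages each patient's measurements within a window so that each patient counts once per window, then captures the test results in a data set through ODS OUTPUT. The data are simulated, and the names glu, pt_window, ttest_p, window, and Glu_Mean are all assumptions.

```sas
/* Simulated data, invented purely to make the example run */
data glu;
    call streaminit(2024);
    length PT_ID $8;
    do Group = 0 to 1;
        do pt = 1 to 20;
            PT_ID = cats('P', Group, '_', pt);
            Index_Date = '01JAN2015'd + floor(365 * rand('uniform'));
            do k = 1 to 8;
                Glu_Date  = Index_Date - floor(1800 * rand('uniform'));
                Glu_Level = round(100 + 10 * Group + 15 * rand('normal'));
                /* Start month of the assumed 6-month window: 0, 6, ..., 54 */
                window = 6 * floor(intck('month', Glu_Date, Index_Date, 'C') / 6);
                output;
            end;
        end;
    end;
    format Index_Date Glu_Date mmddyy10.;
    drop pt k;
run;

/* One mean per patient per window, so repeat tests do not count twice */
proc means data=glu nway noprint;
    class Group PT_ID window;
    var Glu_Level;
    output out=pt_window(drop=_type_ _freq_) mean=Glu_Mean;
run;

proc sort data=pt_window;
    by window;
run;

/* Stand-in test: two-sample t-test of Group 1 vs Group 0 in each window */
ods output TTests=ttest_p;
proc ttest data=pt_window;
    by window;
    class Group;
    var Glu_Mean;
run;

proc print data=ttest_p noobs;
run;
```

The ttest_p data set can then be merged with the plotting data to place p-value text on a graph. The same patient can still appear in several windows, so if the windows are to be analyzed jointly, a model that accounts for repeated measures would be worth considering.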