Hello again! I have a long dataset from the "Early Prediction of Sepsis from Clinical Data: the PhysioNet/Computing in Cardiology Challenge 2019". It has over 1.5 million rows of hourly data, with over 40,000 unique Patient_IDs. There are many variables (such as Hour, HR, Resp, O2Sat, various lab values, etc.) and one outcome variable (SepsisLabel: 0 for no sepsis, 1 for sepsis). I now have a subset containing only the patients whose SepsisLabel changes from 0 to 1, excluding those who are only SepsisLabel=0 and those who are only SepsisLabel=1.

Now I want to do some other things with the data. I want to calculate the time difference relative to the hour when SepsisLabel changes from 0 to 1, including for the hours before onset. I've been using ChatGPT to help, but it doesn't seem to understand what I want. I more or less got the code it gave me to work for the time difference AFTER SepsisLabel changes from 0 to 1, but not before. Before onset it gives me missing values. My code and an example dataset for Patient_ID=34 are below.

data have;
input Hour HR Temp SepsisLabel Patient_ID onset_time TimeDifference;
datalines;
0 88 36.11 0 34 . .
1 88 36.17 0 34 . .
2 88 . 0 34 . .
3 83.5 . 1 34 3 0
4 80 . 1 34 3 1
5 88 36.5 1 34 3 2
6 91 . 1 34 3 3
7 88 . 1 34 3 4
8 80 . 1 34 3 5
9 80 . 1 34 3 6
10 80 . 1 34 3 7
11 82 . 1 34 3 8
12 77 . 1 34 3 9
;

data biosp.sepsis_0_to_1_time_diff;
  set biosp.sepsis_0_to_1;
  by Patient_ID;
  retain onset_time;
  if first.Patient_ID then onset_time =.; *Initialize the onset time for each unique patient_ID;
  if sepsislabel=1 and onset_time=. then onset_time=Hour; *Records when sepsislabel changes to 1;
  if not missing(onset_time) then TimeDifference = Hour - onset_time; /* Calculate time difference */
  else TimeDifference=.;
run;

Please help me with this.

Something else I want to do is get the mean, median, mode, q1, q3, min, and max for certain variables (such as HR, Resp, Temp, etc.) at certain time intervals before SepsisLabel changes from 0 to 1. I was going to look at t=-4 hours, t=-6 hours, and t=-12 hours. This will collapse the long data into 1 row per Patient_ID (instead of 11 rows, or 100 rows in some instances, for one patient), and each row will hold the mean, median, etc. for the variables of interest (HR, Temp, etc.). This will create many datasets (1 for mean values, 1 for median, 1 for q1, etc.), but I think that format will be more usable in a logistic analysis. Can anyone help me with this please? Thank you!!!
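
To make both goals concrete, here is a minimal sketch of one possible approach, not a definitive solution. The missing values before onset appear because a single pass through the data cannot know the onset hour until it reaches it, so the sketch first finds the onset hour per patient and then merges it back onto every row. The dataset names (demo, onset, want, stats_4h) are hypothetical, the demo rows are a few of the Patient_ID=34 rows from the example, and it assumes that "t=-4" means the window covering the 4 hours before onset.

```sas
/* Hypothetical demo data: some Patient_ID=34 rows from the example, raw columns only */
data demo;
  input Hour HR Temp SepsisLabel Patient_ID;
  datalines;
0 88 36.11 0 34
1 88 36.17 0 34
2 88 . 0 34
3 83.5 . 1 34
4 80 . 1 34
5 88 36.5 1 34
;
run;

/* Step 1: one row per patient with the first hour where SepsisLabel=1 */
proc sql;
  create table onset as
  select Patient_ID, min(Hour) as onset_time
  from demo
  where SepsisLabel = 1
  group by Patient_ID
  order by Patient_ID;
quit;

/* Step 2: merge the onset hour onto every row, so pre-onset rows get it too */
proc sort data=demo;
  by Patient_ID Hour;
run;

data want;
  merge demo(in=in_demo) onset;
  by Patient_ID;
  if in_demo;
  /* Negative before onset, 0 at onset, positive afterwards */
  TimeDifference = Hour - onset_time;
run;

/* Step 3: one row per patient, summarizing the assumed 4-hour pre-onset window */
/* Change -4 to -6 or -12 (and the output name) for the other windows           */
proc means data=want noprint nway;
  where TimeDifference >= -4 and TimeDifference < 0;
  class Patient_ID;
  var HR Temp;
  output out=stats_4h mean= median= mode= q1= q3= min= max= / autoname;
run;
```

With the AUTONAME option the output columns are named after the variable and the statistic (for example HR_Mean and HR_Median), so all statistics land in one dataset with one row per Patient_ID rather than in one dataset per statistic. If the goal is instead the single reading exactly 4 hours before onset, the WHERE condition in step 3 would become TimeDifference = -4.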