Solved: Re: Taking time difference from before when one variable changes from ...

pdick2 · Posted 04-16-2024 12:00 PM

Hello again!

I have a long dataset from the "Early Prediction of Sepsis from Clinical Data: the PhysioNet/Computing in Cardiology Challenge 2019" This has over 1.5 million rows of hourly data, with over 40,000 unique Patient_IDs. There are many variable (such as Hour, HR, Resp, O2Sat, various lab values , etc.) by one outcome variable (SepsisLabel. 0 for no sepsis, 1 for sepsis).

I now have the dataset where SepsisLabel changes from 0 to 1, while excluding those who are only SepsisLabel=0 and excluding those who are only SepsisLabel=1.

Now I want to do some other things to the data. I want to take the time difference between when sepsislabel changes from 0 to 1 before onset. Here is an example dataset. I've been using ChatGPT to help, but it doesn't seem to understand what I want. I kinda got the code it gave me to work with the time difference AFTER sepsislabel changes from 0 to 1, but not before. Before it gives me missing values. Code below and an example dataset for Patient_ID=34.

data have;
input Hour HR Temp SepsisLabel Patient_ID onset_time TimeDifference;
datalines;
0 88 36.11 0 34 . .
1 88 36.17 0 34 . .
2 88 . 0 34 . .
3 83.5 . 1 34 3 0
4 80 . 1 34 3 1
5 88 36.5 1 34 3 2
6 91 . 1 34 3 3
7 88 . 1 34 3 4
8 80 . 1 34 3 5
9 80 . 1 34 3 6
10 80 . 1 34 3 7
11 82 . 1 34 3 8
12 77 . 1 34 3 9
;

data biosp.sepsis_0_to_1_time_diff;
set biosp.sepsis_0_to_1;
by Patient_ID;
retain onset_time;
if first.Patient_ID then onset_time =.; *Initialize the onset time for each unique patient_ID;
if sepsislabel=1 and onset_time=. then onset_time=Hour; *Records when sepsislabel changes to 1;
if not missing(onset_time) then TimeDifference = Hour - onset_time; /* Calculate time difference */
else TimeDifference=.;
run;

Please help me with this.

Something else I want to do is get the mean, median, mode, q1, q3, min, and max for certain variables (such as HR, Resp, Temp., etc.) at certain time intervals before sepsislabel changes from 0 to 1. I was going to look at t=-4 hours, t=-6 hours, and t=-12 hours. This will single out this long data into 1 row per 1 patient_ID (instead of 11 rows, or 100 rows in some instances for one patient), but the rows will have the mean, median, etc. for those variables of interest (HR, Temp., etc.). This will create many datasheets (1 datasheet for mean values, 1 for median, 1 for q1, etc.), but this can be more usable in a logistic analysis in my opinion.

Can anyone help me with this please? Thank you!!!

PaigeMiller · Posted 04-16-2024 02:15 PM

Yes thank you, this is helpful.

data transition;
    set have;
    by patient_id;
    prev_sepsis=lag(sepsisLabel);
    if not first.patient_id and sepsisLabel=1 and prev_sepsis<1 then output;
run;
data want;
    merge have transition(rename=(hour=transition_hour) keep=hour patient_id);
    by patient_id;
    time_difference=hour-transition_hour;
run;

What I would like to do is, for example, take the mean of a column variable (lets use HR) for one that unique Patient_ID between the hours of -4 to 0. This will allow me to get the mean value of HR 4 hours before sepsislabel changes from 0 to 1. I would like to look at other variables besides HR, and other time intervals as well (such as -6, -12, etc. before the sepsislabel changes from 0 to 1).

Sounds like a job for PROC SUMMARY.

--
Paige Miller

View solution in original post

PaigeMiller · Posted 04-16-2024 12:48 PM

Is this what you want? You haven't given us an example of what you do want, you told us you got what you don't want. (In the future, please be clear about what you do want).

I have removed onset_time and timedifference from HAVE.

data have;
input Hour HR Temp SepsisLabel Patient_ID;
datalines;
0 88 36.11 0 34 . .
1 88 36.17 0 34 . .
2 88 . 0 34 . .
3 83.5 . 1 34 3 0
4 80 . 1 34 3 1
5 88 36.5 1 34 3 2
6 91 . 1 34 3 3
7 88 . 1 34 3 4
8 80 . 1 34 3 5
9 80 . 1 34 3 6
10 80 . 1 34 3 7
11 82 . 1 34 3 8
12 77 . 1 34 3 9
;

data sepsis_0_to_1_time_diff;
set have;
by Patient_ID;
retain onset_time;
lag_sepsis=lag(sepsislabel);
if first.patient_id then onset_time=hour;
if sepsislabel=1 and lag_sepsis<1 then TimeDifference = Hour - onset_time; /* Calculate time difference */
else TimeDifference=.;
run;

I don't understand the rest of your question, but this part bothers me: "This will create many datasheets (1 datasheet for mean values, 1 for median, 1 for q1, etc.)", why wouldn't you just put all the results (mean, median, etc.) into one data set? Separate data sets means more complicated programming.

I am not surprised ChatGPT fails here. It relies on finding answers to similar problems, and this is relatively unique problem.

--
Paige Miller

pdick2 · Posted 04-16-2024 01:15 PM

I'll try to be more clear. Here's what I do want or would like to see in a datalines format:

data have;
input Hour HR Temp SepsisLabel Patient_ID TimeDifference;
datalines;
0 88 36.11 0 34 -3
1 88 36.17 0 34 -2 
2 88 . 0 34 -1
3 83.5 . 1 34 0
4 80 . 1 34 1
5 88 36.5 1 34 2
6 91 . 1 34 3
7 88 . 1 34 4
8 80 . 1 34 5
9 80 . 1 34 6
10 80 . 1 34 7
11 82 . 1 34 8
12 77 . 1 34 9
;

And I didn't think about putting them on one data set. I see what you mean, it'll be easier to use one "calculated values" data set (and using the vars I want) instead of 6 data sets. Do you recommend just making one new data set with all of the calculated values or have them on the original "have" data set? This isn't the source data set, but just another copy of it (so I don't overwrite the original data).

What I would like to do is, for example, take the mean of a column variable (lets use HR) for one that unique Patient_ID between the hours of -4 to 0. This will allow me to get the mean value of HR 4 hours before sepsislabel changes from 0 to 1. I would like to look at other variables besides HR, and other time intervals as well (such as -6, -12, etc. before the sepsislabel changes from 0 to 1).

The code you gave didn't work for me. Maybe the specifics above will help. Thank you so much for helping me! First time SAS user here. Also, I can send the data set where it only has data when sepsislabel changed from 0 to 1, if you want to experiment.

Here is a screenshot of what I got from the code you gave:

PaigeMiller · Posted 04-16-2024 02:15 PM

Yes thank you, this is helpful.

data transition;
    set have;
    by patient_id;
    prev_sepsis=lag(sepsisLabel);
    if not first.patient_id and sepsisLabel=1 and prev_sepsis<1 then output;
run;
data want;
    merge have transition(rename=(hour=transition_hour) keep=hour patient_id);
    by patient_id;
    time_difference=hour-transition_hour;
run;

What I would like to do is, for example, take the mean of a column variable (lets use HR) for one that unique Patient_ID between the hours of -4 to 0. This will allow me to get the mean value of HR 4 hours before sepsislabel changes from 0 to 1. I would like to look at other variables besides HR, and other time intervals as well (such as -6, -12, etc. before the sepsislabel changes from 0 to 1).

Sounds like a job for PROC SUMMARY.

--
Paige Miller

pdick2 · Posted 04-16-2024 02:29 PM

This worked thank you so much!

Taking time difference from before when one variable changes from 0 to 1

Re: Taking time difference from before when one variable changes from 0 to 1

Re: Taking time difference from before when one variable changes from 0 to 1

Re: Taking time difference from before when one variable changes from 0 to 1

Re: Taking time difference from before when one variable changes from 0 to 1

Re: Taking time difference from before when one variable changes from 0 to 1

Taking time difference from before when one variable changes from 0 to 1

Re: Taking time difference from before when one variable changes from 0 to 1

Re: Taking time difference from before when one variable changes from 0 to 1

Re: Taking time difference from before when one variable changes from 0 to 1

Re: Taking time difference from before when one variable changes from 0 to 1

Re: Taking time difference from before when one variable changes from 0 to 1

Click image to register for webinar

Classroom Training Available!