I have a bunch of data from the Electronic Medical Record, and I am trying to create two cohorts of patients: those that had a time of death before July1, 2023 at midnight, and after. I have tried and tried, and keep failing. The variable "Time_of_Death" is a numeric DateTime16 variable.
My Code is:
DATA PallImpt.Cohorts; SET PallImpt.Source; IF Time_of_Death < '01JUL2023'd then Cohort = 1; IF Time_of_Death > '01JUL2023'd then Cohort = 2; else cohort = 3; run
However, when I look at the results, I get only cohorts 2 and 3 (if missing), but the dates that should be a "1" are a "2".
What am I doing wrong?
DATA PallImpt.Cohorts;
SET PallImpt.Source;
IF Time_of_Death < '01JUL2023'd then Cohort = 1;
ELSE IF Time_of_Death > '01JUL2023'd then Cohort = 2;
else cohort = 3;
run
The missing ELSE is overwriting any COHORT=1. if you have missing values for Time_of_death they will be COHORT=1 not COHORT=3;
I just notice this The variable "Time_of_Death" is a numeric DateTime16 variable.
You need date time constant '01JUL2023:00:00'dt
DATA PallImpt.Cohorts;
SET PallImpt.Source;
if missing(Time_of_death) then cohort = 3;
ELSE IF Time_of_Death < '01JUL2023:00:00'dt then Cohort = 1;
ELSE IF Time_of_Death > '01JUL2023:00:00'dt then Cohort = 2;
run
That should work I think
Thanks for your response! I still get the same- only the occasional 3 and the rest 2's. I wonder if the data is being imported in a way that isn't representative of the date in excel. Is there a way to check that?
@PHOEpiDoc wrote:
Thanks for your response! I still get the same- only the occasional 3 and the rest 2's. I wonder if the data is being imported in a way that isn't representative of the date in excel. Is there a way to check that?
Show you data. Do you have date or time or datetime? Internal(unformatted)and external(formatted)
I think I figured it out! It was a formatting problem in Excel. Thank you so much for your help- I wouldn't have looked at excel without you helping with the SAS side.
DATE values are stored in DAYS and DATETIME values are stored in SECONDS. There are 24*60*60 seconds in a day. So pretty much and DATETIME value after early morning on 01JAN1960 is going to be larger than '01JUL2003'd since that is day number 15,887.
1 data _null_;
2 date='01JUL2003'd ;
3 put date= comma. date=date9. date=datetime19.;
4 run;
date=15,887 date=01JUL2003 date=01JAN1960:04:24:47
So either convert your DATETIME values into DATE values using the DATEPART() function. Or compare the values to DATETIME constants instead of DATE constants.
Also make sure to figure out what cohort the time stamp of midnight on 01JUL2003 should be included in. Right now you have it in COHORT 3. Also remember that missing values are smaller than any actual value.
This code will place 01JUL2003 in COHORT 1 and place only missing values in COHORT 3.
data PallImpt.Cohorts;
set PallImpt.Source;
if datepart(Time_of_Death) > '01JUL2023'd then Cohort = 2;
else if missing(Time_of_Death) then cohort = 3;
else cohort=1;
run;
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.