Hi SAS Community,
I am looking for advise on an issue I am having trying to create a duration variable from two start and end dates/times. I would like to derive this variable and pull basic descriptive data such as mean (SD), median, min max etc. I was previously doing this in excel and did not have any problems (cell D - cell C), however I am experiencing a hard time in SAS given I am a very new user and I am teaching myself (or at least trying to).
Here is my SAS code:
*/ creating new variable for DISEASE attack duration - NOT WORKING /*;
TITLE "DISEASE attack duration - new variable";
DATA WORK.DISEASE_ERT;
SET DISEASE_ERT;
DURATION=DISEASEAD2-DISEASEAD1; /*evaluate attack duration and derive a new variable*/
RUN;
Here is what my data look like:
The SAS Log is really long so I will avoid including it here and also because it has patients numbers listed. But generally it states this multiple times:
NOTE: Invalid numeric data, DISEASEAD2='04APR2020 22:00' , at line 950 column 14.
NOTE: Invalid numeric data, DISEASEAD1='03APR2020 19:00' , at line 950 column 21.
WARNING: Limit set by ERRORS= option reached. Further errors of this type will not be printed.
NOTE: Missing values were generated as a result of performing an operation on missing values. Each place is given by: (Number of times) at (Line):(Column). 669 at 950:20 NOTE: There were 669 observations read from the data set WORK.DISEASE_ERT. NOTE: The data set WORK.DISEASE_ERT has 669 observations and 56 variables. NOTE: DATA statement used (Total process time): real time 0.17 seconds cpu time 0.14 seconds
Any assistance would be greatly appreciated. I have a feeling this is because time and date are combined the dataset but I am not sure how to make these separate and still retrieve a duration variable?
Thank you in advance.
T.
To bring data from an existing data set you should have something like:
data new; set old;
So the name of your existing data set should be in the place of OLD in the above.
You would not include my examples of converting two existing values.
So maybe:
data new;
set DISEASE_ERT;
dt1= input(DISEASEAD1,datetime19.);
dt2= input(DISEASEAD2,datetime19.);
duration = dt2 - dt1;
format dt1 dt2 datetime19. duration time12.;
put (_all_) (=/);
run;
You really want to create a new data set so that if you have a logic problem you do not replace existing values,or delete variables or records accidentally. With a new data set you still have the old one for base values when you find that you need to modify something.
If you want to work with DATETIME values then they need to be NUMBERS and not CHARACTER strings.
You can use the INPUT() function with the DATETIME informat to convert those strings into numbers that represent the number of seconds since 1960. Then if you subtract them you get the number of seconds between the two datetime values. You can use the TIME format to display the seconds in a way that humans will recognize as hours minutes and seconds.
320 data test; 321 DISEASEAD2='04APR2020 22:00'; 322 DISEASEAD1='03APR2020 19:00'; 323 dt1= input(DISEASEAD1,datetime19.); 324 dt2= input(DISEASEAD2,datetime19.); 325 duration = dt2 - dt1; 326 format dt1 dt2 datetime19. duration time12.; 327 put (_all_) (=/); 328 run; DISEASEAD2=04APR2020 22:00 DISEASEAD1=03APR2020 19:00 dt1=03APR2020:19:00:00 dt2=04APR2020:22:00:00 duration=27:00:00
PS Avoid overwriting your input dataset when running a data step. If the code is messed up you will have to go back and re-run whatever code created the original dataset.
The invalid data messages are because your variables are not date, time or datetime values but character. So the first step is going to be to create datetime values.
An important question is what unit do you want your duration to be reported in? By default SAS datetime values will be numbers of seconds and the subtraction approach you use would result in seconds. Do you want seconds for units?
data example; DISEASEAD2='04APR2020 22:00'; DISEASEAD1='03APR2020 19:00'; /* SAS datetime values*/ dd2 = input(DISEASEAD2,anydtdtm30.); dd1 = input(DISEASEAD1,anydtdtm30.); /* format so human readable*/ format dd2 dd1 datetime20.; seconds = dd2-dd1;
minutes = intck('minute',dd1,dd2,'c'); hours = intck('hour',dd1,dd2,'c'); days = intck('dtday',dd1,dd2,'c'); run;
Since Excel does datetime values with the time as a fractional day you possibly want to convert one of the units such as seconds, minutes or hours into fractional days to get comparable numbers.
The INTCK function returns number of time interval boundaries crossed.
https://communities.sas.com/t5/SAS-Communities-Library/Working-with-Dates-and-Times-in-SAS-Tutorial/... has a PDF with much information about SAS date, time and datetime.
Thank you for the information and your replies. I have tried both codes and I am still doing something wrong but I am not sure exactly what.
Here is the SAS codes I have tried:
data DISEASE_ERT;
/* SAS datetime values*/
dd2 = input(DISEASEAD2,anydtdtm30.);
dd1 = input(DISEASEAD1,anydtdtm30.);
/* format so human readable*/
format dd2 dd1 datetime20.;
seconds = dd2-dd1;
minutes = intck('minute',dd1,dd2,'c');
hours = intck('hour',dd1,dd2,'c');
days = intck('dtday',dd1,dd2,'c');
run;
and also ...
data DISEASE_ERT;
dt1= input(DISEASEAD1,datetime19.);
dt2= input(DISEASEAD2,datetime19.);
duration = dt2 - dt1;
format dt1 dt2 datetime19. duration time12.;
put (_all_) (=/);
run;
Here is my SAS Log:
1124 data DISEASE_ERT; 1125 /* SAS datetime values*/ 1126 dd2 = input(DISEASEAD2,anydtdtm30.); 1127 dd1 = input(DISEASEAD1,anydtdtm30.); 1128 /* format so human readable*/ 1129 format dd2 dd1 datetime20.; 1130 seconds = dd2-dd1; 1131 minutes = intck('minute',dd1,dd2,'c'); 1132 hours = intck('hour',dd1,dd2,'c'); 1133 days = intck('dtday',dd1,dd2,'c'); 1134 run; NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column). 1126:16 1127:16 NOTE: Variable DISEASEAD2 is uninitialized. NOTE: Variable DISEASEAD1 is uninitialized. NOTE: Missing values were generated as a result of performing an operation on missing values. Each place is given by: (Number of times) at (Line):(Column). 1 at 1130:17 1 at 1131:14 1 at 1132:12 1 at 1133:12 NOTE: The data set WORK.DISEASE_ERT has 1 observations and 8 variables. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.00 seconds
Apologies, I am on a huge learning curve and I am trying not to give up.
Thanks in advance,
T.
Make sure to add the SET statement to read in your existing dataset. Since you did not read in your current data when you tried to reference DISEASEAD1 without having even mentioned it before SAS defined it as numeric. That explains the notes.
Also if your stings are consistently in a style that the DATETIME informat can read it is better to use that than one of the ANYDT... informats as those will try to GUESS what style your strings are in.
To bring data from an existing data set you should have something like:
data new; set old;
So the name of your existing data set should be in the place of OLD in the above.
You would not include my examples of converting two existing values.
So maybe:
data new;
set DISEASE_ERT;
dt1= input(DISEASEAD1,datetime19.);
dt2= input(DISEASEAD2,datetime19.);
duration = dt2 - dt1;
format dt1 dt2 datetime19. duration time12.;
put (_all_) (=/);
run;
You really want to create a new data set so that if you have a logic problem you do not replace existing values,or delete variables or records accidentally. With a new data set you still have the old one for base values when you find that you need to modify something.
datetime is number of seconds in SAS. to get days, you need to divide by 86,400
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.