Dear Madam?Sir,
Your help is much appreciated.
I have a very large data (marked bold below) that I wish to compute the difference between the two dates below. However, the variable "Arrival_data_time" is a character while LKW is numeric as below:
Arrival_date_time | Char | 18 | $18. | $18. | Arrival date_time |
LKW | Num | 8 | DATETIME16. | LKW |
I used the code but the results looks wrong:
data bio;
set bio;
LKWH=Arrival_date_time-LKW;
run;
proc means data=Bio median p25 p75 ;
var LKWH;
run;
OUTPUT
Analysis Variable : LKWH | ||
---|---|---|
Median | 25th Pctl | 75th Pctl |
1968891295 | 1967083516 | 1970649275 |
Arrival_date_time | LKW |
3/23/22 9:51 | 3/23/2022 9:00 |
4/28/2022 17:39 | 4/28/2022 9:00 |
5/3/2022 11:55 | 5/2/2022 16:30 |
5/9/2022 21:39 | 5/9/2022 17:20 |
5/15/2022 12:47 | 5/15/2022 12:20 |
6/1/2022 6:39 | 5/31/2022 17:00 |
6/8/2022 21:44 | 6/8/2022 21:00 |
6/13/2022 10:33 | 6/12/2022 23:00 |
. | 6/27/2022 19:30 |
7/18/22 10:37 | 7/17/2022 23:00 |
8/8/2022 15:09 | 8/8/2022 6:00 |
This code seems to work for me. The first data step reproduces your data. The second data step does the differencing. If the second data step does not produce the same results for you that I am getting, then we still do not have a good description of your data.
data have;
input Arrival_date_time : $16. lkw :anydtdtm.;
infile cards delimiter='|';
format lkw datetime16.;
cards;
3/23/22 9:51 | 3/23/2022 9:00
4/28/2022 17:39 | 4/28/2022 9:00
5/3/2022 11:55 | 5/2/2022 16:30
5/9/2022 21:39 | 5/9/2022 17:20
5/15/2022 12:47 | 5/15/2022 12:20
6/1/2022 6:39 | 5/31/2022 17:00
6/8/2022 21:44 | 6/8/2022 21:00
6/13/2022 10:33 | 6/12/2022 23:00
. | 6/27/2022 19:30
7/18/22 10:37 | 7/17/2022 23:00
8/8/2022 15:09 | 8/8/2022 6:00
;
data want;
set have;
arrival_date_time_Num=input(arrival_date_time,anydtdtm.);
difference = arrival_date_time_num - lkw;
format arrival_date_time_num datetime16. difference time.;;
run;
First, you will have to create a numeric variable containing the date/time value from the character variable ARRIVAL_DATE_TIME, then do the subtraction, using a DATA step.
data want;
set have;
/* Create numeric date/time variable */
arrival_date_time_num=input(arrival_date_time,anydtdtm.);
/* Do subtraction */
difference=arrival_date_time_num-lkw;
format difference time.;
run;
It does not work, the answer is negative. I wish to get the difference in minutes.
data Bio;
Set Bio;
arrival_date_time_num=input(arrival_date_time,anydtdtm.);
difference=arrival_date_time_num-lkw;
format difference time.;
run;
proc means data=Bio median p25 p75 ;
var difference;
run;
Analysis Variable : difference | ||
---|---|---|
Median | 25th Pctl | 75th Pctl |
-1810558500 | -1810558500 | -1810558500 |
The numbers for variable DIFFERENCE are in seconds, and I use the proper format, so the values of DIFFERENCE should appear as hours:minutes:seconds. SHOW US a screen capture of (a portion of) the data set BIO so we can see the value of DIFFERENCE for ourselves (not the screen capture from PROC MEANS).
If the numbers are negative, you can change the calculation formula to obtain positive numbers.
It didnt work:
401 data Bio;
402 Set Bio;
403 arrival_date_time_num=input(arrival_date_time,anydtdtm.);
404 difference=arrival_date_time_num-lkw;
405 format difference time.;
406 run;
NOTE: Missing values were generated as a result of performing an operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
10 at 404:36
NOTE: There were 11 observations read from the data set WORK.BIO.
NOTE: The data set WORK.BIO has 11 observations and 58 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
proc print data=bio;
var difference;
run;
Obs | difference |
---|---|
1 | . |
2 | . |
3 | . |
4 | . |
5 | . |
6 | . |
7 | . |
8 | . |
9 | . |
10 | . |
11 | -502932 |
Okay, show us all the relevant columns of data set BIO (this would be variable ARRIVAL_DATE_TIME, LKW and ARRIVAL_DATE_TIME_NUM). Something is not as you described it, that is why you are getting missing values.
The SAS System |
Obs | arrival_date_time_num | LKW | difference |
---|---|---|---|
1 | . | 23MAR22:09:00:00 | . |
2 | . | 28APR22:09:00:00 | . |
3 | . | 02MAY22:16:30:00 | . |
4 | . | 09MAY22:17:20:00 | . |
5 | . | 15MAY22:12:20:00 | . |
6 | . | 31MAY22:17:00:00 | . |
7 | . | 08JUN22:21:00:00 | . |
8 | . | 12JUN22:23:00:00 | . |
9 | . | 27JUN22:19:30:00 | . |
10 | . | 17JUL22:23:00:00 | . |
11 | 164999100 | 08AUG22:06:00:00 | -502932 |
This does not use the code that I provided, otherwise the DIFFERENCE would appear as hour:minute:seconds.
Your data is not as you represent it, ARRIVAL_DATE_TIME_NUM is missing for most rows, and you need to also show us the data including ARRIVAL_DATE_TIME
You are getting a negative number because 1965 is much earlier than 2022.
1908 data _null_; 1909 dt=164999100; 1910 put dt comma17. dt datetime19.; 1911 dt='08AUG22:06:00:00'dt; 1912 put dt comma17. dt datetime19.; 1913 run; 164,999,100 24MAR1965:17:05:00 1,975,557,600 08AUG2022:06:00:00
A good portion of the data comes out as missing only one convertion is done from char to num
You will need to do the conversion in multiple steps:
arrival_date_time_num = dhms(input(scan(arrival_date_time,1),mmddyy10.),0,0,input(scan(arrival_date_time,2),time5.));
because somebody chose a completely non-standard format for the datetime value. YYYY-MM-DD HH:MM works with the E8601DT informat, and the "ANY" formats are dependent on your locale setting (MDY dates are only used on the American continent, everybody else uses DMY or (best) YMD).
Thank you, I tried but it did not work:
data Bio;
Set Bio;
*arrival_date_time_num=input(arrival_date_time,anydtdtm.);
arrival_date_time_num = dhms(input(scan(arrival_date_time,1),mmddyy10.),0,0,input(scan(arrival_date_time,2),time5.));
difference=arrival_date_time_num-lkw;
format difference time.;
run;
424 data Bio;
425 Set Bio;
426 *arrival_date_time_num=input(arrival_date_time,anydtdtm.);
427 arrival_date_time_num =
427! dhms(input(scan(arrival_date_time,1),mmddyy10.),0,0,input(scan(arrival_date_time,2),time5.));
428 difference=arrival_date_time_num-lkw;
429 format difference time.;
430 run;
NOTE: Invalid argument to function INPUT at line 427 column 30.
NOTE: Invalid argument to function INPUT at line 427 column 77.
We need to explicitly set the delimiter for SCAN to the blank:
data have;
input arrival_date_time $20.;
datalines;
3/23/22 9:51
4/28/2022 17:39
5/3/2022 11:55
5/9/2022 21:39
5/15/2022 12:47
6/1/2022 6:39
6/8/2022 21:44
6/13/2022 10:33
6/27/2022 19:30
7/18/22 10:37
8/8/2022 15:09
;
data want;
set have;
arrival_date_time_num = dhms(input(scan(arrival_date_time,1," "),mmddyy10.),0,0,input(scan(arrival_date_time,2," "),time5.));
format arrival_date_time_num e8601dt19.;
run;
proc print data=want noobs;
var arrival_date_time_num;
run;
Result:
arrival_date_time_num 2022-03-23T09:51:00 2022-04-28T17:39:00 2022-05-03T11:55:00 2022-05-09T21:39:00 2022-05-15T12:47:00 2022-06-01T06:39:00 2022-06-08T21:44:00 2022-06-13T10:33:00 2022-06-27T19:30:00 2022-07-18T10:37:00 2022-08-08T15:09:00
data Bio1;
Set Bio;
*arrival_date_time_num=input(arrival_date_time,anydtdtm.);
arrival_date_time_num = dhms(input(scan(arrival_date_time,1," "),mmddyy10.),0,0,input(scan(arrival_date_time,2," "),time5.));
format arrival_date_time_num e8601dt19.;
run;
proc print data=bio1;
var arrival_date_time_num ;
run;
The SAS System |
Obs | arrival_date_time_num |
---|---|
1 | . |
2 | . |
3 | . |
4 | . |
5 | . |
6 | . |
7 | . |
8 | . |
9 | . |
10 | . |
11 | . |
It does not help to show the results without showing the inputs.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.