Hi All,
I have trouble calculating with SAS Datetime all the time. Please provide me your suggestions on this problem, I really appreciate all for helping many beginners with SAS. I have a dataset with two columns with datetimes and some demographic variables (table below): both incident_es and incident_ei are character variables. I want to calculate number of observations within three hours of incident_es. Can somebody help me with this please?
Thank you,
M
gender | age | incident_es | incident_ei |
M | 30 | 7/5/2019 1:07 | 05JUL19:01:45:00 |
M | 65 | 7/7/2019 20:37 | 07JUL19:21:14:00 |
M | 25 | 7/20/2019 23:01 | 20JUL19:23:24:00 |
F | 30 | 8/10/2019 15:41 | 10AUG19:16:34:00 |
M | 64 | 9/12/2019 17:50 | 12SEP19:18:30:00 |
M | 55 | 7/29/2019 13:27 | 29JUL19:14:10:00 |
M | 39 | 9/22/2019 0:46 | 22SEP19:01:18:00 |
F | 56 | 9/1/2019 19:35 | 01SEP19:20:00:00 |
F | 56 | 8/25/2019 0:06 | 25AUG19:00:30:00 |
F | 47 | 7/12/2019 18:33 | 12JUL19:18:52:00 |
F | 41 | 8/15/2019 0:37 | 15AUG19:00:56:00 |
F | 37 | 7/28/2019 18:52 | 28JUL19:19:15:00 |
F | 37 | 8/8/2019 12:08 | 08AUG19:12:34:00 |
I had to really manipulate the character string in the first date time column in the where clause, but I believe this works:
data have;
length incident_es $20 incident_ei $20;
gender='M'; age=30; incident_es='7/5/2019 1:07'; incident_ei='05JUL19:01:45:00'; output;
gender='M'; age=65; incident_es='7/7/2019 20:37'; incident_ei='07JUL19:21:14:00'; output;
gender='M'; age=25; incident_es='7/20/2019 23:01'; incident_ei='20JUL19:23:24:00'; output;
gender='F'; age=30; incident_es='8/10/2019 15:41'; incident_ei='10AUG19:16:34:00'; output;
gender='M'; age=64; incident_es='9/12/2019 17:50'; incident_ei='12SEP19:18:30:00'; output;
gender='M'; age=55; incident_es='7/29/2019 13:27'; incident_ei='29JUL19:14:10:00'; output;
gender='M'; age=39; incident_es='9/22/2019 0:46'; incident_ei='22SEP19:01:18:00'; output;
gender='F'; age=56; incident_es='9/1/2019 19:35'; incident_ei='01SEP19:20:00:00'; output;
gender='F'; age=56; incident_es='8/25/2019 0:06'; incident_ei='25AUG19:00:30:00'; output;
gender='F'; age=47; incident_es='7/12/2019 18:33'; incident_ei='12JUL19:18:52:00'; output;
gender='F'; age=41; incident_es='8/15/2019 0:37'; incident_ei='15AUG19:00:56:00'; output;
gender='F'; age=37; incident_es='7/28/2019 18:52'; incident_ei='28JUL19:19:15:00'; output;
gender='F'; age=37; incident_es='8/8/2019 12:08'; incident_ei='08AUG19:12:34:00'; output;
run;
proc sql;
select count(*) as count
from have
where abs(input(put(input(substr(incident_es,1,index(incident_es,' ')-1),mmddyy10.),date7.)||':'||substr(incident_es,index(incident_es,' ')+1),datetime.) - input(incident_ei,datetime.))/60/60 < 3;
quit;
I'm sure someone else can figure out a much cleaner way to get SAS to recognize the datetime values in the first column.
I can't do the data manipulation in the dataset as I have 100,000 observations in my dataset. It is impossible to manipulate at the data step.
I will try other options and let you all know!
Thank you,
M
Questions/things to consider:
You can convert your character variable to datetime using the following:
incident_ei_dt = input(incident_ei, anydtdtm.);
@Malathi13 wrote:
Hi All,
I have trouble calculating with SAS Datetime all the time. Please provide me your suggestions on this problem, I really appreciate all for helping many beginners with SAS. I have a dataset with two columns with datetimes and some demographic variables (table below): both incident_es and incident_ei are character variables. I want to calculate number of observations within three hours of incident_es. Can somebody help me with this please?
Thank you,
M
gender age incident_es incident_ei M 30 7/5/2019 1:07 05JUL19:01:45:00 M 65 7/7/2019 20:37 07JUL19:21:14:00 M 25 7/20/2019 23:01 20JUL19:23:24:00 F 30 8/10/2019 15:41 10AUG19:16:34:00 M 64 9/12/2019 17:50 12SEP19:18:30:00 M 55 7/29/2019 13:27 29JUL19:14:10:00 M 39 9/22/2019 0:46 22SEP19:01:18:00 F 56 9/1/2019 19:35 01SEP19:20:00:00 F 56 8/25/2019 0:06 25AUG19:00:30:00 F 47 7/12/2019 18:33 12JUL19:18:52:00 F 41 8/15/2019 0:37 15AUG19:00:56:00 F 37 7/28/2019 18:52 28JUL19:19:15:00 F 37 8/8/2019 12:08 08AUG19:12:34:00
To add to Reeza's comments, I'd suggest reading
for thorough coverage of the concepts.
Hi @Malathi13
A warning: Following @Reeza suggestion and reading the character datetime values with the informat anydtdtm. can give unexpected results. Data seems to be OK, but the conversion gets dates and months wrong in some cases. See what happens to the two datetime strings from your input in the following example:
data have;
length incident_es $22;
incident_es='7/5/2019 1:07'; output;
incident_es='7/20/2019 23:01'; output;
run;
data want; set have;
dt = input(incident_es,anydtdtm.);
put incident_es ' ' @20 dt datetime.;
run;
The result is:
7/5/2019 1:07 07MAY19:01:07:00
7/20/2019 23:01 20JUL19:23:01:00
First observation is read as ddmmyyyy, and the second as mmddyyyy. An error like that is difficult to spot, because all observations have a valid datatime, and you might end up with a wrong analysis based on bad data in your next paper.
Note that the ANYDTxxx informats are not necessarily wrong. But you do need to know how they work since they are going to GUESS how to interpret the strings.
3035 data want; set have; 3036 dt = input(incident_es,anydtdtm.); 3037 put incident_es ' ' @20 dt datetime.; 3038 run; 7/5/2019 1:07 05JUL19:01:07:00 7/20/2019 23:01 20JUL19:23:01:00
In your example the issue is that your current locale setting is one that tells the informats that they should prefer DMY order over MDY order when the values are ambiguous. That is why I tell people to always use DATE or YYMMDD formats for dates. This will eliminate the confusion.
Hi @Tom
I also tell people that they should never depend on SAS guessing anything, like using any-whatever informats or let proc import determine column types.
I wasn't aware of the the consequence of the locale setting for the behaviour of the anydtdte-informat, until I once i found myself in an embarassing situation when sombody in the audience could prove that my presentation was based on garbled data. But it seems to be a problem limited to european SAS users reading ddmmyy-dates.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.