BookmarkSubscribeRSS Feed
Malathi13
Obsidian | Level 7

Hi All,

I have trouble calculating with SAS Datetime all the time. Please provide me your suggestions on this problem, I really appreciate all for helping many beginners with SAS. I have a dataset with two columns with datetimes and some demographic variables (table below): both incident_es and incident_ei are character variables. I want to calculate number of observations within three hours of incident_es. Can somebody help me with this please?

 

Thank you,

M

 

 

genderageincident_esincident_ei
M307/5/2019 1:0705JUL19:01:45:00
M657/7/2019 20:3707JUL19:21:14:00
M257/20/2019 23:0120JUL19:23:24:00
F308/10/2019 15:4110AUG19:16:34:00
M649/12/2019 17:5012SEP19:18:30:00
M557/29/2019 13:2729JUL19:14:10:00
M399/22/2019 0:4622SEP19:01:18:00
F569/1/2019 19:3501SEP19:20:00:00
F568/25/2019 0:0625AUG19:00:30:00
F477/12/2019 18:3312JUL19:18:52:00
F418/15/2019 0:3715AUG19:00:56:00
F377/28/2019 18:5228JUL19:19:15:00
F378/8/2019 12:0808AUG19:12:34:00
9 REPLIES 9
JeffMaggio
Obsidian | Level 7

I had to really manipulate the character string in the first date time column in the where clause, but I believe this works:

data have;
length incident_es $20 incident_ei $20;
gender='M'; age=30;	incident_es='7/5/2019 1:07'; incident_ei='05JUL19:01:45:00'; output;
gender='M'; age=65;	incident_es='7/7/2019 20:37';	incident_ei='07JUL19:21:14:00'; output;
gender='M'; age=25;	incident_es='7/20/2019 23:01';	incident_ei='20JUL19:23:24:00'; output;
gender='F'; age=30;	incident_es='8/10/2019 15:41';	incident_ei='10AUG19:16:34:00'; output;
gender='M'; age=64;	incident_es='9/12/2019 17:50';	incident_ei='12SEP19:18:30:00'; output;
gender='M'; age=55;	incident_es='7/29/2019 13:27';	incident_ei='29JUL19:14:10:00'; output; 
gender='M';	age=39;	incident_es='9/22/2019 0:46';	incident_ei='22SEP19:01:18:00'; output;
gender='F';	age=56;	incident_es='9/1/2019 19:35';	incident_ei='01SEP19:20:00:00'; output;
gender='F';	age=56;	incident_es='8/25/2019 0:06';	incident_ei='25AUG19:00:30:00'; output;
gender='F';	age=47;	incident_es='7/12/2019 18:33';	incident_ei='12JUL19:18:52:00'; output;
gender='F';	age=41;	incident_es='8/15/2019 0:37';	incident_ei='15AUG19:00:56:00'; output;
gender='F';	age=37;	incident_es='7/28/2019 18:52';	incident_ei='28JUL19:19:15:00'; output;
gender='F';	age=37;	incident_es='8/8/2019 12:08';	incident_ei='08AUG19:12:34:00'; output;


run;

proc sql;
select count(*) as count 
from have
where abs(input(put(input(substr(incident_es,1,index(incident_es,' ')-1),mmddyy10.),date7.)||':'||substr(incident_es,index(incident_es,' ')+1),datetime.) - input(incident_ei,datetime.))/60/60 < 3;
quit;

I'm sure someone else can figure out a much cleaner way to get SAS to recognize the datetime values in the first column. 

Malathi13
Obsidian | Level 7

Hi @JeffMaggio, 

I can't do the data manipulation in the dataset as I have 100,000 observations in my dataset. It is impossible to manipulate at the data step.

 

I will try other options and let you all know!

 

Thank you,

M

Reeza
Super User
Why cannot you not do the data manipulation? The code is the same regardless if it's 10 or 100,000 rows. The 'have' step is to create raw data from your data set but you can just use your actual data, not the HAVE data set.
Reeza
Super User

Questions/things to consider:

 

  • There are no dates within 3 hours of each other in your data set based on the incident_ES variable, there are no dates on the same day at all. 
  • Is the rule within three hours of incident_es or is it within three hours of incident_ei? 
  • Is the incident_ei variable relevant?
  • Is it within three hours in either direction or just three hours after or before?

You can convert your character variable to datetime using the following:

 

incident_ei_dt = input(incident_ei, anydtdtm.);

@Malathi13 wrote:

Hi All,

I have trouble calculating with SAS Datetime all the time. Please provide me your suggestions on this problem, I really appreciate all for helping many beginners with SAS. I have a dataset with two columns with datetimes and some demographic variables (table below): both incident_es and incident_ei are character variables. I want to calculate number of observations within three hours of incident_es. Can somebody help me with this please?

 

Thank you,

M

 

 

gender age incident_es incident_ei
M 30 7/5/2019 1:07 05JUL19:01:45:00
M 65 7/7/2019 20:37 07JUL19:21:14:00
M 25 7/20/2019 23:01 20JUL19:23:24:00
F 30 8/10/2019 15:41 10AUG19:16:34:00
M 64 9/12/2019 17:50 12SEP19:18:30:00
M 55 7/29/2019 13:27 29JUL19:14:10:00
M 39 9/22/2019 0:46 22SEP19:01:18:00
F 56 9/1/2019 19:35 01SEP19:20:00:00
F 56 8/25/2019 0:06 25AUG19:00:30:00
F 47 7/12/2019 18:33 12JUL19:18:52:00
F 41 8/15/2019 0:37 15AUG19:00:56:00
F 37 7/28/2019 18:52 28JUL19:19:15:00
F 37 8/8/2019 12:08 08AUG19:12:34:00

 

Doc_Duke
Rhodochrosite | Level 12

To add to Reeza's comments, I'd suggest reading

https://documentation.sas.com/?docsetId=lrcon&docsetTarget=n0q9ylcaccjgjrn19hvqnd9cte8p.htm&docsetVe...

for thorough coverage of the concepts.

ErikLund_Jensen
Rhodochrosite | Level 12

Hi @Malathi13 

 

A warning: Following @Reeza suggestion and reading the character datetime values with the informat anydtdtm. can give unexpected results. Data seems to be OK, but the conversion gets dates and months wrong in some cases. See what happens to the two datetime strings from your input in the following example:

 

data have;
	length incident_es $22;
	incident_es='7/5/2019 1:07'; output;
	incident_es='7/20/2019 23:01'; output;
run;

data want; set have;
	dt = input(incident_es,anydtdtm.);
	put incident_es ' ' @20 dt datetime.;
run;

The result is:

7/5/2019 1:07      07MAY19:01:07:00
7/20/2019 23:01  20JUL19:23:01:00

 

First observation is read as ddmmyyyy, and the second as mmddyyyy. An error like that is difficult to spot, because all observations have a valid datatime, and you might end up with a wrong analysis based on bad data in your next paper.

Tom
Super User Tom
Super User

Note that the ANYDTxxx informats are not necessarily wrong. But you do need to know how they work since they are going to GUESS how to interpret the strings.  

3035  data want; set have;
3036  	dt = input(incident_es,anydtdtm.);
3037  	put incident_es ' ' @20 dt datetime.;
3038  run;

7/5/2019 1:07      05JUL19:01:07:00
7/20/2019 23:01    20JUL19:23:01:00

In your example the issue is that your current locale setting is one that tells the informats that they should prefer DMY order over MDY order when the values are ambiguous. That is why I tell people to always use DATE or YYMMDD formats for dates.  This will eliminate the confusion.

ErikLund_Jensen
Rhodochrosite | Level 12

Hi @Tom 

 

I also tell people that they should never depend on SAS guessing anything, like using any-whatever informats or let proc import determine column types.

 

I wasn't aware of the the consequence of the locale setting for the behaviour of the anydtdte-informat, until I once i found myself  in an embarassing situation  when sombody in the audience could prove that my presentation was based on garbled data. But it seems to be a problem limited to european SAS users reading ddmmyy-dates.

Malathi13
Obsidian | Level 7
Hi All,
thank you so much for your wonderful suggestions. I was looking for something within three hours of incident_es. I will try your suggestions and let you know. Sorry for the late reply and also I tried anydttm in the input function, didn't work.

Thank you,
Malathi13

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 1540 views
  • 0 likes
  • 6 in conversation