Hi folks,
Can someone help me with this please?
I have two tables with dates in each and need to compare the two dates, when comparing I also need to count where the difference between the two dates is within 30 seconds of each another.
The two tables have tens of tousands of dates.
Any ideas how to do this please?
Thanks,
Zubair
Zubair,
First, I assume you mean date-time variables, not dates.
The direct way is with SQL. However, it requires multiple passes of the data.
PROC SQL; * untested code;
SELECT count(*)
FROM a, b
WHERE a.datetime BETWEEN b.datetime - 30 AND b.datetime+30;
QUIT;
RUN;
This would give a count of the number of a.datetime's that are within 30 seconds of any b.datetime.
Somehow, I don't really think that is what your question is. Perhaps you could post a sample of the data you have and the results you want.
Hi Doc,
Thanks for responding!
Yes, I'm comparing date time variables and not dates.
Please see attached a sample of the data I'm working with.
I'd like to compare Date Modified (DM) to Event Date (ED) and check if the difference in seconds between the two is greater than 30. At the same time I need to count by Country code.
So, for example, the result for DM obs1 (30/09/2012 09:45:10) would be UK = 1 and US = 1.
The step would then compare DM obs 2 with all the records in ED.
Obs Date_Modified
1 30/09/2012 09:45:10
2 30/09/2012 09:46:15
3 30/09/2012 08:58:20
4 30/09/2012 09:03:25
5 30/09/2012 09:15:30
Obs Event_date Country
1 30/9/2012 09:45:32 UK
2 30/9/2012 09:51:00 US
3 30/9/2012 09:45:27 US
4 30/9/2012 09:56:00 UK
5 29/9/2012 09:45:15 US
Does this help?
Thanks
Zubair
Well.. Following is one way.. .
add one rownum column to each dataset temporarily..
merge those by rownum..
keep only required columns
if (abs(date1-date2)>=30) then output;
Hi all,
Doc is right.SQL may require multiple steps.
And this is to be taken in account when you are dealing with ten of thousands of observations and want to compare all observations from one dataset with all observations from another dataset. This is what we call a cartesian product. "Cartesian Product, you must be out of your mind", some may say.
Still, I like the SQL approach.
Let's say A is your first dataset and B the second. I would adapt Doc's code like this
PROC SQL; * untested code;
CREATE TABLE step1 AS
SELECT A.Date_Modified, B.Country,
SUM(
CASE
WHEN A.Date_Modified BETWEEN B.Event_date-'0:0:30't AND B.Event_date+'0:0:30't then 1
ELSE 0
) AS Nb_Selected
FROM A, B
QUIT;
This would give a table like this
Date_Modified Country Nb_Selected
30/09/2012 09:45:10 UK 1
30/09/2012 09:45:10 US 1
You may want to test this code with a sample, lets'say 1000 obs from each table with the PROC SQL INOBS options :
PROC SQL INOBS=1000;
If the intermediary result is not necessary, you may want to CREATE VIEW step1 instead of CREATE TABLE step1
Next, you'll have to transpose the intermediary result
PROC TRANSPOSE DATA=step1 OUT=result NAME=Country;
VAR Nb_Selected;
RUN;
to get what you want
Date_Modified UK US
30/09/2012 09:45:10 1 1
HTH
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.