Learning SAS? Welcome to the exclusive online community for all SAS learners.

Merge data set A w/ Vars in data set B, link by date & time stamp

Accepted Solution Solved
Reply
New Contributor
Posts: 2
Accepted Solution

Merge data set A w/ Vars in data set B, link by date & time stamp

A conundrum necessary to overcome for a grad school project. I need to link weather conditions to bike sharing data (Chicago Divvy) . In the weather data (dbB) vars such as temp, rain, humidity, etc. is basically observed by the hour (00:01:00 in ten minute intervals) with many additional intervals time stamped based on particular events. The bike share data is time stamped by rent start, and there are any number of events per day, perhaps more than 600% more observations per day than the weather intervals. The weather intervals of time stamps are no less than 144 observations per day and frequently much more.

(smaller versions of the data files is attached)

Two distinct data sets A & B.

A is approx 759,000 obs

B is approx 54,000 obs

I want to append A with variables in B, linking first (and hard) by date & then by time stamp (24hr). The time stamps are not consistent intervals in either data set;

dbA time stamp var is 'start'

dbB time stamp var is 'time'


Here's what I've sketched:

     Where dbA startdate = dbB date

     if dbA start >= dbB time & dbA start < dbB time +1


-- but that won't do the job. I need to make the best match, the closest match between dbA start and dbB time.


One possible approach is to ameliorate the additional time stamp intervals in dbB (Weather) by consolidating the event info into the standard 10 minute intervals (00:01:00) before and after, but I'm still stuck with how to apply the event data to dbA.


Any suggestions?

Data looks like:

Data A

startdate,start

10/03/13,6:00:00

10/03/13,6:01:00

10/03/13,6:02:00

10/03/13,6:03:00

10/03/13,6:04:00

10/03/13,6:05:00

10/03/13,6:06:00

10/03/13,6:06:00

10/03/13,6:08:00

10/03/13,6:11:00

10/03/13,6:11:00

10/03/13,6:45:00

10/03/13,6:45:00

10/03/13,6:46:00

10/03/13,6:46:00

10/03/13,6:46:00

10/03/13,6:46:00

10/03/13,6:47:00

10/03/13,6:47:00

10/03/13,6:47:00

10/03/13,7:59:00

10/03/13,7:59:00

10/03/13,7:59:00

10/03/13,7:59:00

Data B (note inconsistent minute intervals)

date,time

10/03/2013,0:51:00.000

10/03/2013,1:51:00.000

10/03/2013,2:31:00.000

10/03/2013,2:51:00.000

10/03/2013,3:05:00.000

10/03/2013,3:51:00.000

10/03/2013,4:06:00.000

10/03/2013,4:39:00.000

10/03/2013,4:51:00.000

10/03/2013,5:14:00.000

10/03/2013,5:32:00.000

10/03/2013,5:41:00.000

10/03/2013,5:51:00.000

10/03/2013,6:12:00.000

10/03/2013,6:34:00.000

10/03/2013,6:44:00.000

10/03/2013,6:51:00.000

10/03/2013,7:51:00.000

10/03/2013,8:51:00.000

10/03/2013,9:21:00.000

10/03/2013,9:34:00.000

10/03/2013,9:51:00.000

Attachment
Attachment

Accepted Solutions
Solution
‎05-01-2014 04:11 PM
New Contributor
Posts: 2

Re: Append data set A w/ Vars in data set B, link by date & time stamp

Thank you for this tip -- I'd been thinking about how best to create a primary key, something common to both datasets. Using the rounding function on the time stamps to go up or down to the nearest hour increment as a new variable in each set, and then use the merge statement in a data step was what I came up with yesterday after seeing your reply.

In the end I resorted to Proc SQL, and it worked on 97.2% of the observations. Still trying to resolve what went wrong with the other 2.8%. Here was the solution, and many thanks to a classmate who gave me the tip:

PS: Analysis of the 2.8% revealed that a sizable majority of the observations not being linked were actually equal to each other, so I expanded the SQL conditions statement below. The remaining, which is about .8% of the total, are values in the last hour (23:XX:XX) or first hour(00:XX:XX). Still working on that.

Step 1:

data tbl_b1;

set tbl_b;

Prior_time=lag(time);

format prior_time time11.;

run;

Step 2:

proc sql;

create table joined as

Select *

from tbl_a a left join tbl_b1 b on a.startdate=b.date

    and (b.prior_time <a.start<b.time or a.start=b.time);

quit;

View solution in original post


All Replies
Super User
Super User
Posts: 7,955

Re: Append data set A w/ Vars in data set B, link by date & time stamp

Not sure I understand you.  You want to merge rows from table B onto table based on whether they are both within a time window?  If so then you need to:

Assign a base timepoint which applies to both datasets, e.g 10/03/2013 00:00.  This is the base from which further timepoints are derived.

Go through table A and assign the time window, e.g. 10/03/2013,6:00,00     = Day 1 Hour 6

Go through table B and assign time windows e.g. 10/03/2013,0:51:00.000   = Day 1 Hour 1, defined as dates same, time within 0:00:00 and 1:00:00.

Then merge the two based on the time windows you have created.

Solution
‎05-01-2014 04:11 PM
New Contributor
Posts: 2

Re: Append data set A w/ Vars in data set B, link by date & time stamp

Thank you for this tip -- I'd been thinking about how best to create a primary key, something common to both datasets. Using the rounding function on the time stamps to go up or down to the nearest hour increment as a new variable in each set, and then use the merge statement in a data step was what I came up with yesterday after seeing your reply.

In the end I resorted to Proc SQL, and it worked on 97.2% of the observations. Still trying to resolve what went wrong with the other 2.8%. Here was the solution, and many thanks to a classmate who gave me the tip:

PS: Analysis of the 2.8% revealed that a sizable majority of the observations not being linked were actually equal to each other, so I expanded the SQL conditions statement below. The remaining, which is about .8% of the total, are values in the last hour (23:XX:XX) or first hour(00:XX:XX). Still working on that.

Step 1:

data tbl_b1;

set tbl_b;

Prior_time=lag(time);

format prior_time time11.;

run;

Step 2:

proc sql;

create table joined as

Select *

from tbl_a a left join tbl_b1 b on a.startdate=b.date

    and (b.prior_time <a.start<b.time or a.start=b.time);

quit;

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 2 replies
  • 565 views
  • 3 likes
  • 2 in conversation