DATA Step, Macro, Functions and more

Tricky Join: Joining Two Data Sets Based On A Date Range With Aggregations

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 101
Accepted Solution

Tricky Join: Joining Two Data Sets Based On A Date Range With Aggregations

Hi all—

I'm trying to join a data set that contains a frequency of ‘Clicks’ on a ResponseDate for specific URLs to a data set which links the URL to the ‘Creative Name’ and ‘drop_date’ its associated with. I need to bring back a data set that sums the clicks by drop_date and URL for each Creative Name’. This is an example of the final product I need:

drop_date

URL 

Creative_Name

Total_Clicks

5/20/2013

Link.com/east/2es

Perf Offer

10

9/30/2013

Link.com/west/2ev

We Want Bonuses

2

3/7/2013

Link.com/north/2gq

Bonuse Power

2

1/31/2013

Link.com/south/2ke

Offer Box

41

8/12/2013

Link.com/south/2ke

Sign Up Now

21

The issue I’m having is if you notice some URLs (Link.com/south/2ke) are used more than once for Creative_Names .  I need sum all the clicks that have a Response_Date on or after the drop_date and not the clicks that happen when the same URL is used again for another Creative_Name. So the Creative_Name ‘Offer Box’  had a drop_date of 1/31/2013 and the link associated with it was ‘Link.com/south/2ke’  and this link was then assigned to a new Creative_Name Sign Up Now  on  8/12/2013.


I have included two sample data sets URL_Responses and Creative_Drops. Any assistance is greatly appreciated!!!


Thanks!!!

data Creative_Drops;

length drop_date 8 URL $20 Creative_Name $30;

INPUT drop_date : anydtdte9.  URL $  Creative_Name $20.;

format drop_date date9.;

datalines;

5/20/2013 Link.com/east/2es Perf Offer

9/30/2013 Link.com/west/2ev We Want Bonuses

3/7/2013 Link.com/north/2gq Bonuse Power

1/31/2013 Link.com/south/2ke Offer Box

8/12/2013 Link.com/south/2ke Sign Up Now

data URL_Responses;

length ResponseDate 8 URL $30 Clicks 4;

INPUT ResponseDate : anydtdte9.  URL $  Clicks;

format ResponseDate date9.;

   datalines;

3/2/13 Link.com/south/2ke 1

4/6/13 Link.com/south/2ke 2

2/23/13 Link.com/south/2ke 32

5/11/13 Link.com/south/2ke 41

5/4/13 Link.com/south/2ke 21

6/22/13 Link.com/south/2ke 2

1/26/13 Link.com/south/2ke 4

2/9/13 Link.com/south/2ke 27

2/2/13 Link.com/south/2ke 2

1/19/13 Link.com/south/2ke 3

6/8/13 Link.com/south/2ke 3

6/15/13 Link.com/south/2ke 1

4/20/13 Link.com/south/2ke 9

3/9/13 Link.com/south/2ke 12

5/18/13 Link.com/south/2ke 19

8/17/13 Link.com/south/2ke 8

9/21/13 Link.com/south/2ke 4

5/25/13 Link.com/south/2ke 1

1/4/14 Link.com/south/2ke 19

9/14/13 Link.com/south/2ke 6

6/1/13 Link.com/south/2ke 4

6/29/13 Link.com/south/2ke 1

8/31/13 Link.com/south/2ke 57

8/3/13 Link.com/south/2ke 2

1/11/14 Link.com/south/2ke 89

1/18/14 Link.com/south/2ke 97

9/7/13 Link.com/south/2ke 53

12/28/13 Link.com/south/2ke 5

10/12/13 Link.com/south/2ke 5

2/1/14 Link.com/south/2ke 2

1/25/14 Link.com/south/2ke 11

2/16/13 Link.com/south/2ke 12

8/24/13 Link.com/south/2ke 51

9/28/13 Link.com/south/2ke 4

12/21/13 Link.com/south/2ke 4

3/30/13 Link.com/north/2gq 2

4/13/13 Link.com/north/2gq 1

5/11/13 Link.com/north/2gq 6

5/18/13 Link.com/north/2gq 2

6/22/13 Link.com/north/2gq 2

3/16/13 Link.com/north/2gq 29

3/9/13 Link.com/north/2gq 3

4/27/13 Link.com/north/2gq 4

6/1/13 Link.com/north/2gq 2

5/25/13 Link.com/north/2gq 2

1/4/14 Link.com/north/2gq 5

11/16/13 Link.com/north/2gq 2

12/7/13 Link.com/north/2gq 2

12/14/13 Link.com/north/2gq 1

1/18/14 Link.com/north/2gq 9

4/6/13 Link.com/north/2gq 1

12/1/12 Link.com/west/2ev 2

12/15/12 Link.com/west/2ev 9

1/12/13 Link.com/west/2ev 1

2/23/13 Link.com/west/2ev 1

3/30/13 Link.com/west/2ev 1

1/5/13 Link.com/west/2ev 3

11/24/12 Link.com/west/2ev 2

5/11/13 Link.com/west/2ev 62

12/29/12 Link.com/west/2ev 10

4/13/13 Link.com/west/2ev 93

5/4/13 Link.com/west/2ev 102

4/27/13 Link.com/west/2ev 184

6/15/13 Link.com/west/2ev 20

12/22/12 Link.com/west/2ev 11

6/1/13 Link.com/west/2ev 15

7/20/13 Link.com/west/2ev 4

3/9/13 Link.com/west/2ev 4

5/4/13 Link.com/west/2ev 2

7/27/13 Link.com/west/2ev 3

4/13/13 Link.com/west/2ev 78

8/24/13 Link.com/west/2ev 13

3/16/13 Link.com/west/2ev 1

8/31/13 Link.com/west/2ev 6

4/27/13 Link.com/west/2ev 5

6/29/13 Link.com/west/2ev 8

5/18/13 Link.com/west/2ev 31

4/20/13 Link.com/west/2ev 420

4/20/13 Link.com/west/2ev 39

9/28/13 Link.com/west/2ev 5

9/28/13 Link.com/west/2ev 5

8/3/13 Link.com/west/2ev 3

12/7/13 Link.com/west/2ev 10

6/22/13 Link.com/west/2ev 12

10/26/13 Link.com/west/2ev 13

7/6/13 Link.com/west/2ev 6

11/2/13 Link.com/west/2ev 1

5/25/13 Link.com/west/2ev 17

9/7/13 Link.com/west/2ev 18

8/10/13 Link.com/west/2ev 16

11/30/13 Link.com/west/2ev 4

10/5/13 Link.com/west/2ev 1

10/19/13 Link.com/west/2ev 34

11/23/13 Link.com/west/2ev 2

1/25/14 Link.com/west/2ev 2

8/24/13 Link.com/west/2ev 1

8/17/13 Link.com/west/2ev 7

9/14/13 Link.com/west/2ev 10

11/16/13 Link.com/west/2ev 3

10/12/13 Link.com/west/2ev 21

9/21/13 Link.com/west/2ev 1

10/5/13 Link.com/west/2ev 2

12/21/13 Link.com/west/2ev 1

2/1/14 Link.com/west/2ev 2

7/14/12 Link.com/west/2ev 2

7/28/12 Link.com/east/2es 10

9/22/12 Link.com/east/2es 1

7/14/12 Link.com/east/2es 23

8/11/12 Link.com/east/2es 1

12/1/12 Link.com/east/2es 9

12/15/12 Link.com/east/2es 278

8/18/12 Link.com/east/2es 2

1/19/13 Link.com/east/2es 15

1/26/13 Link.com/east/2es 13

3/9/13 Link.com/east/2es 3

9/29/12 Link.com/east/2es 2

12/8/12 Link.com/east/2es 4

3/2/13 Link.com/east/2es 8

2/23/13 Link.com/east/2es 17

12/29/12 Link.com/east/2es 78

4/13/13 Link.com/east/2es 2

12/1/12 Link.com/east/2es 1

2/2/13 Link.com/east/2es 13

3/30/13 Link.com/east/2es 1

5/18/13 Link.com/east/2es 2

12/22/12 Link.com/east/2es 64

1/5/13 Link.com/east/2es 64

6/1/13 Link.com/east/2es 9

2/9/13 Link.com/east/2es 6

6/8/13 Link.com/east/2es 5

4/27/13 Link.com/east/2es 1

7/13/13 Link.com/east/2es 2

2/16/13 Link.com/east/2es 2

3/16/13 Link.com/east/2es 3

5/18/13 Link.com/east/2es 3

6/1/13 Link.com/east/2es 2

9/28/13 Link.com/east/2es 1

4/20/13 Link.com/east/2es 8

5/4/13 Link.com/east/2es 4

3/23/13 Link.com/east/2es 6

8/31/13 Link.com/east/2es 1

11/2/13 Link.com/east/2es 2

5/25/13 Link.com/east/2es 38

10/5/13 Link.com/east/2es 2

6/22/13 Link.com/east/2es 9

8/10/13 Link.com/east/2es 1

9/14/13 Link.com/east/2es 3

6/29/13 Link.com/east/2es 3

12/14/13 Link.com/east/2es 3

1/18/14 Link.com/east/2es 4

12/7/13 Link.com/east/2es 3

8/3/13 Link.com/east/2es 4

1/11/14 Link.com/east/2es 4

9/21/13 Link.com/east/2es 4

10/12/13 Link.com/east/2es 1

2/1/14 Link.com/east/2es 4

7/21/12 Link.com/east/2es 20

11/17/12 Link.com/east/2es 4

12/8/12 Link.com/east/2es 348

1/12/13 Link.com/east/2es 35

4/6/13 Link.com/east/2es 2

7/20/13 Link.com/east/2es 3

8/24/13 Link.com/east/2es 1

9/7/13 Link.com/east/2es 1

1/25/14 Link.com/east/2es 1

;


Accepted Solutions
Solution
‎02-13-2014 06:27 PM
Occasional Contributor
Posts: 10

Re: Tricky Join: Joining Two Data Sets Based On A Date Range With Aggregations

Posted in reply to RobertNYC

No I'd still start with null, but initialize it for the first record.  Also I think you need to check against the begin date, not the drop date (since it's non-null for everyone).  So:

BeginDate=.

if BeginDate eq . then BeginDate = '1Jan1980'd;

if BeginDate ne . then BeginDate = Drop_Date;

This should give your first record a 1/1/1980 start, but rely on the for other records.

View solution in original post


All Replies
Occasional Contributor
Posts: 10

Re: Tricky Join: Joining Two Data Sets Based On A Date Range With Aggregations

Posted in reply to RobertNYC

I'd suggest joining first and then aggregating to simplify things.  I think your join should be a left join on URL_responses with a WHERE Creative_Drops.BeginDate < URL_Responses.ResponseDate <= Creative_Drops.DropDate. 

BeginDate would be created with an extra cleaning step.  Loop through the URL and retain to get the begin date of the next record.  So for example, Sign Up Now must have a start date of 1/31/2013 to go with its drop date of 8/12/2013.  Once you have the start and drop for each campaign, it should be straightforward to find which interval your ResponseDate lands in. 

As it currently stands, it's difficult for me to tell when things start and stop so it will probably be difficult for SAS as well.

Frequent Contributor
Posts: 101

Re: Tricky Join: Joining Two Data Sets Based On A Date Range With Aggregations

Hi AndyK. Thank you for your reply. Yes the Start and Stop Dates are the drop_dates. So if you look at the table below  where the creatives both use the same URL Sign Up Now has a Start Date of 12-Aug-13 and its End date is 31-Jan-13 which is the Start Date of Offer Box. I just don't know how to break out the start and stop dates since they are both contained in the same variable.

drop_dateURLCreative_NameResponseDateClicks
12-Aug-13Link.com/south/2keSign Up Now18-Jan-1497
31-Jan-13Link.com/south/2keOffer Box18-Jan-1497
12-Aug-13Link.com/south/2keSign Up Now11-Jan-1489
31-Jan-13Link.com/south/2keOffer Box11-Jan-1489
Occasional Contributor
Posts: 10

Re: Tricky Join: Joining Two Data Sets Based On A Date Range With Aggregations

Posted in reply to RobertNYC

You can try something like this, though you may want to initialize your variable to null at a different level depending on complexity.  I'd suggest commenting out the IF LASTs so you can see the output before you summarize. 

data Creative_Drops_new;

set Creative_Drops;

by URL Creative_Name;

if first.URL then do;

BeginDate=.

if first.Creative_Name then do;

if DropDate ne . then BeginDate=DropDate;

*if last.URL then do;

*if last.Creative_Name then do;

retain BeginDate;

run;

Frequent Contributor
Posts: 101

Re: Tricky Join: Joining Two Data Sets Based On A Date Range With Aggregations

Hi AndyK-

I am getting all null values for DropDate and BeginDate not sure why?

data Creative_Drops_new;

set Creative_Drops;

by URL Creative_Name;

if first.URL then do;

BeginDate=.;

if first.Creative_Name then do;

if DropDate ne . then BeginDate=DropDate;

if last.URL then do;

if last.Creative_Name then do;

retain BeginDate;

end;

end;

end;

end;

run;

Occasional Contributor
Posts: 10

Re: Tricky Join: Joining Two Data Sets Based On A Date Range With Aggregations

Posted in reply to RobertNYC

Sounds like it's treating them as uninitiated variables...I think yours is actually named drop_date so you might need to change that.

Frequent Contributor
Posts: 101

Re: Tricky Join: Joining Two Data Sets Based On A Date Range With Aggregations

Perfect!  Thanks!  Last question:  how would you suggest I incorporate the two programs (yours and the one which sums the clicks  since  drop_date and beginDate will be the same in some situations?

drop_dateURLCreative_NameBeginDate
20-May-13Link.com/east/2esPerf Offer20-May-13
7-Mar-13Link.com/north/2gqBonuse Power7-Mar-13
31-Jan-13Link.com/south/2keOffer Box31-Jan-13
12-Aug-13Link.com/south/2keSign Up Now31-Jan-13
30-Sep-13Link.com/west/2evWe Want Bonuses30-Sep-13

data Creative_Drops_new;

set Creative_Drops;

by URL Creative_Name;

if first.URL then do;

BeginDate=.;

if first.Creative_Name then do;

if Drop_Date ne . then BeginDate=Drop_Date;

if last.URL then do;

if last.Creative_Name then do;

retain BeginDate;

format BeginDate date9.;

end;

end;

end;

end;

run;

proc sql;

   create table analyze as

   select a.*, b.ResponseDate,b.clicks

   from Creative_Drops as a left join URL_Responses as b

      on b.ResponseDate ge a.drop_date and a.url=b.url;

   create table result as

   select drop_date, URL, Creative_Name, sum(clicks) as Total_Clicks

   from analyze

   group by drop_date, URL, Creative_Name

   ;

quit;


       

Occasional Contributor
Posts: 10

Re: Tricky Join: Joining Two Data Sets Based On A Date Range With Aggregations

Posted in reply to RobertNYC

I'd suggest using a dummy start date, like 1/1/1980, to clean that up.  You can run another data step to check when the dates are equal, or even update your current one (initialize to 1/1/1980 and check for that date instead of blank).  That way you can be certain your click lands in the window.

Frequent Contributor
Posts: 101

Re: Tricky Join: Joining Two Data Sets Based On A Date Range With Aggregations

Sorry-- last one, promise are you saying set up like this?

data Creative_Drops_new;

set Creative_Drops;

by URL Creative_Name;

if first.URL then do;

BeginDate='1JAN1980'd;

if first.Creative_Name then do;

if Drop_Date ne . then BeginDate=Drop_Date;

if last.URL then do;

if last.Creative_Name then do;

retain BeginDate;

format BeginDate date9.;

end;

end;

end;

end;

run;

Solution
‎02-13-2014 06:27 PM
Occasional Contributor
Posts: 10

Re: Tricky Join: Joining Two Data Sets Based On A Date Range With Aggregations

Posted in reply to RobertNYC

No I'd still start with null, but initialize it for the first record.  Also I think you need to check against the begin date, not the drop date (since it's non-null for everyone).  So:

BeginDate=.

if BeginDate eq . then BeginDate = '1Jan1980'd;

if BeginDate ne . then BeginDate = Drop_Date;

This should give your first record a 1/1/1980 start, but rely on the for other records.

Super User
Posts: 11,343

Re: Tricky Join: Joining Two Data Sets Based On A Date Range With Aggregations

Posted in reply to RobertNYC

Does this do what you want? I create the intermediate data set analyze so you can verify that the selection is getting the records you intend.

proc sql;

   create table analyze as

   select a.*, b.ResponseDate,b.clicks

   from Creative_Drops as a left join URL_Responses as b

      on b.ResponseDate ge a.drop_date and a.url=b.url;

   create table result as

   select drop_date, URL, Creative_Name, sum(clicks) as Total_Clicks

   from analyze

   group by drop_date, URL, Creative_Name

   ;

quit;

Frequent Contributor
Posts: 101

Re: Tricky Join: Joining Two Data Sets Based On A Date Range With Aggregations

Hi Ballard! Thank you for your reply. I think the above is close.  The issue is I only want to Clicks to sum for each creative during the time range between the drop dates. A given URL can only be assigned to one creative at a given time.  So lets say Creative A and Creative B use the same URL but the can't use it at the same time.  Creative A has a drop date of 1/20/2013 I need to sum all the responses until Creative B is assigned to the URL on drop date 3/20/2013.  So the start date and end date are the drop dates for the creatives.

Its looks like the all of the clicks are being summed regardless of when the url was assigned or unassigned.

drop_dateURLCreative_NameResponseDateClicks
12-Aug-13Link.com/south/2keSign Up Now18-Jan-1497
31-Jan-13Link.com/south/2keOffer Box18-Jan-1497
12-Aug-13Link.com/south/2keSign Up Now11-Jan-1489
31-Jan-13Link.com/south/2keOffer Box11-Jan-1489
🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 517 views
  • 6 likes
  • 3 in conversation