Merging many to many by separate date columns

Reply
Occasional Contributor
Posts: 5

Merging many to many by separate date columns

My title is vague and I couldn't figure out how to explain the question properly.  I have two datasets: one is a job history that contains names, job titles and starting dates for the job title, the other is a payroll file that contains names and paydates.  I am hoping to merge the datasets such that I end up with the same rows as the payroll dataset, but with the appropriate job title for the given paydate added in.  Here is a mockup of the data that I have (JobHist and PayRoll) and the data that I want (Merged).  I don't even know if this is a merge technically, so sorry about that.

data JobHist;

input Name $ Job $ Start_Date MMDDYY10.;

format Start_Date MMDDYY10.;

cards;

Alice Knight 12/01/12

Alice Bishop 02/05/15

Alice Queen 03/01/15

Betty Pawn 09/15/14

Betty Knight 12/02/14

Charles Rook 07/02/10

;

run;

data PayRoll;

input Name $ Pay_Date MMDDYY10.;

format Pay_Date MMDDYY10.;

cards;

Alice 12/15/2014

Alice 1/15/2015

Alice 2/12/2015

Alice 3/15/2015

Alice 4/15/2015

Betty 9/20/2014

Betty 10/31/2014

Betty 12/15/2014

Betty 1/3/2015

Charles 1/1/2011

Charles 3/8/2012

;

run;

data Merged;

input Name $ Job $ Pay_Date MMDDYY10.;

format Pay_Date MMDDYY10.;

cards;

Alice Knight 12/15/2014

Alice Knight 1/15/2015

Alice Bishop 2/12/2015

Alice Queen 3/15/2015

Alice Queen 4/15/2015

Betty Pawn 9/20/2014

Betty Pawn 10/31/2014

Betty Knight 12/15/2014

Betty Knight 1/3/2015

Charles Rook 1/1/2011

Charles Rook 3/8/2012

;

run;

The idea here is to add to the payroll dataset the job title that the employee had as of that paydate (so, if an employee had multiple titles during a pay period you would list the most recent/new one).  All of the merges I've tried thus far have been horribly broken.  Thanks in advance for your help!

Respected Advisor
Posts: 3,156

Re: Merging many to many by separate date columns

Not the most efficient, but easy to construct:

proc sql;

  create table merge as

  select *, (select job from jobhist where name=a.name and start_date <= a.pay_date having start_date=max(start_date)) as job

  from payroll a

  ;

quit;

Update: Better data step approach:

data _merge;

set payroll (in=a) jobhist(in=b rename=(start_date = pay_date job=_job));

by name pay_date;

retain job;

if b then job=_job;

if a;

drop _job;

run;

Occasional Contributor
Posts: 5

Re: Merging many to many by separate date columns

Hai.kuo  thank you, this appears to have worked.

I am auditing it now to see if I missed anything vital (real data being what it is there are gaps, blank fields, names that exist in one dataset and not the other, etc).

Ask a Question
Discussion stats
  • 2 replies
  • 204 views
  • 1 like
  • 2 in conversation