BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
djohn051
Fluorite | Level 6

I will try to explain this the best that I can.

 

I currently have two databases: Database A and Database B - Both of these databases have the same people in them (id numbers), but the amount of observations is different (longfile). Database A features dates (start and end date) that I would like to keep. These dates are for a timeframe that was worked. Database B features some variables with details about the work (Var_1 Var_2 Var_3) and also features dates (start and end date), but these dates were tentative and the ones in Database A are the ones I want to keep. 

 

What I would like to do, is for each observation in Database A, try to identify the case from Database B with the most similar start and end date (for the same ID number) and then merge the line together, taking the variables about work details from Database B and the start and end date variables from Database A. Database A is to be used as the spine so for example:

 

Database A:

  • ID1 01/01/2000 31/01/2000
  • ID1 25/02/2001 20/10/2001

Database B:

  • ID1 03/01/2000 01/02/2000 Var_1 Var_2 Var_3
  • ID1 25/04/2000 28/05/2000 Var_1 Var_2 Var_3
  • ID1 25/07/2000 28/08/2000 Var_1 Var_2 Var_3
  • ID1 20/02/2001 17/10/2001 Var_1 Var_2 Var_3

Merged Database:

  • Rows 1 and 4 have the same ID# as in Database A and have the most similar dates. Dates from Database A are kept and var_1-var_3 are added. The dataset should look as follows:
    • ID1 01/01/2000 31/01/2000 Var_1 Var_2 Var_3
    • ID1 25/02/2001 20/10/2001 Var_1 Var_2 Var_3
1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

@djohn051

Great that you've provided sample data but please provide such sample data in the future via a data step as done below so we don't have to do this for you.

data a;
  input id $ start :ddmmyy10. end :ddmmyy10.;
  format start end date9.;
datalines;
ID1 01/01/2000 31/01/2000
ID1 25/02/2001 20/10/2001
;
run;

data b;
  input id $ start :ddmmyy10. end :ddmmyy10. (var1 var2 var3) ($);
  format start end date9.;
datalines;
ID1 03/01/2000 01/02/2000 Var_1 Var_2 Var_3
ID1 25/04/2000 28/05/2000 Var_1 Var_2 Var_3
ID1 25/07/2000 28/08/2000 Var_1 Var_2 Var_3
ID1 20/02/2001 17/10/2001 Var_1 Var_2 Var_3
;
run;

As for terminology: These are tables not databases. A database is the container for objects like tables so it's a level higher up.

 

As for your question code as below could do:

proc sql;
  create table want as
    select 
      a.*,
      b.start as r_start format=date9.,
      b.end   as r_end format=date9.,
      b.var1,
      b.var2,
      b.var3
    from 
      a left join b
        on a.id=b.id
      group by a.id, a.start
      having min(abs(a.start-b.start)+abs(a.end-b.end)) = (abs(a.start-b.start)+abs(a.end-b.end))
    ;
quit;

Please note: It is possible that more than one row from table B meets the criterion in the having clause so you might have to come up with additional filtering criteria to always get to a single row match to table B.

View solution in original post

3 REPLIES 3
Patrick
Opal | Level 21

@djohn051

Great that you've provided sample data but please provide such sample data in the future via a data step as done below so we don't have to do this for you.

data a;
  input id $ start :ddmmyy10. end :ddmmyy10.;
  format start end date9.;
datalines;
ID1 01/01/2000 31/01/2000
ID1 25/02/2001 20/10/2001
;
run;

data b;
  input id $ start :ddmmyy10. end :ddmmyy10. (var1 var2 var3) ($);
  format start end date9.;
datalines;
ID1 03/01/2000 01/02/2000 Var_1 Var_2 Var_3
ID1 25/04/2000 28/05/2000 Var_1 Var_2 Var_3
ID1 25/07/2000 28/08/2000 Var_1 Var_2 Var_3
ID1 20/02/2001 17/10/2001 Var_1 Var_2 Var_3
;
run;

As for terminology: These are tables not databases. A database is the container for objects like tables so it's a level higher up.

 

As for your question code as below could do:

proc sql;
  create table want as
    select 
      a.*,
      b.start as r_start format=date9.,
      b.end   as r_end format=date9.,
      b.var1,
      b.var2,
      b.var3
    from 
      a left join b
        on a.id=b.id
      group by a.id, a.start
      having min(abs(a.start-b.start)+abs(a.end-b.end)) = (abs(a.start-b.start)+abs(a.end-b.end))
    ;
quit;

Please note: It is possible that more than one row from table B meets the criterion in the having clause so you might have to come up with additional filtering criteria to always get to a single row match to table B.

PGStats
Opal | Level 21

Added note:

 

@Patrick's minimum distance criteria is equivalent to:


min(sumabs(a.start-b.start, a.end-b.end)) = sumabs(a.start-b.start, a.end-b.end)

 

but it could also be:


min(euclid(a.start-b.start, a.end-b.end)) = euclid(a.start-b.start, a.end-b.end)

 

both of which try to match start and end times. You could also try to match the time interval centers with:

min(abs(a.start + a.end - b.start - b.end)) = abs(a.start + a.end - b.start - b.end)

PG
djohn051
Fluorite | Level 6

First, thank you for your help. For the most part, this has been working, with the exception of what you had mentioned, when two observations (with the same SN) appear in table a, but only once in table b, they will both be included in table want, but both will correspond to the same dates and variables as the one case from table b

 

 

One additional question, in the following line:

 

      having min(abs(a.start-b.start)+abs(a.end-b.end)) = (abs(a.start-b.start)+abs(a.end-b.end))

 what is the purpose of the syntax after the equals (=) sign?

 

Many thanks!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 1004 views
  • 3 likes
  • 3 in conversation