BookmarkSubscribeRSS Feed
Filipvdr
Pyrite | Level 9

Hello guys..

 

I'm joining two timesliced tables but i'm struggling to get the correct output

 

Table A:

ID            NAME            FKEY           FROM          TO                PRODUCT_DESC

1              BASKET        2                  1/1/2010        1/1/2015      BASKETBALL

1              BASKET        2                  2/1/2015        31/12/9999  BASKETBALL SPALDING

2              PRODUCT2  3                  1/1/2010        31/12/9999  TEMP

 

TABLE B (FKEY is ID here)

FKEY       FROM             TO               PRODUCT_CAT_DESCR

2              1/1/2010         31/12/2013   SPORTS EQUIPMENT

2              1/1/2014         31/12/9999   BASKETBALL EQUIPMENT

 

I'm joining on the FKEY.

As a result I would like to have 4 records for ID 1 with 4 different from/to's.

 

Any idea's?

5 REPLIES 5
ballardw
Super User

@Filipvdr wrote:

Hello guys..

 

I'm joining two timesliced tables but i'm struggling to get the correct output

 

Table A:

ID            NAME            FKEY           FROM          TO                PRODUCT_DESC

1              BASKET        2                  1/1/2010        1/1/2015      BASKETBALL

1              BASKET        2                  2/1/2015        31/12/9999  BASKETBALL SPALDING

2              PRODUCT2  3                  1/1/2010        31/12/9999  TEMP

 

TABLE B (FKEY is ID here)

FKEY       FROM             TO               PRODUCT_CAT_DESCR

2              1/1/2010         31/12/2013   SPORTS EQUIPMENT

2              1/1/2014         31/12/9999   BASKETBALL EQUIPMENT

 

I'm joining on the FKEY.

As a result I would like to have 4 records for ID 1 with 4 different from/to's.

 

Any idea's?


Can you show the code you attempted?

And what you actually expect the output to look like for that example data? It is not at all clear what values might be needed in the result for NAME, PRODUCT_DESC and  PRODUCT_CAT_DESCR in the results.

 

And verify that the From and To are actually SAS date values and not character?

ErikLund_Jensen
Rhodochrosite | Level 12

Hi @Filipvdr 

 

First - Please supply test data as code to create the data sets, so we know what we are dealing with and doesn't have to bother with actual values of date variables etc. and can work on a solution instead.

 

Second - Will any given ID in table A have more than one FKEY, and/or more than one NAME, and could there be overlapping intervals for any given ID?

 

I have an effecient macro to do the trick, but I would like to test it on your data before I post it, so I can get the macro options right according to your answers. Problem is the macro documentation is written in danish, so it will probably need some explanation too. I have some spare time this weekend....

Filipvdr
Pyrite | Level 9
data tableA;
format from to date9. name PRODUCT_DESC $20.;
id = 1;
name = "BASKET";
FKEY = 2;
FROM = "01JAN2010"d;
TO = "01JAN2015"d;
PRODUCT_DESC = "BASKETBALL";
output;
id = 1;
name = "BASKET";
FKEY = 2;
FROM = "02JAN2015"d;
TO = "31DEC2099"d;
PRODUCT_DESC = "BASKETBALL SPALDING";
output;
id = 2;
name = "PRODUCT2";
FKEY = 3;
FROM = "01JAN2010"d;
TO = "31DEC2099"d;
PRODUCT_DESC = "TEMP";
output;
run;

data tableB;
format fromb tob date9.  PRODUCT_CAT_DESCR $20.;
fkey = 2;
fromb = "01JAN2010"d;
tob = "31DEC2013"d;
PRODUCT_CAT_DESCR = "SPORTS EQUIPMENT";
output;
fkey = 2;
fromb = "01JAN2014"d;
tob = "31DEC2099"d;
PRODUCT_CAT_DESCR = "BASKETBALL EQUIPMENT";
output;
run;

proc sql;
 create table records as select * from tablea,tableb where tablea.fkey=tableb.fkey
AND	(	tableb.fromb	BETWEEN	tablea.from AND tablea.to	
AND		tableb.tob	BETWEEN	tablea.from AND tablea.to	); 
quit;

Here is some code to generate the data. My last step to create a table records is trying to link both tables together. The output is only one row while my wanted output should be 4 rows, each for one timeslice which is available in the data.

 

result.PNG

 

so the product basket now should have 3 rows, one for each timeslice

 

 

 

ErikLund_Jensen
Rhodochrosite | Level 12

Hi @Filipvdr 

 

I had a look at Jim Moon's paper, and I notice that it relies heavily on the between-operator. It is a good approch to interval-joining small tables, but it is extremely slow when working with large tables, especially DBMS-tables.

 

The macro I mentioned in my previous post (attached) is developed to overcome that and work fast with big tables. We use it in many production jobs, where interval joins with between used to run up to 2 hours, and now run in as many minutes instead.

 

Given your test input, this is the call:

 

%fletrensinterval(
	tableA, 
	tableB, 
	tableOUT, 
	unita=FKEY, datefirsta=FROM, datelasta=TO, 
	unitb=FKEY, datefirstb=fromb, datelastb=tob, 
	join=left, 
	byvarA=ID);

TableA and TableB are the input tables, and tableOUT the resulting output tables.

 

unita, datefirsta and datelasta are the variables holding the key to join on and the from- and to-dates i TableA.

unitb, datefirstb and datelastb are the variables holding the key to join on and the from- and to-dates i TableB.

 

Join=left preserves all keys/intervals in TableA, even if the key is not present in TableB, or (part of) an interval in TableA  is outside corresponding intervals in TableB 

 

ByvarA=ID is the name of a grouping variable in TableA (in this case ID), so separate joins are made for each ID value i A, if the join key (in this case FKEY) exists with more than one ID in TableA

 

Result:

 

 

result.gif

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 650 views
  • 0 likes
  • 3 in conversation