BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
iggles
Fluorite | Level 6

My first post... please be kind 😊

I am using SAS Studio.

I have a base table of date, time, and coin ticker, as follows:

data BaseTXs;
infile datalines delimiter=','; 
input 
   date :yymmdd10.
   time :time.
   coin $;
format
   date yymmdd10.
   time timeampm.;

datalines;
2020-09-01,9:29:55 PM,BTC
2020-09-01,9:58:00 PM,BTC
2020-09-08,10:30:42 AM,BTC
2020-09-09,4:20:00 PM,BTC
;

To my base table, I simply want to append the price of bitcoin using my PriceLookup table, picking the price that is closest to my date and time.  Here is my example PriceLookup table:

data PriceLookup;
infile datalines delimiter=',' truncover;
input
   price
   date :yymmdd10.
   time :time.
   coin $;
format
   price 8.2
   date yymmdd10.
   time timeampm.;
   
datalines;
351.05,2020-08-31,11:59:59 PM,BTC
400.45,2020-09-01,8:16:41 AM,BTC
425.02,2020-09-01,9:18:45 PM,BTC
378.05,2020-09-01,10:13:47 PM,BTC
755.66,2020-09-07,6:23:47 PM,BTC
789.99,2020-09-08,9:25:00 AM,BTC
804.33,2020-09-08,1:55:54 PM,BTC
1013.82,2020-09-09,3:05:00 AM,BTC
1020.10,2020-09-09,6:00:00 PM,BTC
1015.05,2020-09-10,8:00:00 PM,BTC
999.65,2020-09-10,8:15:05 PM,BTC
;

So my final output table should look like this:

data BaseTXs_fin;
infile datalines delimiter=','; 
input 
   date :yymmdd10.
   time :time.
   coin $
   price;
format
   date yymmdd10.
   time timeampm.
   price 8.2;

datalines;
2020-09-01,9:29:55 PM,BTC,425.02
2020-09-01,9:58:00 PM,BTC,378.05
2020-09-08,10:30:42 AM,BTC,789.99
2020-09-09,4:20:00 PM,BTC,1020.10
;

I have seen several posts/articles showing how this can be done in one PROC SQL step, but I can't seem to alter the code to fit my needs (I am mostly unsure how to setup the WHERE, GROUP BY, and HAVING statements).  Here, for example:

https://www.lexjansen.com/pharmasug/2003/CodersCorner/cc001.pdf 

 

Any help would be much appreciated!

 

(My request is actually a bit simplified, as my production base table consists of more coins than just bitcoin, and each different coin has its own separate PriceLookup table.  So I'd need to do a merge for each coin.  I think I can do this on my own if I can just get the simplified version to work, but if you're up for providing the extra help, I certainly won't complain!)

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

Keeping sorted data is one of SAS's great features, so you should take advantage of that.

This takes 20 seconds on my small Windows server with 500 and 200,000,000 observations.

data _V/view=_V;             %* Save data for one date and next date on one line;
  if ^LASTOBS then do;
    set PRICELOOKUP (firstobs=2);
    DT2=DATE*3600*24+TIME; PRICE2=PRICE;
  end;
  set PRICELOOKUP end=LASTOBS;
  DT1=DATE*3600*24+TIME; PRICE1=PRICE;
  format DT1 DT2 datetime.;
run;

data WANT;                   
  set BASETXS _V;            %* Interleave prices and transactions;    
  by DATE TIME;                          
  if PRICE1 then do;         % Previous date data also contain next date price, retain all; 
    D1=DT1; P1=PRICE1; D2=DT2; P2=PRICE2;   retain D1 D2 P1 P2;
  end;
  else do;                   %* Transaction found, use previous record values and keep closest;
    DIFF1=abs(DATE*3600*24 + TIME - D1);
    DIFF2=abs(DATE*3600*24 + TIME - D2);
    PRICE=ifn(DIFF1<DIFF2, P1, P2);
    output;
  end;
  keep DATE TIME COIN PRICE;
run;

 

.

View solution in original post

8 REPLIES 8
ChrisNZ
Tourmaline | Level 20

Congratulations on a perfect first post! 

Well presented, well formatted, clear, comprehensive and concise.

And this coming from the grumpier one around here... 🙂

For small volumes, this will do:

proc sql; 
  select b.*, l.PRICE, abs( l.DATE*3600*24 + l.TIME - b.DATE*3600*24 - b.TIME ) as DIFF 
  from BASETXS b, PRICELOOKUP l
  group by b.DATE, b.TIME
  having DIFF=min(DIFF);

date time coin price DIFF
2020-09-01 9:29:55 PM BTC 425.02 670
2020-09-01 9:58:00 PM BTC 378.05 947
2020-09-08 10:30:42 AM BTC 789.99 3942
2020-09-09 4:20:00 PM BTC 1020.10 6000

 

This creates a Cartesian product however, so speed will plummet as volumes grow.

If the volumes are too large, other more complex strategies may be necessary.

 

 

iggles
Fluorite | Level 6

Thank you for the kind words and swift reply, ChrisNZ!  Your code suggestion worked perfectly on my sample tables (as you already knew), but it failed on my huge production tables (as you also predicted).  I get ERROR: Sort execution failure.  I looked it up on support.sas.com and thought maybe moving my work library to a bigger hard drive would fix that, but that process looks a little too technical.

 

My base table is small enough I think, with around 1200 records and a size of 320kb.  But my smallest PriceLookup table is about 500,000 records, and the largest is about 21,000,000 records.  Doing a Cartesian product is probably indeed bogging it down.

 

Note: All tables are already sorted by date, time... and all PriceLookup tables have the same layout.  Not sure if that's helpful info or not.

 

I was trying to step through a logical solution on my own:

  1. Maybe use a datastep to grab the date/time of my Base Table, starting with the first record, and put those values into a macro var.
  2. Then use a datastep to loop through the PriceLookup table, and compare the macro var date/time values to the PriceLookup date/time values, using an abs function to get the difference.  I could RETAIN that difference, go to the next record and compare again.
  3. Because PriceLookup is sorted by Date/Time, I know that if the current record difference is smaller than the previous record's retained difference, then I need to go to the next record and compare again.  But if the current difference is larger than the previous difference, I can output the previous record as my closest match.
  4. Then go back to the Base Table, get the 2nd record's date/time, and again compare to the PriceLookup table until a closest match is found.  Rinse and repeat until all the records in the Base Table have been compared.

This sounds like I would need to nest my datasteps though, which I dont think I can do.

 

Any code or suggestions would be much appreciated! 

 

FreelanceReinh
Jade | Level 19

Hello @iggles and welcome to the SAS Support Communities!

 

Interesting task. I would suggest that you create numeric informats from CNTLIN= datasets based on the PriceLookup tables (e.g., one per coin or one per coin and year, depending on the data volumes). The start and end points of the informat ranges would be the mean values of two consecutive datetimes in the PriceLookup table. (The first and last range could use HLO='L' and 'H', respectively.) The prices would serve as the labels. The lookup would then be a simple assignment statement using the INPUTN function.

 

Here's an example using your sample datasets, creating one informat per coin, using the name of the coin as the informat name (assuming that those are valid informat names).

data infmts(keep=fmtname--start);
do until(last.coin);
  set pricelookup;
  by coin;
  dt=dhms(date,0,0,time);
  lag_dt=lag(dt);
  lag_price=lag(price);
  if first.coin then do;
    fmtname=coin;
    type='I';
    hlo='L';
  end;
  else do;
    end=mean(dt, lag_dt);
    label=lag_price;
    output;
    hlo='I';
    start=end;
  end;
  if last.coin then do;
    hlo='H';
    label=price;
    output;
  end;
end;
format start end datetime21.1;
run;

proc format cntlin=infmts;
run;

data want;
set BaseTXs;
price=inputn(put(dhms(date,0,0,time),12.),coin);
format price 8.2;
run;
iggles
Fluorite | Level 6

Thanks for giving it a go, @FreelanceReinh !  

 

I went with ChrisNZ's suggestion and that seemed to work well enough for me.  Although I did not try your code yet, I can still use it as a learning tool for future reference.  Thanks again!

ChrisNZ
Tourmaline | Level 20

Keeping sorted data is one of SAS's great features, so you should take advantage of that.

This takes 20 seconds on my small Windows server with 500 and 200,000,000 observations.

data _V/view=_V;             %* Save data for one date and next date on one line;
  if ^LASTOBS then do;
    set PRICELOOKUP (firstobs=2);
    DT2=DATE*3600*24+TIME; PRICE2=PRICE;
  end;
  set PRICELOOKUP end=LASTOBS;
  DT1=DATE*3600*24+TIME; PRICE1=PRICE;
  format DT1 DT2 datetime.;
run;

data WANT;                   
  set BASETXS _V;            %* Interleave prices and transactions;    
  by DATE TIME;                          
  if PRICE1 then do;         % Previous date data also contain next date price, retain all; 
    D1=DT1; P1=PRICE1; D2=DT2; P2=PRICE2;   retain D1 D2 P1 P2;
  end;
  else do;                   %* Transaction found, use previous record values and keep closest;
    DIFF1=abs(DATE*3600*24 + TIME - D1);
    DIFF2=abs(DATE*3600*24 + TIME - D2);
    PRICE=ifn(DIFF1<DIFF2, P1, P2);
    output;
  end;
  keep DATE TIME COIN PRICE;
run;

 

.

iggles
Fluorite | Level 6

Thanks @ChrisNZ !

 

This gets me close enough such that I think I can get it from here.  My production BaseTX table has several different coins in the coin column, and each coin has its own PriceHistory table, so I'll just need to loop through each coin's PriceHistory table.  For anyone else reading, the solution code that ChrisNZ provided assumes only 1 type of coin and 1 PriceHistory table, but it works like a charm to grab the best match based on date/time.

 

BTW Chris, I assume from the fern avatar and 'NZ' in your name that you are from New Zealand?  I was there for the 2019 International Hobbit Day celebration (I'm from the USA).  I actually rented a campervan for 2 months and drove around both islands, taking selfies where I could dressed as a hobbit.  That itself was a lot of fun, but the jaw-dropping scenery is what makes me want to go back.  Truly an amazing country!  Take care, and thanks again for the help.

ChrisNZ
Tourmaline | Level 20

2 months! Nice. Very glad you enjoyed your time here. We're lucky we have a sensible govt and are covid free atm, so it's a good place to be.

ChrisNZ
Tourmaline | Level 20

For more coins, and provided the volumes are reasonable and the lookup tables homogeneous in structure, this should work:

proc sql; 
  select b.*
       , l.PRICE
       , abs(l.DATE*3600*24+l.TIME -b.DATE*3600*24-b.TIME) as DIFF
  from BASETXS b
     , (select * from PRICELOOKUP1
        union
        select * from PRICELOOKUP2 )  l
  where b.COIN = l.COIN
  group by b.DATE, b.TIME, b.COIN
  having DIFF=min(DIFF);

 

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1787 views
  • 6 likes
  • 3 in conversation