Hi There.
I've started using a new DWH at work and they've introduced lookup tabels for practically everything.
I have a large dataset that contains a number (key) that links to a date that SAS is reading as character (theyre using SQL and dont know why its not recognising it as a date)
What I want to do conditionally search the large table by date so for example.
Lookup table has two variables (date is formatted as character)
DateKey TranDate
1 | 2016-01-01
2 | 2016-01-02
3 | 2016-01-02
data set has numerous variables, one being the datekey which is formated as 6.
I want to merge them together but might add a condition such as:-
Where = (Trandate >= '01Jan2016'd);
is this possible? the only way I can do it is to read the entire dataset in then merge it with the lookup table but this obviously takes time due to the size of the table.
Any help greatly appreciated.
To filter records where accountkey is missing is easy, but I need to know which table holds this variable.
Let's say its origin is the B table, then the full code will be:
proc sql;
create table Payments as select
a.*, b.*
from work.DateKey(where=(date>'28Aug2016'd)) as A
left join DWH.vwFactSalaryTransactions(where=(account_key ne .)) as B
on (a.Datekey = b.Datekey)
; quit;
if accountkey is in the lookup table (A) then:
proc sql;
create table Payments as select
a.*, b.*
from work.DateKey(where=(date>'28Aug2016'd and account_key ne .)) as A
left join DWH.vwFactSalaryTransactions as B
on (a.Datekey = b.Datekey)
; quit;
The preferre way of using lookup tables for filtering is by using SQL inner joins. The is the usual technique when dealing with star schemas, and similar.
But since your date in the date lookup table (dimension?) is char, you can't use a SAS date constant, since that is considered as numeric.
You could probably use the same date char format as your lookup table, even if it's not ideal. Is the data stored in SAS data sets?
@LinusH wrote:
The preferre way of using lookup tables for filtering is by using SQL inner joins. The is the usual technique when dealing with star schemas, and similar.
But since your date in the date lookup table (dimension?) is char, you can't use a SAS date constant, since that is considered as numeric.
You could probably use the same date char format as your lookup table, even if it's not ideal. Is the data stored in SAS data sets?
I do have a piece of code that will change the character dates to an actual date format, as the lookup table only consists of about 10000 observations I dont think this would take too long.
If I did this and the date was in mmddyy10. could I then use the SQL inner join?
If so would you able to provide an example?
Really apprecate the help.
A few notes ...
Since your character date is in Y-M-D form, you can compare to it directly:
where = (TranDate >= '2016-01-01')
If you want to compare these character values to the numeric values in a larger data set, that can be done as well. But it's not clear what that other data set contains. Does it contain numeric dates, numeric datetimes, something else entirely? What would it contain that lets it be joined to your lookup table in any way? DATEKEY?
@Astounding wrote:
A few notes ...
Since your character date is in Y-M-D form, you can compare to it directly:
where = (TranDate >= '2016-01-01')
If you want to compare these character values to the numeric values in a larger data set, that can be done as well. But it's not clear what that other data set contains. Does it contain numeric dates, numeric datetimes, something else entirely? What would it contain that lets it be joined to your lookup table in any way? DATEKEY?
Apologies if didnt make it clear.
Ok so both tables have a variable called DateKey which is a number and the lookupdate contains the date ( i've just converted the dates in the lookup to mmddyy10. so the character thing isnt an issue now.
What I would like to do is join the tabkes together by the variable datekey but only bring back obersvations from the non lookuptable where date >= 01Jan16 etc.
The non lookup table has millions of observations so i need to make sure that in the step im joining the tables together I am able to define some conditional criteria.
That make sense?
Here are the pieces that I'm not sure about.
When you say you converted the lookup dates to mmddyy10 format, what does that mean? Do you have an actual SAS date value (numeric) stored there, with a format applied for printing purposes?
When you have a DATE variable in the nonlookup table, what does it contain? A date value on the SAS date scale? A datetime value? A character string?
Hi. Heres the tables i have
Lookup Table... Datekey is a numerical format, date is now in dateformat
DateKey Date
1 01/01/2016
2 02/01/2016
3 03/01/2016
4 04/01/2016
Transaction Table Contains multiple variables however also contains the DateKey
DateKey Var2 Var3 Var4
1
2
3
4
I need to join the two tables by datekey but i'd like to include the statment where the date in the joined table is = 03/01/2016. (or any other condition of my choosing)
I'm not 100% sure if this is possible
Help continues to be appreciated.
I must be having a slow day. Here's the part I still don't understand: "date is now in dateformat"
What is actually contained in the field DATE? A character string? A numeric value on the SAS date scale? A numeric value on some other scale that your data warehouse uses? A numeric value that corresponds to a date-time rather than a date?
Whatever the answer, if you want to compare with date fields in the non-lookup table(s), the same questions will apply to those date fields.
Do I understand right:
if TRANDATE was in a SAS date (numeric) format
then you will use : Where = (Trandate >= '01Jan2016'd);
As TRANDATE is a character variable you can use:
where = (input(trandate,yymmdd10.) >= '01JAN2016'd);
OK, I think I'm starting to understand the issues here. You can't use a standalone WHERE statement, since the date field appears in only one data set. This program would require the date to be in both data sets:
data want;
merge huge_table (in=in1) lookup_table (in=in2);
by datekey;
if in1 and in2;
where TranDate >= '01Mar2016'd;
run;
That would generate an error, because TranDate doesn't appear in HUGE_TABLE.
Instead, use the WHERE clause as a data set modifier:
data want;
merge huge_table (in=in1) lookup_table (in=in2 where=(TranDate >= '01Mar2016'd));
by datekey;
if in1 and in2;
run;
All of this assumes you're actually using MERGE to combine the data ... if you are using SQL it can be done but would look a bit different.
Are we finally headed in the right direction?
try:
proc sql;
create table want as select
a.*, b.* /* change to desired variables */
from lookup_table(where=(date='03JAN2016'd)) as a
left join transaction_table as b
on (a.date_key = b.datekey)
; quit;
Is that what you want ? if not addapt it to your needs.
Ok had a pop at this this morning
proc sql;
create table want as select
a.DateKey, b.Date /* change to desired variables */
from DateKey(where=(date>'03JAN2016'd)) as a
left join Payments as b
on (a.Datekey = b.Datekey)
; quit;
but got this error merssage
898 proc sql;
899 create table want as select
900 a.DateKey, b.Date /* change to desired variables */
901 from DateKey(where=(date>'03JAN2016'd)) as a
902 left join Payments as b
903 on (a.Datekey = b.Datekey)
904 ;
ERROR: Column Date could not be found in the table/view identified with the correlation name B.
904! quit;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
Ok changed it to this
proc sql;
create table want as select
a.DateKey,Date, b.DateKey, TransactionAmount/* change to desired variables */
from DateKey(where=(date>'28Aug2016'd)) as a
left join Payments as b
on (a.Datekey = b.Datekey)
; quit;
and heres the log.
953 proc sql;
954 create table want as select
955 a.DateKey,Date, b.DateKey, TransactionAmount/* change to desired variables */
956 from DateKey(where=(date>'28Aug2016'd)) as a
957 left join Payments as b
958 on (a.Datekey = b.Datekey)
959 ;
WARNING: Variable DateKey already exists on file WORK.WANT.
NOTE: Table WORK.WANT created, with 223 rows and 3 columns.
959! quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.02 seconds
cpu time 0.03 seconds
However while it has merged the transaction amount variable where the dates are > 28/08/2016 it has still brought back the other variables from the lookup table (i.e where the date is < 28/08/2016 its juts put the transaction amount as blank.
All i want to see is where the date is > 28/08/2016, think we're nearly there though
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.