I'm thinking proc SQL is the best way to go about this. I have a fact table that has Id and time. The corresponding tables that work with the fact table have less time entries than the fact table. I need to join the other tables with the fact tables on Id and time then fill in blanks that come in from the other tables based on the most recent date up to that point in time. My time dimension is a character string formatted like 201010 202110 202210 formatted as year-academic period. The reason I want to use proc SQL and case statements is because I have about 10 lookup tables like the table below and its easy to use proc SQL with lookup tables.
Please advise on how to fill in the blanks.
Test tables are:
proc sql;
create table fact_example (
academic_period varchar2(6),
id int,
retained int);
insert into fact_example (academic_period, id, retained)
values ('201010', 111, 1);
insert into fact_example (academic_period, id, retained)
values ('201110', 111, 1);
insert into fact_example (academic_period, id, retained)
values ('201210', 111, 1);
insert into fact_example (academic_period, id, retained)
values ('201310', 111, 1);
create table dim_example (
academic_period varchar2(6),
id int,
variable_changes varchar(1));
insert into dim_example (academic_period, id, variable_changes)
values ('201010', 111, 'b');
insert into dim_example (academic_period, id, variable_changes)
values ('201210', 111, 'g');
Create table lookup_example (
varshort varchar(1),
varlong varchar(50));
insert into lookup_example (varshort, varlong) values ('b', 'blue');
insert into lookup_example (varshort, varlong) values ('g', 'green');
quit;
So, in your example, the combination of the three tables would give:
academic id retained varlong _period 201010 111 1 blue 201110 111 1 201210 111 1 green 201310 111 1
and what would the filled-in table contain?
academic id retained varlong _period 201010 111 1 blue
201110 111 1 blue
201210 111 1 green 201310 111 1 green
And you need to do this exclusively with SQL? Do you eventually need to move this SQL code back to some dbms?
For that code block doing it in proc SQL would likely save a lot of code, given 10+ lookup tables. After I have that block of data queried I'm switching back and forth between proc SQL and datasteps for the rest of the ETL.
Do you have a separate dim table for every lookup table? What is the purpose of the dim table?
There lookup tables for the most part. I have one fact table and one dim. Creating the other dims requires top level sign off at the moment.
@DavidPhillips2 wrote:
For that code block doing it in proc SQL would likely save a lot of code, given 10+ lookup tables. After I have that block of data queried I'm switching back and forth between proc SQL and datasteps for the rest of the ETL.
Fact tables are normally high volume, dimension and reference tables often low volume. If using a SQL then you'll end up with a lot of sorting.
Consider using a data step with hash table lookups.
@DavidPhillips2 wrote:
For that code block doing it in proc SQL would likely save a lot of code, given 10+ lookup tables.
Doing that in SQL is akin to shooting yourself in the foot with a double-barreled magnum shotgun. It looks simple (one SQL step), but will in fact cause several sorts of the fact table.
Unless your fact table is so small that the whole operation is not time-consuming at all.
Doing that with 10 hashes in a single data step (that reads the large table sequentially just once) is the way to go, unless the lookups are too large to fit in memory all at once. Then you may have to split the operation into several steps.
(Using the examples from my previous post)
data want;
set fact_example;
if _N_ = 1
then do;
length variable_changes $1;
declare hash dim (dataset:'dim_example');
dim.definekey('academic_period','id');
dim.definedata('variable_changes');
dim.definedone();
length varlong $50;
declare hash lookup (dataset:'lookup_example (rename=(varshort=variable_changes))');
lookup.definekey('variable_changes');
lookup.definedata('varlong');
lookup.definedone();
call missing(variable_changes,varlong);
end;
retain variable_changes;
rc = dim.find();
rc = lookup.find();
drop rc variable_changes;
run;
Actually, it might be best to do step one as a data step selecting the data and then do the lookups in proc sql.
Code like below would work for your sample data.
data want;
set fact_example;
by id academic_period;
if _n_=1 then
do;
if 0 then set dim_example;
dcl hash h1(dataset:'dim_example');
h1.defineKey('academic_period', 'id');
h1.defineData('variable_changes');
h1.defineDone();
drop variable_changes;
if 0 then set lookup_example;
dcl hash h2(dataset:'lookup_example');
h2.defineKey('varshort');
h2.defineData('varlong');
h2.defineDone();
drop varshort;
end;
if h1.find()=0 then h2.find(key:variable_changes);
output;
if last.id then call missing(of _all_);
run;
proc print data=want;
run;
Why are you using SQL for the simple creation of tables? This is THE domain of data steps:
data fact_example;
input academic_period :$6. id retained;
datalines;
201010 111 1
201110 111 1
201210 111 1
201310 111 1
;
data dim_example;
input academic_period :$6. id variable_changes :$1.;
datalines;
201010 111 b
201210 111 g
;
data lookup_example;
input varshort :$1. varlong :$50.;
datalines;
b blue
g green
;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.