BookmarkSubscribeRSS Feed
DavidPhillips2
Rhodochrosite | Level 12

I'm thinking proc SQL is the best way to go about this.  I have a fact table that has Id and time.  The corresponding tables that work with the fact table have less time entries than the fact table.  I need to join the other tables with the fact tables on Id and time then fill in blanks that come in from the other tables based on the most recent date up to that point in time.  My time dimension is a character string formatted like 201010 202110 202210 formatted as year-academic period.  The reason I want to use proc SQL and case statements is because I have about 10 lookup tables like the table below and its easy to use proc SQL with lookup tables.

 

Please advise on how to fill in the blanks.

 

Test tables are:

 


proc sql;
	create table fact_example (
	academic_period varchar2(6),
	id int,
	retained int);

	insert into fact_example (academic_period, id, retained)
	values ('201010', 111, 1);
	insert into fact_example (academic_period, id, retained)
	values ('201110', 111, 1);
	insert into fact_example (academic_period, id, retained)
	values ('201210', 111, 1);
	insert into fact_example (academic_period, id, retained)
	values ('201310', 111, 1);

	create table dim_example (
	academic_period varchar2(6),
	id int,
	variable_changes varchar(1));

	insert into dim_example (academic_period, id, variable_changes)
	values ('201010', 111, 'b');

	insert into dim_example (academic_period, id, variable_changes)
	values ('201210', 111, 'g');


	Create table lookup_example (
	varshort varchar(1),
	varlong varchar(50));

	insert into lookup_example (varshort, varlong) values ('b', 'blue');
	insert into lookup_example (varshort, varlong) values ('g', 'green');
quit;
11 REPLIES 11
PGStats
Opal | Level 21

So, in your example, the combination of the three tables would give:

 

academic  id  retained    varlong
_period
201010    111               1    blue
201110    111               1    
201210    111               1    green
201310    111               1    

and what would the filled-in table contain?

PG
DavidPhillips2
Rhodochrosite | Level 12
academic  id 	retained  varlong
_period
201010   111 1 blue
201110 111 1 blue
201210 111 1 green 201310 111 1 green
PGStats
Opal | Level 21

And you need to do this exclusively with SQL? Do you eventually need to move this SQL code back to some dbms?

PG
DavidPhillips2
Rhodochrosite | Level 12

For that code block doing it in proc SQL would likely save a lot of code, given 10+ lookup tables.  After I have that block of data queried I'm switching back and forth between proc SQL and datasteps for the rest of the ETL.

PGStats
Opal | Level 21

Do you have a separate dim table for every lookup table? What is the purpose of the dim table?

PG
DavidPhillips2
Rhodochrosite | Level 12

There lookup tables for the most part.  I have one fact table and one dim.  Creating the other dims requires top level sign off at the moment.

Patrick
Opal | Level 21

@DavidPhillips2 wrote:

For that code block doing it in proc SQL would likely save a lot of code, given 10+ lookup tables.  After I have that block of data queried I'm switching back and forth between proc SQL and datasteps for the rest of the ETL.


Fact tables are normally high volume, dimension and reference tables often low volume. If using a SQL then you'll end up with a lot of sorting.

Consider using a data step with hash table lookups. 

Kurt_Bremser
Super User

@DavidPhillips2 wrote:

For that code block doing it in proc SQL would likely save a lot of code, given 10+ lookup tables.


Doing that in SQL is akin to shooting yourself in the foot with a double-barreled magnum shotgun. It looks simple (one SQL step), but will in fact cause several sorts of the fact table.

Unless your fact table is so small that the whole operation is not time-consuming at all.

Doing that with 10 hashes in a single data step (that reads the large table sequentially just once) is the way to go, unless the lookups are too large to fit in memory all at once. Then you may have to split the operation into several steps.

(Using the examples from my previous post)

data want;
set fact_example;
if _N_ = 1
then do;
  length variable_changes $1;
  declare hash dim (dataset:'dim_example');
  dim.definekey('academic_period','id');
  dim.definedata('variable_changes');
  dim.definedone();
  length varlong $50;
  declare hash lookup (dataset:'lookup_example (rename=(varshort=variable_changes))');
  lookup.definekey('variable_changes');
  lookup.definedata('varlong');
  lookup.definedone();
  call missing(variable_changes,varlong);
end;
retain variable_changes;
rc = dim.find();
rc = lookup.find();
drop rc variable_changes;
run;
DavidPhillips2
Rhodochrosite | Level 12

Actually, it might be best to do step one as a data step selecting the data and then do the lookups in proc sql.

Patrick
Opal | Level 21

Code like below would work for your sample data.

data want;

  set fact_example;
  by id academic_period;

  if _n_=1 then 
    do;
      if 0 then set dim_example;
      dcl hash h1(dataset:'dim_example');
      h1.defineKey('academic_period', 'id');
      h1.defineData('variable_changes');
      h1.defineDone();
      drop variable_changes;

      if 0 then set lookup_example;
      dcl hash h2(dataset:'lookup_example');
      h2.defineKey('varshort');
      h2.defineData('varlong');
      h2.defineDone();
      drop varshort;
    end;

  if h1.find()=0 then h2.find(key:variable_changes);
  output;

  if last.id then call missing(of _all_);
run;

proc print data=want;
run;

Patrick_0-1594099034036.png

 

Kurt_Bremser
Super User

Why are you using SQL for the simple creation of tables? This is THE domain of data steps:

data fact_example;
input academic_period :$6. id retained;
datalines;
201010 111 1
201110 111 1
201210 111 1
201310 111 1
;

data dim_example;
input academic_period :$6. id variable_changes :$1.;
datalines;
201010 111 b
201210 111 g
;

data lookup_example;
input varshort :$1. varlong :$50.;
datalines;
b blue
g green
;

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1730 views
  • 1 like
  • 4 in conversation