SQL Case statement fill in blanks

DavidPhillips2 · Posted 07-06-2020 03:10 PM

I'm thinking proc SQL is the best way to go about this. I have a fact table that has Id and time. The corresponding tables that work with the fact table have less time entries than the fact table. I need to join the other tables with the fact tables on Id and time then fill in blanks that come in from the other tables based on the most recent date up to that point in time. My time dimension is a character string formatted like 201010 202110 202210 formatted as year-academic period. The reason I want to use proc SQL and case statements is because I have about 10 lookup tables like the table below and its easy to use proc SQL with lookup tables.

Please advise on how to fill in the blanks.

Test tables are:


proc sql;
	create table fact_example (
	academic_period varchar2(6),
	id int,
	retained int);

	insert into fact_example (academic_period, id, retained)
	values ('201010', 111, 1);
	insert into fact_example (academic_period, id, retained)
	values ('201110', 111, 1);
	insert into fact_example (academic_period, id, retained)
	values ('201210', 111, 1);
	insert into fact_example (academic_period, id, retained)
	values ('201310', 111, 1);

	create table dim_example (
	academic_period varchar2(6),
	id int,
	variable_changes varchar(1));

	insert into dim_example (academic_period, id, variable_changes)
	values ('201010', 111, 'b');

	insert into dim_example (academic_period, id, variable_changes)
	values ('201210', 111, 'g');


	Create table lookup_example (
	varshort varchar(1),
	varlong varchar(50));

	insert into lookup_example (varshort, varlong) values ('b', 'blue');
	insert into lookup_example (varshort, varlong) values ('g', 'green');
quit;

PGStats · Posted 07-06-2020 04:14 PM

So, in your example, the combination of the three tables would give:

academic  id  retained    varlong
_period
201010    111               1    blue
201110    111               1    
201210    111               1    green
201310    111               1

and what would the filled-in table contain?

PG

DavidPhillips2 · Posted 07-06-2020 04:21 PM

academic  id 	retained  varlong
_period
201010   111 1 blue
201110 	 111 1 blue
201210   111 1 green
201310 	 111 1 green

PGStats · Posted 07-06-2020 04:27 PM

And you need to do this exclusively with SQL? Do you eventually need to move this SQL code back to some dbms?

PG

DavidPhillips2 · Posted 07-06-2020 04:43 PM

For that code block doing it in proc SQL would likely save a lot of code, given 10+ lookup tables. After I have that block of data queried I'm switching back and forth between proc SQL and datasteps for the rest of the ETL.

PGStats · Posted 07-06-2020 05:12 PM

Do you have a separate dim table for every lookup table? What is the purpose of the dim table?

PG

DavidPhillips2 · Posted 07-06-2020 11:41 PM

There lookup tables for the most part. I have one fact table and one dim. Creating the other dims requires top level sign off at the moment.

Patrick · Posted 07-06-2020 09:42 PM

@DavidPhillips2 wrote:

For that code block doing it in proc SQL would likely save a lot of code, given 10+ lookup tables. After I have that block of data queried I'm switching back and forth between proc SQL and datasteps for the rest of the ETL.

Fact tables are normally high volume, dimension and reference tables often low volume. If using a SQL then you'll end up with a lot of sorting.

Consider using a data step with hash table lookups.

Kurt_Bremser · Posted 07-07-2020 07:41 AM

@DavidPhillips2 wrote:

For that code block doing it in proc SQL would likely save a lot of code, given 10+ lookup tables.

Doing that in SQL is akin to shooting yourself in the foot with a double-barreled magnum shotgun. It looks simple (one SQL step), but will in fact cause several sorts of the fact table.

Unless your fact table is so small that the whole operation is not time-consuming at all.

Doing that with 10 hashes in a single data step (that reads the large table sequentially just once) is the way to go, unless the lookups are too large to fit in memory all at once. Then you may have to split the operation into several steps.

(Using the examples from my previous post)

data want;
set fact_example;
if _N_ = 1
then do;
  length variable_changes $1;
  declare hash dim (dataset:'dim_example');
  dim.definekey('academic_period','id');
  dim.definedata('variable_changes');
  dim.definedone();
  length varlong $50;
  declare hash lookup (dataset:'lookup_example (rename=(varshort=variable_changes))');
  lookup.definekey('variable_changes');
  lookup.definedata('varlong');
  lookup.definedone();
  call missing(variable_changes,varlong);
end;
retain variable_changes;
rc = dim.find();
rc = lookup.find();
drop rc variable_changes;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

DavidPhillips2 · Posted 07-06-2020 11:39 PM

Actually, it might be best to do step one as a data step selecting the data and then do the lookups in proc sql.

Patrick · Posted 07-07-2020 01:17 AM

Code like below would work for your sample data.

data want;

  set fact_example;
  by id academic_period;

  if _n_=1 then 
    do;
      if 0 then set dim_example;
      dcl hash h1(dataset:'dim_example');
      h1.defineKey('academic_period', 'id');
      h1.defineData('variable_changes');
      h1.defineDone();
      drop variable_changes;

      if 0 then set lookup_example;
      dcl hash h2(dataset:'lookup_example');
      h2.defineKey('varshort');
      h2.defineData('varlong');
      h2.defineDone();
      drop varshort;
    end;

  if h1.find()=0 then h2.find(key:variable_changes);
  output;

  if last.id then call missing(of _all_);
run;

proc print data=want;
run;

Kurt_Bremser · Posted 07-07-2020 07:27 AM

Why are you using SQL for the simple creation of tables? This is THE domain of data steps:

data fact_example;
input academic_period :$6. id retained;
datalines;
201010 111 1
201110 111 1
201210 111 1
201310 111 1
;

data dim_example;
input academic_period :$6. id variable_changes :$1.;
datalines;
201010 111 b
201210 111 g
;

data lookup_example;
input varshort :$1. varlong :$50.;
datalines;
b blue
g green
;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

SQL Case statement fill in blanks

Re: SQL Case statement fill in blanks

Re: SQL Case statement fill in blanks

Re: SQL Case statement fill in blanks

Re: SQL Case statement fill in blanks

Re: SQL Case statement fill in blanks

Re: SQL Case statement fill in blanks

Re: SQL Case statement fill in blanks

Re: SQL Case statement fill in blanks

Re: SQL Case statement fill in blanks

Re: SQL Case statement fill in blanks

Re: SQL Case statement fill in blanks

Registration is open

SAS Training: Just a Click Away