Desktop productivity for business analysts and programmers

Best way to compare summery data sets in different DB? (my starting point is 3 proc sql... w/ 'as')

Reply
Frequent Contributor
Posts: 126

Best way to compare summery data sets in different DB? (my starting point is 3 proc sql... w/ 'as')

[ Edited ]

Hello all, 
My question is possibly two part ultimately... (macro idea, or new proc ??)
I need to compare a list of tables in two databases for differences in their summery per table. I need to do this for many tables but would love to use a 'macro' and pass the table names and the metadata field names (they are different because the DB's they sit on are different. MS SQL - Oracle) and of course the number of fields per table can be different per table in a single db. However the pairs of tables checked across db's match perfectly.

What I was wondering is if a macro containing my proc sql could just be fed the table names and data fields and alias's?  If anyone had a sample they could share with me? All fields that are not sum-ed can be assumed to be char. In the end we are only looking for groups or counts that are different/missing from one db to the other. Anyway here is my code sample checks one table:

 

proc sql;
create table student_campus_fte_w as
select
 w_snapshot as snapshot,
 strm,
 acad_career,
 sum(W_ENRL_CRD_CNT) as enrl_credit_campus_count,
 sum(W_PL_IND) as PULLM_ind,
 sum(W_PL_CRD_IND) as PULLM_credit_ind,
 sum(W_PL_CRD_HOUR) as PULLM_credit_hours,
 /* ... more fields */
from &destinat..ps_W_STDNT_CMP_FTE
group by
 w_snapshot,
 strm,
 acad_career
;
quit;

proc sql;
create table student_campus_fte_c as
select
 snapshot,
 strm,
 acad_career,
 sum(enrl_credit_campus_count) as enrl_credit_campus_count,
 sum(PULLM_ind) as PULLM_ind,
 sum(PULLM_credit_ind) as PULLM_credit_ind,
 sum(PULLM_credit_hours) as PULLM_credit_hours,
 /* ... more fields */
from source.student_campus_fte
group by snapshot,
 strm,
 acad_career
 ;
quit;

proc sql;
create table student_campus_fte_d as
select
 c.*
from student_campus_fte_c c
full join student_campus_fte_w w
on c.snapshot = w.snapshot
and c.strm = w.strm
and c.acad_career = w.acad_career
where
 abs(c.enrl_credit_campus_count - w.enrl_credit_campus_count) > 0.000001 or
 abs(c.PULLM_ind - w.PULLM_ind) > 0.00001 or
 abs(c.PULLM_credit_ind - w.PULLM_credit_ind) > 0.000001 or
 abs(c.PULLM_credit_hours - w.PULLM_credit_hours) > 0.000001 or
 /* ... more fields */
;
quit;
/*proc print data=student_campus_fte_d;*/
/*run;*/

 

...of course I am open to using a different type of proc / data step, I am just looking for Ideas how I might automate this and not write the same three proc sql's with nothing but the table name, (number of fields), field names and alias's changing. Sorry I am not adding sample data in this case all obs results are summery counts other than the control group data: like census, eot, ceneot, and fall17, fall16, spring18.

 Thank you -Keith

Frequent Contributor
Posts: 126

Re: Best way to compare summery data sets in different DB? (my starting point is 3 proc sql... w/ 'a

Posted in reply to kjohnsonm

PS I am not wedded to the idea of a 'proc SQL w/ full join', so chime in if there are better ways to do this... but in this case the control groups per table are relatively small.

Super User
Posts: 22,844

Re: Best way to compare summery data sets in different DB? (my starting point is 3 proc sql... w/ 'a

Posted in reply to kjohnsonm
Can you fix your query to make it legible?
PROC Star
Posts: 1,262

Re: Best way to compare summery data sets in different DB? (my starting point is 3 proc sql... w/ 'a

My feeling is this should be fairly doable:


1. Create a control table with the character (group by) variables, and the quantitative (sum) variables.

 

2. Create a macro that does pretty much what you're showing, except use PROC COMPARE to do your comparison.

 

3. Have a DATA _NULL_ step that reads your control data, and sets up three strings that look like
"w_snapshot as snapshot, strm, acad_career,"
"sum(W_ENRL_CRD_CNT) as enrl_credit_campus_count, sum(W_PL_IND) as PULLM_ind, sum(W_PL_CRD_IND) as PULLM_credit_ind, sum(W_PL_CRD_HOUR) as PULLM_credit_hours,"
"w_snapshot, strm, acad_career"

 

4. Set up a macro call that includes those three strings, along with the two incoming table names, and do a "CALL EXECUTE" with that macro call.

 

I suggest you start small; i) get the basic code working, no macro, no "call execute". ii) set up a macro, and get one example working. iii) expand to the call execute.

 

Tom

Ask a Question
Discussion stats
  • 3 replies
  • 190 views
  • 2 likes
  • 3 in conversation