Request help in effectively merging 24 tables?

Reply
Contributor
Posts: 56

Request help in effectively merging 24 tables?


Hello,

I have 24 tables that needs to be combined to one. Each table has roughly about 14 dimensions and 3 different measures and has roughly 5 to 10 million records.  For example- the dataset looks like

Dimesion1 dimension2--DimensionN measure1--measureN

I woud need to merge by 3 by variables, let's say Dimension2, Dimension3 and Dimension4

The point is for the same level of dimensions the data has various measures lies in different related tables. For instance, measure_california is an a different table as opposed to measure_texas. Merging all 24 tables would eventually prove a purpose for further calculations and analysis.

I would appreciate an effective alternative solution plz not having to sort or a tedious merge/sql joins

Thanks,

Andy

Contributor
Posts: 45

Re: Request help in effectively merging 24 tables?


Are the same 3 'merge' variables present in each table? If so I see no reason why a simple SQL join wouldnt suffice:

/* establish Macro lookup list */

%let table1 = your_first_table;

%let table2 = your_2nd_table;

...

%macro joiner;

proc sql;

create table want as select * from

&table1. as t1

inner join

%do i = 2 %to 23;

&&table&i.. as d&i.

on d%eval(&i.-1).Dimension2 = d&i..Dimension2

AND

d%eval(&i.-1).Dimension3 = d&i..Dimension3

AND

d%eval(&i.-1).Dimension4 = d&i..Dimension4

inner join

%end;

&table24. as d42

on d41.Dimension2 = d42.Dimension2 

AND

d41.Dimension3 = d42.Dimension3

AND

d42.Dimension4 = d42.Dimension4

;

quit;

%mend;

%joiner

Contributor
Posts: 56

Re: Request help in effectively merging 24 tables?

Hi Murray,

Thanks very much. I will try your solution. If you don't mind can you please briefly explain me the logic and indexing below in your code as I am very new to technical stuff. Sorry for the bother.

%macro joiner;

proc sql;

create table want as select * from

&table1. as t1

inner join

%do i = 2 %to 23;

&&table&i.. as d&i.

on d%eval(&i.-1).Dimension2 = d&i..Dimension2

AND

d%eval(&i.-1).Dimension3 = d&i..Dimension3

AND

d%eval(&i.-1).Dimension4 = d&i..Dimension4

inner join

%end;

&table24. as d42

on d41.Dimension2 = d42.Dimension2 

AND

d41.Dimension3 = d42.Dimension3

AND

d42.Dimension4/*shouldn't this be d41?*/ = d42.Dimension4

;

quit;

%mend;

Respected Advisor
Posts: 3,124

Re: Request help in effectively merging 24 tables?

Maybe Off-Topic.  But even after successful merging, a table with 10 million records and 400 + variables is not really pragmatically ready for any real-time query. Last time we faced the same scenario, where we started using OLAP cube on SAS Enterprise Business Intelligence platform.

Haikuo

Contributor
Posts: 56

Re: Request help in effectively merging 24 tables?

Agreed and makes sense. Basically what i am trying to accomplish is get the measures from across  all different tables into one and then do certain aggregation later. At this point, i have got those measures in such 24 different tables. It's interesting you pointed out OLAP cube, and I wish there was such well defined hierarchies to even form a know how to build such cubes. The data aint the cleanest or compehensive one either however we are just going with a flow being a very amateur team.

Thanks for the mention,

Andy

Respected Advisor
Posts: 3,124

Re: Request help in effectively merging 24 tables?

If your institution already has SAS EBI and licensed OLAP cube or your pocket is deep enough, then according to your description it would be no brainer to use SAS OLAP cube. Or if you happen to have SQL server, then another possibility would be SSAS Olap cube, which is made free by Microsoft if you using SQL server database.

Haikuo

Contributor
Posts: 56

Re: Request help in effectively merging 24 tables?

Could I get help with a datastep solution for the same?

Thanks,

Andy

Super Contributor
Posts: 644

Re: Request help in effectively merging 24 tables?

This sounds like a Big Data opportunity to me...

However, to merge the datasets in a data step they first have to be ordered by the dimensions you want to join on.  This is one occasion where it makes sense to put all the code on one line and repeat 24 times - quicker than writing a macro:

Proc Sort data = datatable1 ; By dimension1 dimension2 dimension3 ; Run ;

Proc Sort data = datatable2 ; By dimension1 dimension2 dimension3 ; Run ;

...

Proc Sort data = datatable24 ; By dimension1 dimension2 dimension3 ; Run ;

Next you have to decide what to do if some data is missing in some tables for some of the key dimensions.  The following code assumes you will allow joins where data from one or more tables is missing (ie, not an inner join)

Data Jointables ;

     Merge datatable1

               datatable2

               ...

               datatable24

               ;

     By     dimension1

               dimension2

               dimension3

               ;

     /*     Insert other code here if required */

Run ;

If there are any columns in the tables you do not want to use in analysis, drop them during the merge

               ...

               datatable5 (drop = analysis3 analysis 7)

               ...

Richard

Contributor
Posts: 56

Re: Request help in effectively merging 24 tables?

Hi Richard,

Thank you for your response and for your time. Well, I do understand your logic of traditional merge completely. However, that would involve a lot of typing right? for example, 24 proc sort statements and and the merge in the following datastep.

Not that i mean to be lazy to type, but I would like a nice optimized solution that would be easier to handle without having to type so much. (smiles)

Thanks,

Andy

Super Contributor
Posts: 644

Re: Request help in effectively merging 24 tables?

@Andygray

I'm glad you are familiar with the datastep merge process because I did worry that I hadn't mentioned that if the 3 by variables were not unique keys to all (or all but one) of the tables you would need to do think through whether a SQL cartesian product was preferable to the different way a SAS datastep merge handles repeated values. 

It seemed to me that you were looking for a straightforward way to handle a multiple merge of many tables.  Now I can quite see that if the number of tables or their names varies from instance to instance then a macro to get the names and insert them in the merge makes sense, and I would have suggested that.  It even makes sense if you have many many more than 24 tables to process.

But it makes no sense at all if these conditions are not met.  Good ol' cut and paste is all you need here.  Copy the sort statement 24 times and just overtype in the table names in each one.  Hold down the Alt key and select the table names from out of the sort statements and you have a list to paste into the merge statement.  Job done.

Sometimes the elegant solution is just gilding the lily.

Richard

Super User
Posts: 17,963

Re: Request help in effectively merging 24 tables?

Generally when I go about reversing a data dimension structure Proc Format can come in handy, not for the measures but for the dimension tables.

Usually with such a structure you should have a fact table lying around as well....or maybe that's what you're trying to create?

Super Contributor
Posts: 276

Re: Request help in effectively merging 24 tables?

Hi Andygray,

Macros is available for serve you..

Your can create a sample marco for Sort numer of datasets.

Like bellow..

%macro Sort(DSN=,Var1=,var2=);

Proc Sort data=&DSN;

By &Var1 &var2;

run;

%mend;

%sort (DSN=datatable1 ,Var1=dimension1 ,var2=dimension2);

Thanks,

Sanjeev.K

Super Contributor
Posts: 276

Re: Request help in effectively merging 24 tables?

Still you are not at all ready for invoke the macro in many times,

Here is the automated code.

Before going to the code ,i made some assumptions.

1.All your datasets in the same Library.

2.By variables are consistent for every proc sort procedure.

Here i am merging all the datasets in the SASUSER lib.

Proc Sql ;

create table tt as

Select * from dictionary.tables

where libname="SASUSER" and memtype="DATA";

quit;

Proc Sql;

Select Count(memname) into : Cnt separated by ' '  from tt;

Select memname into : names separated by ' ' from tt;

Select memname into : name1 - : name&cnt.   from tt;

quit;

options user=SASUSER;

%macro Process;

%do i = 1 %to &cnt.;

Proc Sort data=&&name&i ;

By Var1 var2;/*assuming that by variables are consistent*/

run;

%end;

Data Final;

Merge &names;

by var1 var2;/*assuming that by variables are consistent*/

run;

%mend;

%process;

Thanks,

Sanjeev.K

Contributor
Posts: 56

Re: Request help in effectively merging 24 tables?

Hi,

That one looks promising, lemme try and come back to you with a response.

Thanks for the effort,

Andy

Respected Advisor
Posts: 3,908

Re: Request help in effectively merging 24 tables?

From what I understand you want to create an analytical data mart.

What I would suggest is to first design and load a dimensional model for storing and maintaining your data and then from this model create the analytical data mart(s) you need.

You wrote "For instance, measure_california is an a different table as opposed to measure_texas"

In my understanding you have here 2 tables with the same structure but different variable names. If so then for such data you would need to align variable names and attributes and then concatenate (and not merge) the data. So what you should end with is something like "measure", "state".

As wrote "a table with 10 million records and 400 + variables is not really pragmatically ready for any real-time query" so some data modelling might be worth the effort.

Also consider 's suggestion to use formats for the dimensions. The Fact table would only contain the key (let's say "zip-code") and you would have a dimension table "Geography" with {zip-code, city, state). When creating the analytical data mart you could only add the "zip-code" variable and create formats from the dimension table for the rest. Such an approach would strongly reduce the number of variables in your analytical data mart so volumes are reduced speeding up all your queries.

As for OLAP: It sounds very much like a case for OLAP. IF there is an option for some investment then I would go for Visual Analytics (VA) which I believe comes at a very reasonable cost for a small installation SAS Visual Analytics | SAS.

Ask a Question
Discussion stats
  • 15 replies
  • 507 views
  • 0 likes
  • 8 in conversation