Help using Base SAS procedures

How to join 3 tables with different by group in fewer steps?

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 11
Accepted Solution

How to join 3 tables with different by group in fewer steps?

Hi All,

I need you suggestions to make my logic little efficient/faster since I am dealing with HUGE data.

I have one big table on which I am calculating average of a column twice (see below avgone and avgtwo dataset) by two different by groups.

Now I would like to add these two columns avg1 and avg2 back with big table (left join/master table) which has different by group to join with.

I know I can do it in several proc sql steps but can I do it in one step of proc sql/data step or preferably hash programming which is faster than several proc sqls?

Thanks in advance for your help.

/*master table*/

data big;

input A1   A2  A3  A4  Rank  revenue ;

cards;

1      1      1     1     1            100

1      1      1     1     2            120    

1      1      2     1     3            90

1      1      2     1     4            12

1      1      3     2     1            100

1      1      3     2     2            107

1      1      3     3     3            198

1      1      3     4     4            87  

1      1      4     4     5            18

1      1      4     5     6            54

1      1      4     5     7            178

;

run;

/* average with different by group */

proc sql;

create table avgone as

select a1, a2, a3 , avg(revenue) as avg1

from big

group by a1,a2,a3;

quit;

/* average with different by group */

proc sql;

create table avgtwo as

select a1, a2, a4 , avg(revenue) as avg2

from big

group by a1,a2,a4;

quit;

/* Now join avgone and avgtwo back with big table */


Accepted Solutions
Solution
‎09-30-2014 04:19 PM
Respected Advisor
Posts: 4,745

Re: How to join 3 tables with different by group in fewer steps?

SAS SQL remerges summaries automatically when the select list contains columns not involved in the grouping. Try it!

The inclusion of summaries in the table sounds like poor modelling because it is redundant information and  modifying your data will likely make your summaries inconsistent.

PG

PG

View solution in original post


All Replies
Super User
Posts: 5,308

Re: How to join 3 tables with different by group in fewer steps?

You could avoid joining back the result to the detail data by having all column in the select clause. Don't know how efficient that will be in your case. And I get a little tired of always pointing out hash programming to solve performance issues. It seem often that you are reinventing the week.  There are standard procedures and SQL for a reason. Do you really need to rejoin aggregate information to detail rows? What is the final goal/application?  To me it sounds like bad modelling.

Data never sleeps
Occasional Contributor
Posts: 11

Re: How to join 3 tables with different by group in fewer steps?

I cannot have all column names in select statement since I am grouping it with smaller subset of columns... if I do that it will not summarize my data.

Are you saying hash programming is not going to solve performance issue?

My final goal is to add two new variables to the original big table and those two new variables are the average of revenue column calculated by different by group.

not sure why it is bad modeling..

Solution
‎09-30-2014 04:19 PM
Respected Advisor
Posts: 4,745

Re: How to join 3 tables with different by group in fewer steps?

SAS SQL remerges summaries automatically when the select list contains columns not involved in the grouping. Try it!

The inclusion of summaries in the table sounds like poor modelling because it is redundant information and  modifying your data will likely make your summaries inconsistent.

PG

PG
Occasional Contributor
Posts: 11

Re: How to join 3 tables with different by group in fewer steps?

Got it..Thanks for your feedback..

Super Contributor
Posts: 418

Re: How to join 3 tables with different by group in fewer steps?

Linus sorry to jump into this thread but I am also curious to your and PGstats responses.

I do know that sql can have different select clauses over group by clauses, but since Anna_nag has to do it twice how would this work in one sql statement? Or are you saying to simply do it in two sql statements and then merge them (which becomes your big table) and then live with the performance issues?

Ex: I know you can do this, and this is a "solution" to the original posters question.

proc sql;

create table avgtwo as

select a1, a2,a3, a4 , avg(revenue) as avg2

from big

group by a1,a2,a4;

quit;

Which would calculate the average over a1,a2, and a4 across the assocaited rows (including A3). however how would you then calculate the average of a1,a2,a3 without doing another create table and merge?  I realize the merge is more efficient since the tables would share all variables and it would only be one merge. I'm just curious if this is what you were implying. So the final code would look like

proc sql;

create table avgone as

select a1, a2,a3, a4 , avg(revenue) as avg1

from big

group by a1,a2,a3;

quit;

proc sql;

create table avgtwo as

select a1, a2,a3, a4 , avg(revenue) as avg2

from big

group by a1,a2,a4;

quit;

proc sort data=avgone;

by a1 a2 a3 a4;

run;

proc sort data=avgtwo;

by a1 a2 a3 a4;

run;

data answer;

merge avgone avgtwo;

by a1 a2 a3 a4;

run;

Respected Advisor
Posts: 4,745

Re: How to join 3 tables with different by group in fewer steps?

You could do:

proc sql;

create table bigger as

select *, mean(revenue) as rev124

from (select *, mean(revenue) as rev123 from big group by a1, a2, a3)

group by a1, a2, a4;

quit;

PG

PG
Super User
Posts: 5,308

Re: How to join 3 tables with different by group in fewer steps?

No problem Anotherdream.

No, not merge. First round calculate the first mean, sr´tore with detail data.

Second round, calculate the the second means from round one result table.

proc sql;

create table avgone as

select *, avg(revenue) as avg1

from big

group by a1,a2,a3;

quit;

/* average with different by group */

proc sql;

create table avgtwo as

select * , avg(revenue) as avg2

from avgone

group by a1,a2,a4;

quit;

Data never sleeps
Super Contributor
Posts: 418

Re: How to join 3 tables with different by group in fewer steps?

Ah that makes total sense. SOrry I did not conceptually get that. I am so not used to the select being able to be different from the group I never thought of a sub stepping solution like you and PG have.

Sorry for the confusion, that's actually a very cool trick.

Occasional Contributor
Posts: 11

Re: How to join 3 tables with different by group in fewer steps?

Thanks everyone.. Got some really good feedback from you all.

Super User
Posts: 5,235

Re: How to join 3 tables with different by group in fewer steps?

This may prove irrelevant to your final solution, but notice ...

You are already processing your huge data set twice, to get the average revenue for two different groupings.  A single PROC SUMMARY would produce both levels of summarization with a single pass through the data.  The value of switching to PROC SUMMARY depends on how large your summarized tables are, relative to the original huge table.  The best methods of joining the averages may also depend on the size of the summarized tables.  How many records are you seeing in AVGONE and AVGTWO?  (And how many in the original table BIG?)

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 277 views
  • 2 likes
  • 5 in conversation