Help using Base SAS procedures

slow on Proc Sql with 2 tables- summation

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 8
Accepted Solution

slow on Proc Sql with 2 tables- summation

15 minutes for this proc sql  with 2 tables !

Another procedure?

 

proc sql;

create table pour_TDB.TDB25 as SELECT  TDB_bul.Code_diplome, TDB_bul.AnSuiv,

sum(eff_app_total17) as eff_app_total17, sum(eff_app_bul_16) as eff_app_bul_16,      sum(eff_app_bul_15) as eff_app_bul_15, sum(eff_app_bul_14) as eff_app_bul_14,

.............

FROM pour_TDB.TDB_bul LEFT JOIN pour_TDB.Quater_MF ON TDB_bul.AnSuiv = Quater_MF.Annee_suivie AND TDB_bul.Code_diplome = Quater_MF.Code_diplome

GROUP BY TDB_bul.Code_diplome, TDB_bul.AnSuiv;

run;quit;

 

NOTE: Table POUR_TDB.TDB25 created, with 2421 rows and 67 columns.

135  run;quit;

NOTE: PROC SQL statements are executed immediately; The RUN statement has no effect.

NOTE: PROCEDURE SQL used (Total process time):

      real time           15:30.63

      cpu time            15:30.43

     


Accepted Solutions
Solution
‎04-09-2018 05:40 AM
Occasional Contributor
Posts: 8

Re: slow on Proc Sql with 2 tables- summation

Posted in reply to eric23263

Thank you for everyone.
I did not think about it, join and then proc summary

View solution in original post


All Replies
Super User
Super User
Posts: 9,599

Re: slow on Proc Sql with 2 tables- summation

Posted in reply to eric23263

And how did your previous post which is marked as complete successfully solve this?

https://communities.sas.com/t5/SAS-Procedures/slow-on-Proc-Sql-summation/m-p/451567#M69759

 

Appears to be the same question? 

Super User
Posts: 23,761

Re: slow on Proc Sql with 2 tables- summation

Posted in reply to eric23263

I suspect what's in the ... is the issue.

Super User
Posts: 13,574

Re: slow on Proc Sql with 2 tables- summation

Posted in reply to eric23263

I would be tempted to sum the variables first (assuming they all come from the a single data set) and then join, or if the variables that need summary are mixed in the two sets do the summary first and then join the summary results.

proc summary data=settosum nway;
   class Code_diplome AnSuiv;
   var eff: ;
   output out=summary(drop=_type_ _freq_) sum= ;
run;

Since you didn't list all of the variables summed I followed the example actually shown in the code where all of the variables that were summed started with eff.

 

If Code_diplome AnSuiv are both character you could use: Var _numeric_ to sum all of the numeric variables without explicitly listing them.

The example code creates the sum variable with the same name as the input variable. If you request multiple statistics you need to provide some names for each statistic or use the option /autoname to create names with the statistic as a suffix to source name.

Occasional Contributor
Posts: 8

Re: slow on Proc Sql with 2 tables- summation

It's not the same question, it's with 2 tables

I can't do that

proc summary   data=TDB_bul LEFT JOIN pour_TDB.Quater_MF ON TDB_bul.AnSuiv = Quater_MF.Annee_suivie AND TDB_bul.Code_diplome = Quater_MF.Code_diplome ;
class Code_diplome AnSuiv ;/*2 columns from TDB_bul*/
var eff_app_total17 Masc_2015 ; /*67 columns from TDB_bul and Quater_MF*/
output out=pour_TDB.TDB25 sum= ;
run;

 

Super User
Posts: 13,574

Re: slow on Proc Sql with 2 tables- summation

Posted in reply to eric23263

@eric23263 wrote:

It's not the same question, it's with 2 tables

I can't do that

proc summary   data=TDB_bul LEFT JOIN pour_TDB.Quater_MF ON TDB_bul.AnSuiv = Quater_MF.Annee_suivie AND TDB_bul.Code_diplome = Quater_MF.Code_diplome ;
class Code_diplome AnSuiv ;/*2 columns from TDB_bul*/
var eff_app_total17 Masc_2015 ; /*67 columns from TDB_bul and Quater_MF*/
output out=pour_TDB.TDB25 sum= ;
run;

 


No you can't proc summary works on one data set.

Without example data it is very hard to make suggestions. However for summing things creating the summaries from each of the input data sets separately would reduce the number of records being joined. Also Proc summary (or means ) allows use of variable lists so the code can be much shorter if you don't have to list 300 variables by name with the sum(var) as var structure required by SQL.

 

Since your sql code did not include sum(table.variable) I had to assume that each of the variables being summed only  comes from one data set. So that set should be quite amenable to Proc Summary. If the variables are mixed between the two sets the summarize each. Something like

proc summary data=settosum nway;
   class Code_diplome AnSuiv;
   var eff: ;
   output out=summary(drop=_type_ _freq_) sum= ;
run;

proc summary data=settosum2 nway;
   class Code_diplome AnSuiv;
   var eff: ;
   output out=summary2(drop=_type_ _freq_) sum= ;
run;

proc sql;
   create table want as
   select <stuff>
   from summary as a
        left join
        summary2 as b
        on <criteria>
   ;
quit;

The error messages from your Proc Summary should tell that you were not using the tool (proc summary) correctly.

 

I realize that many people have a background in SQL and try to do everything there because of familiarity but the SAS environment as a LOT of tools and some of them are much more efficient to execute, or simpler to code (the proc summary example here). These other tools have there own rules and it helps to learn them.

 

I do not try to force data steps to do a Cartesian join on multiple variables because SQL is almost always the best tool for that. Similarly summarizing data with another procedure before merging is a common approach.

 

Also note that with a very simple change to the code from proc summary I can the sums of those variables for 1)all the records, 2) all the values of Code_diplome, 3) all the values of AnSuiv as well as 4) only the combinations of Code_diplome and AnSuiv.

How simple: remove the NWAY option (though with that you want to keep the _type_ variable.

Add std= mean= max= min= /autoname after the sum= in the output statement of proc summary and I now have those statistics and the variables will be named with the original variable name with _sum _std _mean _max and _min appended. Without having to type out 65 (or 300 or what have you) std(var) as var_std, mean(var) as var_mean etc statements from SQL.

 

Even if Proc summary ran a little slower the time I saved by using variable lists and autoname might make the solution faster than SQL.

Super User
Posts: 5,884

Re: slow on Proc Sql with 2 tables- summation

Posted in reply to eric23263
You could use the same technique, first join (as view or in a temporary table) and then PROC SUMMARY.
Data never sleeps
Occasional Contributor
Posts: 8

Re: slow on Proc Sql with 2 tables- summation

thank you, I'll try Monday

What I did today :

NOTE: Table POUR_TDB.TDB_25_1 created, with 2109 rows and 46 columns.
NOTE: Table POUR_TDB.TDB_25_2 created, with 2109 rows and 20 columns.
NOTE: Table POUR_TDB.TDB_25_3 created, with 2109 rows and 21 columns.
NOTE: Table POUR_TDB.TDB_25_4 created, with 2109 rows and 35 columns.

  each proc sql in less than   0.10 secondes

I think the INNER JOIN between the 4 tables is fast

 

Solution
‎04-09-2018 05:40 AM
Occasional Contributor
Posts: 8

Re: slow on Proc Sql with 2 tables- summation

Posted in reply to eric23263

Thank you for everyone.
I did not think about it, join and then proc summary

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 231 views
  • 0 likes
  • 5 in conversation