BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
eric23263
Fluorite | Level 6

15 minutes for this proc sql  with 2 tables !

Another procedure?

 

proc sql;

create table pour_TDB.TDB25 as SELECT  TDB_bul.Code_diplome, TDB_bul.AnSuiv,

sum(eff_app_total17) as eff_app_total17, sum(eff_app_bul_16) as eff_app_bul_16,      sum(eff_app_bul_15) as eff_app_bul_15, sum(eff_app_bul_14) as eff_app_bul_14,

.............

FROM pour_TDB.TDB_bul LEFT JOIN pour_TDB.Quater_MF ON TDB_bul.AnSuiv = Quater_MF.Annee_suivie AND TDB_bul.Code_diplome = Quater_MF.Code_diplome

GROUP BY TDB_bul.Code_diplome, TDB_bul.AnSuiv;

run;quit;

 

NOTE: Table POUR_TDB.TDB25 created, with 2421 rows and 67 columns.

135  run;quit;

NOTE: PROC SQL statements are executed immediately; The RUN statement has no effect.

NOTE: PROCEDURE SQL used (Total process time):

      real time           15:30.63

      cpu time            15:30.43

     

1 ACCEPTED SOLUTION

Accepted Solutions
eric23263
Fluorite | Level 6

Thank you for everyone.
I did not think about it, join and then proc summary

View solution in original post

8 REPLIES 8
RW9
Diamond | Level 26 RW9
Diamond | Level 26

And how did your previous post which is marked as complete successfully solve this?

https://communities.sas.com/t5/SAS-Procedures/slow-on-Proc-Sql-summation/m-p/451567#M69759

 

Appears to be the same question? 

Reeza
Super User

I suspect what's in the ... is the issue.

ballardw
Super User

I would be tempted to sum the variables first (assuming they all come from the a single data set) and then join, or if the variables that need summary are mixed in the two sets do the summary first and then join the summary results.

proc summary data=settosum nway;
   class Code_diplome AnSuiv;
   var eff: ;
   output out=summary(drop=_type_ _freq_) sum= ;
run;

Since you didn't list all of the variables summed I followed the example actually shown in the code where all of the variables that were summed started with eff.

 

If Code_diplome AnSuiv are both character you could use: Var _numeric_ to sum all of the numeric variables without explicitly listing them.

The example code creates the sum variable with the same name as the input variable. If you request multiple statistics you need to provide some names for each statistic or use the option /autoname to create names with the statistic as a suffix to source name.

eric23263
Fluorite | Level 6

It's not the same question, it's with 2 tables

I can't do that

proc summary   data=TDB_bul LEFT JOIN pour_TDB.Quater_MF ON TDB_bul.AnSuiv = Quater_MF.Annee_suivie AND TDB_bul.Code_diplome = Quater_MF.Code_diplome ;
class Code_diplome AnSuiv ;/*2 columns from TDB_bul*/
var eff_app_total17 Masc_2015 ; /*67 columns from TDB_bul and Quater_MF*/
output out=pour_TDB.TDB25 sum= ;
run;

 

ballardw
Super User

@eric23263 wrote:

It's not the same question, it's with 2 tables

I can't do that

proc summary   data=TDB_bul LEFT JOIN pour_TDB.Quater_MF ON TDB_bul.AnSuiv = Quater_MF.Annee_suivie AND TDB_bul.Code_diplome = Quater_MF.Code_diplome ;
class Code_diplome AnSuiv ;/*2 columns from TDB_bul*/
var eff_app_total17 Masc_2015 ; /*67 columns from TDB_bul and Quater_MF*/
output out=pour_TDB.TDB25 sum= ;
run;

 


No you can't proc summary works on one data set.

Without example data it is very hard to make suggestions. However for summing things creating the summaries from each of the input data sets separately would reduce the number of records being joined. Also Proc summary (or means ) allows use of variable lists so the code can be much shorter if you don't have to list 300 variables by name with the sum(var) as var structure required by SQL.

 

Since your sql code did not include sum(table.variable) I had to assume that each of the variables being summed only  comes from one data set. So that set should be quite amenable to Proc Summary. If the variables are mixed between the two sets the summarize each. Something like

proc summary data=settosum nway;
   class Code_diplome AnSuiv;
   var eff: ;
   output out=summary(drop=_type_ _freq_) sum= ;
run;

proc summary data=settosum2 nway;
   class Code_diplome AnSuiv;
   var eff: ;
   output out=summary2(drop=_type_ _freq_) sum= ;
run;

proc sql;
   create table want as
   select <stuff>
   from summary as a
        left join
        summary2 as b
        on <criteria>
   ;
quit;

The error messages from your Proc Summary should tell that you were not using the tool (proc summary) correctly.

 

I realize that many people have a background in SQL and try to do everything there because of familiarity but the SAS environment as a LOT of tools and some of them are much more efficient to execute, or simpler to code (the proc summary example here). These other tools have there own rules and it helps to learn them.

 

I do not try to force data steps to do a Cartesian join on multiple variables because SQL is almost always the best tool for that. Similarly summarizing data with another procedure before merging is a common approach.

 

Also note that with a very simple change to the code from proc summary I can the sums of those variables for 1)all the records, 2) all the values of Code_diplome, 3) all the values of AnSuiv as well as 4) only the combinations of Code_diplome and AnSuiv.

How simple: remove the NWAY option (though with that you want to keep the _type_ variable.

Add std= mean= max= min= /autoname after the sum= in the output statement of proc summary and I now have those statistics and the variables will be named with the original variable name with _sum _std _mean _max and _min appended. Without having to type out 65 (or 300 or what have you) std(var) as var_std, mean(var) as var_mean etc statements from SQL.

 

Even if Proc summary ran a little slower the time I saved by using variable lists and autoname might make the solution faster than SQL.

LinusH
Tourmaline | Level 20
You could use the same technique, first join (as view or in a temporary table) and then PROC SUMMARY.
Data never sleeps
eric23263
Fluorite | Level 6

thank you, I'll try Monday

What I did today :

NOTE: Table POUR_TDB.TDB_25_1 created, with 2109 rows and 46 columns.
NOTE: Table POUR_TDB.TDB_25_2 created, with 2109 rows and 20 columns.
NOTE: Table POUR_TDB.TDB_25_3 created, with 2109 rows and 21 columns.
NOTE: Table POUR_TDB.TDB_25_4 created, with 2109 rows and 35 columns.

  each proc sql in less than   0.10 secondes

I think the INNER JOIN between the 4 tables is fast

 

eric23263
Fluorite | Level 6

Thank you for everyone.
I did not think about it, join and then proc summary

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 979 views
  • 0 likes
  • 5 in conversation