BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
JeffM1968
Fluorite | Level 6

Howdy,

 

I work at a data company and pull data all day :-).  Some of our tables are large and the initial data pull can take a while.  Is there a more efficient way than PROC SQL?

 

Here is a typical data pull.

 

Jeff

 

PROC SQL;
   CREATE TABLE WORK.TECH_01 AS 
   SELECT t1.industrysegment AS ProdLvl_1, 
          t1.categorygroup AS ProdLvl_2, 
          t1.category AS ProdLvl_3, 
          t1.subcategory AS ProdLvl_4, 
          t1.itemnumberunsuppressed AS iItem_uns, 
          t1.outletfamily, 
          t1.outlet_storeid, 
          t1.ppmonth, 
          t1.ppweek, 
          /* Dollars */
            (SUM(t1.totalvalue)) AS Dollars, 
          /* Units */
            (SUM(t1.unitssold)) AS Units
      FROM TECSWP2.vw_tecmlwk_fact_nc_uns t1
      WHERE t1.subcategory IN 
           (
           60638,
           60640
           ) AND t1.totalvalue >= 1 AND t1.unitssold >= 1 AND t1.ppmonth BETWEEN &Month-23 and &Month
      GROUP BY t1.industrysegment,
               t1.categorygroup,
               t1.category,
               t1.subcategory,
               t1.itemnumberunsuppressed,
               t1.outletfamily,
               t1.outlet_storeid,
               t1.ppmonth,
               t1.ppweek;
QUIT;
1 ACCEPTED SOLUTION

Accepted Solutions
s_lassen
Meteorite | Level 14

It is possible to redo your SQL as a data step. It may even be faster, if your database server is slow at calculating sums, and the connection between the server and your SAS machine is fast:

%let byvars=industrysegment categorygroup category subcategory itemnumberunsuppressed outletfamily outlet_storeid ppmonth ppweek;
data WORK.TECH_01;
  set  TECSWP2.vw_tecmlwk_fact_nc_uns(keep=&byvars totalvalue unitssold);
  WHERE subcategory IN (60638,60640)
    AND totalvalue >= 1 AND unitssold >= 1 AND ppmonth BETWEEN &Month-23 and &Month;
  by &byvars;
  if first.%scan(&byvars,-1) then do;
   Dollars=totalvalue;
   Units=unitssold;
   end;
  else do;
   Dollars+totalvalue;
   Units+unitssold;
   end;
  if last.%scan(&byvars,-1);
drop totalvalue unitssold; run;

But the data server still has to sort the data before delivering it to SAS, so there is no guarantee of a performance gain.

View solution in original post

4 REPLIES 4
Reeza
Super User

This wouldn't be a data step, at best it could be a proc means but you SQL would be as efficient especially if its pulling from an RDBMS system. 

I'm curious as to how you're storing month such that this works....

&Month-23

 

 

proc means data=tecswp2.vw_tecmlwk_fact_nc_uns noprint NWAY;
where subcategory in (60638, 60640) and totalvalue>=1 and unitssold>=1 and ppmonth between  &Month-23 and &Month;

class [list your group by variables here];

var totalvalue unitssold;
output out=tech_01 sum(totalValue) = Dollars sum(unitssold) = Units;
run;

 

You can rename the variables in the same step on the OUT statement as well:

output out=tech_01 (rename=(industrySegment = Prodlvl_1 categoryGroup=ProdLvl2) 
    sum(totalValue) = Dollars sum(unitssold) = Units;

That should be enough to get you going if you want to test out the performance difference.

 

 

SASKiwi
PROC Star

From the look of your query you are reading from an external database. There is probably no point in converting this to a DATA step as behind the scenes SAS has to convert it back to SQL for external databases anyway. You could try using SQL passthru instead of the implicit SQL you are using.

 

For large data extracts you will be limited by the network bandwidth between the database and SAS anyway. One way to prove exactly how long the database takes is to run your query without transmitting lots of data is to turn it into a simple row count:

PROC SQL;
   select count(*) from 
   (SELECT t1.industrysegment AS ProdLvl_1, 
          t1.categorygroup AS ProdLvl_2, 
          t1.category AS ProdLvl_3, 
          t1.subcategory AS ProdLvl_4, 
          t1.itemnumberunsuppressed AS iItem_uns, 
          t1.outletfamily, 
          t1.outlet_storeid, 
          t1.ppmonth, 
          t1.ppweek, 
          /* Dollars */
            (SUM(t1.totalvalue)) AS Dollars, 
          /* Units */
            (SUM(t1.unitssold)) AS Units
      FROM TECSWP2.vw_tecmlwk_fact_nc_uns t1
      WHERE t1.subcategory IN 
           (
           60638,
           60640
           ) AND t1.totalvalue >= 1 AND t1.unitssold >= 1 AND t1.ppmonth BETWEEN &Month-23 and &Month
      GROUP BY t1.industrysegment,
               t1.categorygroup,
               t1.category,
               t1.subcategory,
               t1.itemnumberunsuppressed,
               t1.outletfamily,
               t1.outlet_storeid,
               t1.ppmonth,
               t1.ppweek
   );
QUIT;

 

s_lassen
Meteorite | Level 14

It is possible to redo your SQL as a data step. It may even be faster, if your database server is slow at calculating sums, and the connection between the server and your SAS machine is fast:

%let byvars=industrysegment categorygroup category subcategory itemnumberunsuppressed outletfamily outlet_storeid ppmonth ppweek;
data WORK.TECH_01;
  set  TECSWP2.vw_tecmlwk_fact_nc_uns(keep=&byvars totalvalue unitssold);
  WHERE subcategory IN (60638,60640)
    AND totalvalue >= 1 AND unitssold >= 1 AND ppmonth BETWEEN &Month-23 and &Month;
  by &byvars;
  if first.%scan(&byvars,-1) then do;
   Dollars=totalvalue;
   Units=unitssold;
   end;
  else do;
   Dollars+totalvalue;
   Units+unitssold;
   end;
  if last.%scan(&byvars,-1);
drop totalvalue unitssold; run;

But the data server still has to sort the data before delivering it to SAS, so there is no guarantee of a performance gain.

JeffM1968
Fluorite | Level 6
It technically works but was slower than the PROC SQL so i'll stick with that.\

Thanks anyway!!!

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 593 views
  • 2 likes
  • 4 in conversation