BookmarkSubscribeRSS Feed
Tacos_Tacos_Tacos
Calcite | Level 5

So here's the mission:

I need to create a series of data sets by merging tables in a proc sql procedure.  These tables are to be created by quarter, from Q1 2000 through Q1 2015, which means that in the end we'll have 61 separate data sets.  Doing this manually is far too time consuming, so I'd like to automate it.  I think that this ought to be possible with a combination of loops and macros.  I am competent enough with the former.  I am utterly worthless with regards to the latter, and this seems like a rough but doable learning opportunity.

My thinking is that I'm going to need two separate macro programs: one to repeat the proc sequel for each data set (while switching the date values when determining which data to grab), and another to generate macro variables to populate said date values.

First, the procedure to be repeated:

proc sql;
connect to oracle (user=&user. pass=&pass.);
  
create table sastemp.["table"||&vsn]
   as select *
   from connection to oracle (
  select  a.variables,
   b.variables
           
   from statictable1 b
   left join statictable2 a on b.variable = a.variable
     
   where b.datevariable between [&&startdt&i] and [&&enddt&i]
 
) order by variable ;
        disconnect from oracle ;
quit ;

I have run this successfully in SAS and have created the first dataset.  You'll note that three things need to change with each iteration of the proposed loop: [table||&vsn], which should be iteratively numbered with each pass; and [&startdt] and [&enddt], which should represent the first and last days of the quarter under examination, in SQL date format ('01-JAN-2000', for example).  This is where I imagine the second macro would come in.

I'm imagining creating a data set with three variables: vsn, start_date, and end_date.  [&vsn] would simply be an integer from 1 to 61, representing each iteration of the loop and providing a unique name to each table.  [&startdt] represents the first day of a quarter, and as you might expect, [&enddt] represents the last.  I have attempted to put together some (untested) code for creating this data set, but I'm not entirely sure how to manage the macro aspect of it, and could use some criticism/suggestions.

do i = 1 to 61 ;
   retain vsn 1 yearinteger 2000 ;
   if mod(i,4) = 1 then do ;
      call symput("startdt"||strip(vsn),"'01-JAN-"||&yearinteger||"'") ;
      call symput("enddt"||strip(vsn),"'31-MAR-"||&yearinteger||"'") ;
      end ;
   elseif mod(i,4) = 2 then do ;
      call symput("startdt"||strip(vsn),"'01-APR-"||&yearinteger||"'") ;
      call symput("enddt"||strip(vsn),"'30-JUN-"||&yearinteger||"'") ;
      end ;
   elseif mod(i,4) = 3 then do ;
      call symput("startdt"||strip(vsn),"'01-JUL-"||&yearinteger||"'") ;
      call symput("enddt"||strip(vsn),"'30-SEP-"||&yearinteger||"'") ;
      end ;
   else do ;
      call symput("startdt"||strip(vsn),"'01-OCT-"||&yearinteger||"'") ;
      call symput("enddt"||strip(vsn),"'31-DEC-"||&yearinteger||"'") ;
      &yearinteger = &yearinteger + 1 ;
      end ;
   &vsn = &vsn + 1 ;
end ;

Surely this is filled with grievous errors, and for that I apologize.  I hope that this gives you a general idea of what I'm trying to do, and I would be extremely grateful for any advice from a more experienced programmer.  I've been banging my head against this for hours, but generally fail to grasp the intricacies of macros.  Thank you in advance for your help, kind person(s).

10 REPLIES 10
Reeza
Super User

Why separate into 61 datasets in the first place, why not do one big query?

Tacos_Tacos_Tacos
Calcite | Level 5

The amount of data is enormous.  One of the tables has in the realm of ~4 billion rows of data, so doing this all in a single query puts far too much strain on my server resources.

Reeza
Super User

Ok...

1. Create  macro to do the extract

2. Create list of dates/qrtrs for parameters

3. Execute using call execute (step2/3 could be one step).

I don't think this will work, but you can debug on your side. In particular make sure the date is being passed to the server in a manner that it expects.

Good luck

%macro extract(qrtr=, start=, end=);

proc sql;

connect to oracle (user=&user. pass=&pass.);

  

create table sastemp.["table"||&qrtr]

   as select *

   from connection to oracle (

  select  a.variables,

   b.variables

           

   from statictable1 b

   left join statictable2 a on b.variable = a.variable

     

   where b.datevariable between [&start] and [&end]

 

) order by variable ;

        disconnect from oracle ;

quit ;

%mend;

data want;

length str $256.;

start_date='01Jan2000'd;

do qrtr=1 to 61;

end_date=intnx('MONTH', start_date, 3, 'b')-1;

str='%extract(qrtr='||put(qrtr, z3.)||', start='||put(start_date,date11.)||", end="||put(end_date, date11.)||');';

output;

start_date=end_date+1;

end;

format start_date end_date date9.;

run;

data _null_;

set want;

call execute(str);

run;

ChrisNZ
Tourmaline | Level 20

Many parameters in play here, but there may be a better way.

Creating new tables still strains the server. Do you really want to duplicate all 4 billions records?

If the source was SAS, much better to point to the original table with where clauses (if table is sorted and sort-validated) or obs= lastobs= options. Since this is Oracle, you are a bit more constrained.

What the first step of your process?

If you can't point to the original data (with a where clause) for this step, then create a view and process the view (provided this data is only read once). If you need the data sorted, better have Oracle sort it.

%macro loopdates(startd=199912, endd=199912);

  %local daten date_yymm daten_endq date_ora1 date_ora2;

  %let daten=%sysfunc(inputn(&startd,yymmn6.));                      

  %do %while(&daten<=%sysfunc(inputn(&endd,yymmn6.)));               

    %let date_yymm =%sysfunc(putn(&daten     ,yyq.)) ;

    %let daten_endq=%sysfunc(intnx(quarter,&daten,0,e));                            

    %let date_ora1 =%str(%')%sysfunc(putn(&daten     ,date9.))%str(%');                             

    %let date_ora2 =%str(%')%sysfunc(putn(&daten_endq,date9.))%str(%');                             

    ...

    create view SASTEMP.TABLE_&date_yymm.  as

    ...

    where b.DATEVARIABLE between &date_ora1. and &date_ora2.

    ...

    %let daten=%sysfunc(intnx(quarter,&daten,1));                  

  %end;

%mend;

%loopdates(startd=200701, endd=201112);

creates:

... create view SASTEMP.TABLE2011Q4  as...where b.DATEVARIABLE between '01OCT2011' and '31DEC2011'   ...

But if you can put the actual processing logic in this macro rather than a view creation, it is even better.

ballardw
Super User

If you have a working macro to do this for ONE of your output data sets then create a SAS data set with the parameters that macro needs. Then you can use Call Execute to pass those parameters to the macro, creating all the remaining needed data sets.

A control set can be made with:

data control;

yearinteger =2000;

do vsn = 1 to 61 ;

    

   if mod(vsn,4) = 1 then do ;

      startdt = cats("'01-JAN-",yearinteger,"'");

      enddt   = cats("'31-MAR-",yearinteger,"'") ;

      end ;

   else if mod(vsn,4) = 2 then do ;

      startdt = cats("'01-APR-",yearinteger,"'");

      enddt   = cats("'31-JUN-",yearinteger,"'") ;

      end ;

   else if mod(vsn,4) = 3 then do ;

      startdt = cats("'01-JUL-",yearinteger,"'");

      enddt   = cats("'31-SEP-",yearinteger,"'") ;

      end ;

   else do ;

      startdt = cats("'01-OCT-",yearinteger,"'");

      enddt   = cats("'31-DEC-",yearinteger,"'") ;

      yearinteger = yearinteger + 1 ;

      end ;

   output;

end ;

run;

A separate dataset lets you verify that the parameters look the way that you want them.

and use something like:

data _null_;

     set control;

     call execute ('%yourmacroname( vsn=' || vsn || ', startdt=' || startdt || ', enddt=' || enddt || ')' );

run;

of course the macro call has to match how you define the macro.

Patrick
Opal | Level 21

You're dealing with serious volumes here. Let's assume that you really need to replicate the data into SAS files. You say that "so doing this all in a single query puts far too much strain on my server resources". I'd ask what happens if you execute your query 61 times and it probably results in full table scans in Oracle every single time? That would be much worse.

Depending on how you need to use your data in SAS creating 61 files makes eventually sense. Have you considered using the SPDE engine?

Also: If your Oracle table really has 4 Billion rows and there is a date column (eg. for transactions), then there is a good chance that the Oracle table is partitioned. If so then it would be really important that you define your query "partition wise". This would make a huge difference.

Make sure you investigate how the actual Oracle DDL looks like and that you run an "Explain" to performance tweak your query. Once you're there then it shouldn't be too hard to set-up some SAS code (with or without macros) to get what you need.

If you can: Share the DDL of your Oracle tables (as attachments) and the exact logic with all the columns you need in SAS. Also let us know if your downstream processing will be month based (so multiple tables would make sense) or not (then rather go for the SPDE engine).

I believe to remember that the SPDE engine also allows for multi-threaded download of data from Oracle. So this is also something we should be looking into.

How long did it take you to create this one single month table (real time)? Is this time multiplied by 61 something acceptable or would we also need to think about running the extraction jobs in parallel? Do you have SAS/Connect licensed at your site?

Tacos_Tacos_Tacos
Calcite | Level 5

Mission accomplished.  Thank you to all who answered, not only for coding suggestions, but also for alternative ways of considering the problem.

Ultimately, I went with a form of ballardw's response - largely because it made the most immediate sense to me.

%macro productloop ;

%let yearinteger = 2000 ;

   %do vsn = 1 %to 61 ;
      data _null_;     
      %if %sysfunc(mod(&vsn,4))= 1 %then %do ;
         call symput("startdt", "'01-JAN-"||strip(&yearinteger.)||"'") ;
         call symput("enddt", "'31-MAR-"||strip(&yearinteger.)||"'") ;
         %let quarter = "Q1" ;
         %let twodigityear = substrn(&yearinteger.,3,2) ;
         call symput("mergeset", "<filepath>"||&twodigityear.||&quarter.) ;
         call symput("g1filename", "<filepath>"||&twodigityear.||&quarter.||"G1") ;
         call symput("g2filename", "<filepath>"||&twodigityear.||&quarter.||"G2") ;
         call symput("g1output", "'<filepath>"||&twodigityear.||&quarter.||"G1.csv'") ;
         call symput("g2output", "'<filepath>"||&twodigityear.||&quarter.||"G2.csv'") ;
         %datagen;
      %end ;
      %else %if %sysfunc(mod(&vsn,4))= 2 %then %do ;
         call symput("startdt", "'01-APR-"||strip(&yearinteger.)||"'") ;
         call symput("enddt", "'30-JUN-"||strip(&yearinteger.)||"'") ;
         %let quarter = "Q2" ;
         %let twodigityear = substrn(&yearinteger.,3,2) ;
         call symput("mergeset", "<filepath>"||&twodigityear.||&quarter.) ;
         call symput("g1filename", "<filepath>"||&twodigityear.||&quarter.||"G1") ;
         call symput("g2filename", "<filepath>"||&twodigityear.||&quarter.||"G2") ;
         call symput("g1output", "'<filepath>"||&twodigityear.||&quarter.||"G1.csv'") ;
         call symput("g2output", "'<filepath>"||&twodigityear.||&quarter.||"G2.csv'") ;
         %datagen;
      %end ;
      %else %if %sysfunc(mod(&vsn,4))= 3 %then %do ;
         call symput("startdt", "'01-JUL-"||strip(&yearinteger.)||"'") ;
         call symput("enddt", "'30-SEP-"||strip(&yearinteger.)||"'") ;
         %let quarter = "Q3" ;
         %let twodigityear = substrn(&yearinteger.,3,2) ;
         call symput("mergeset", "<filepath>"||&twodigityear.||&quarter.) ;
         call symput("g1filename", "<filepath>"||&twodigityear.||&quarter.||"G1") ;
         call symput("g2filename", "<filepath>"||&twodigityear.||&quarter.||"G2") ;
         call symput("g1output", "'<filepath>"||&twodigityear.||&quarter.||"G1.csv'") ;
         call symput("g2output", "'<filepath>"||&twodigityear.||&quarter.||"G2.csv'") ;
         %datagen;
      %end ;
      %else %do ;
         call symput("startdt", "'01-OCT-"||strip(&yearinteger.)||"'") ;
         call symput("enddt", "'31-DEC-"||strip(&yearinteger.)||"'") ;
         %let quarter = "Q4" ;
         %let twodigityear = substrn(&yearinteger.,3,2) ;
         call symput("mergeset", "<filepath>"||&twodigityear.||&quarter.) ;
         call symput("g1filename", "<filepath>"||&twodigityear.||&quarter.||"G1") ;
         call symput("g2filename", "<filepath>"||&twodigityear.||&quarter.||"G2") ;
         call symput("g1output", "'<filepath>"||&twodigityear.||&quarter.||"G1.csv'") ;
         call symput("g2output", "'<filepath>"||&twodigityear.||&quarter.||"G2.csv'") ;
         %datagen;
        
         data _null_ ;
         %let yearinteger = sum(&yearinteger., 1) ;
      %end ;
   %end ;
run;

%mend productloop ;

This is the outer loop, within which I ran the "datagen" macro, which merged tables with proc sql and performed the various calculations I needed to do while using the myriad macro variables defined in said outer loop.  It is perhaps not the most elegant code, but what it lacks in beauty it makes up for in functionality.  A few notable hang-ups:

  • I did not know that the "call symput" function required a preceding "data _null_" declaration, creating a situation in which the first data set would be created without a problem, but subsequent iterations would error out without any explanation as to why within the log.  Fixed by placing a "data _null_" statement at the top of the loop.
  • The difference between "%let" and "call symput" remains somewhat elusive to me, though I had to switch between the two quite a bit before the resulting code behaved as desired.
  • Similarly, while the difference between "if" and "%if" is clear enough (I think), it's not immediately obvious to me why the macro versions of the other commands/functions (%to, %do, %end, etc.) were necessary, aside from the fact that the code wouldn't run otherwise.

Anyway, just wanted to give a final update and a hearty "thank you" for the assistance.  I've got quite a bit to learn yet, but this has taught me a substantial amount.  Cheers.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

+1.  Sounds like a right mess.  I would suggest you go with Patrick's post.  Have a talk with the database experts and ask them to suggest some techniques to dump the database data into the block you need.

Tom
Super User Tom
Super User

If you are going to use a data step to generate the macro variables then why to you need all of the macro %IF statements?

Why not just do that within the data step using normal IF statements?  Normal SAS code is much easier to debug than macro code.

Or alternatively if all you are doing in the CALL SYMPUT is concatenating macro variable values then why do you need the CALL SYMPUT() functions at all?

For example consider the first CALL SYMPUT() function call.


call symput("startdt", "'01-JAN-"||strip(&yearinteger.)||"'") ;


That is the same as


%let STARTDT=%sysfunc(dequote("'01-JAN-&yearinteger'")) ;

Also why are you setting the macro variable YEARINTEGER to have the code to call the SUM() function?  Why not just set it to have the number that would result if you actually did the addition?

Replace


%let yearinteger = sum(&yearinteger., 1) ;


with


%let yearinteger = %eval(&yearinteger + 1) ;

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 2791 views
  • 7 likes
  • 7 in conversation