Help using Base SAS procedures

SQL Loop

Reply
Occasional Contributor
Posts: 19

SQL Loop

Hi,
I'm trying to optimize a SQL loop like this

%do i=1 %to 100 %by 2;

proc sql;
create table table&i as
select x , idx, z, sum(y) as tot&i from database
where idx between &i and %eval(&i+1)
group by x;
quit;
%end;

This loop creates 50 little tables referring to the variable in the where clause. After that I merge all the tables. The time process is very long, probably ralated to the huge database (3.000.000 records, 10GB). How can i avoid the 50 tables?

Thanks in advance

Simone
Super Contributor
Posts: 578

Re: SQL Loop

How is the table DATABASE sorted?
Super User
Posts: 5,427

Re: SQL Loop

How do you merge the tables at the end?
Pls describe the "business problem". Knowing more about what you are trying to accomplish will help finding alternatives.

/Linus
Data never sleeps
Occasional Contributor
Posts: 19

Re: SQL Loop

Hi,
database is not sorted. The code that i have posted is a simpler version of the original (it contains subquery). I'm trying to calculate different "sums" (idx could be a time period).

Thanks
Super User
Posts: 5,427

Re: SQL Loop

This still does not explain why you need to split and merge the data.

/Linus
Data never sleeps
Occasional Contributor
Posts: 19

Re: SQL Loop

Hi Linus,
i split and then merge the data because I don't know if with the proc SQL i can calculate n sum with different where clause.
Super User
Posts: 5,427

Re: SQL Loop

Hmmm...
Your idx column aren't used as group by column nor as a input to a statistical function. This means that your data aren't summarized at all...?
Can't still see why you are using this logic, why you can't use an ordinary sum()/group by? What is your final delivery?

/Linus
Data never sleeps
Occasional Contributor
Posts: 19

Re: SQL Loop

First of all, sorry but i have 2 problem: SAS and english language.
I don't use idx as group because:
idx 1 =period 1
idx 2 =period 2
Data are summarized selecting period 1 and 2, period 3 and 4.....
With 100 idx values i should have 50 different sums...
Super User
Posts: 5,427

Re: SQL Loop

If you run your program, you should probably see something like this in the log:

NOTE: The query requires remerging summary statistics back with the original data.

Which means that your result will have detail data in your output data.

Still knowing exactly what you want, but I assume you want to sum two periods together in each output record.
Try to have some mechanism that maps two periods together. Since I don't know what kind of periods we're talking about, I can't give a complete solution. One way if you could create a SAS format that maps periods pairwise to a single value, and apply it in the SELECT clause. Then you could do like this:

/* Create format before this step */

proc sql;
create table table&i as
select x , put(idx,MyPeriodFmt.) as id, z, sum(y) as tot_y
from database
group by x, calculated id, z;
quit;

If our periods are dates or months you can probably us some SAS functions to group periods direclty in the SQL.

Conclussion, no need for looping and merging data afterwards, whic will probably save a lot of processing time.

/Linus
Data never sleeps
Valued Guide
Posts: 2,177

Re: SQL Loop

> Hi Linus,
> i split and then merge the data because I don't know
> if with the proc SQL i can calculate n sum with
> different where clause.

although you cannot apply different where clauses, you can achieve the effect with the CASE-WHEN-ELSE structure
This example takes the macro loop inside the select statement to create 30 new columns tot1 to tot30.
proc sql;
create table table_all_30 as
select x
, idx, z
%do i= 1 to 30 ;
, case when idx GE &i and idx LE %eval(&i+1)
then sum(y) else 0 end as tot&i
%end
from database
group by x, idx, z
;
quit; naming the new columns tot1 to tot30 ( not idx1 to idx30)

Message was edited by: Peter.C
Super Contributor
Posts: 578

Re: SQL Loop

I think the other commenters are correct..but I believe you would experience significant performance gains if you sorted the table correctly or created appropriate indices.
Super Contributor
Super Contributor
Posts: 365

Re: SQL Loop

Hello HDSimo,

Try to use views instead on your 50 tables. In this case you create a real table only when you merge all views.

Sincerely,
SPR
Super User
Posts: 10,023

Re: SQL Loop

Hi.Maybe you write code like this to avoid 50 dataset.


[pre]
proc sql;
create table result as

select x , idx, z, sum(y) as tot&i from database
where idx between 1 and 2
group by x;


%do i=2 %to 100 %by 2;

union all corresponding

select x , idx, z, sum(y) as tot&i from database
where idx between &i and %eval(&i+1)
group by x;


%end;

quit;



Ksharp
Occasional Contributor
Posts: 19

Re: SQL Loop

Thanks!
A lot of good advices!

Thank you again
Ask a Question
Discussion stats
  • 13 replies
  • 1179 views
  • 0 likes
  • 6 in conversation