BookmarkSubscribeRSS Feed
HDSimo
Calcite | Level 5
Hi,
I'm trying to optimize a SQL loop like this

%do i=1 %to 100 %by 2;

proc sql;
create table table&i as
select x , idx, z, sum(y) as tot&i from database
where idx between &i and %eval(&i+1)
group by x;
quit;
%end;

This loop creates 50 little tables referring to the variable in the where clause. After that I merge all the tables. The time process is very long, probably ralated to the huge database (3.000.000 records, 10GB). How can i avoid the 50 tables?

Thanks in advance

Simone
13 REPLIES 13
DBailey
Lapis Lazuli | Level 10
How is the table DATABASE sorted?
LinusH
Tourmaline | Level 20
How do you merge the tables at the end?
Pls describe the "business problem". Knowing more about what you are trying to accomplish will help finding alternatives.

/Linus
Data never sleeps
HDSimo
Calcite | Level 5
Hi,
database is not sorted. The code that i have posted is a simpler version of the original (it contains subquery). I'm trying to calculate different "sums" (idx could be a time period).

Thanks
LinusH
Tourmaline | Level 20
This still does not explain why you need to split and merge the data.

/Linus
Data never sleeps
HDSimo
Calcite | Level 5
Hi Linus,
i split and then merge the data because I don't know if with the proc SQL i can calculate n sum with different where clause.
LinusH
Tourmaline | Level 20
Hmmm...
Your idx column aren't used as group by column nor as a input to a statistical function. This means that your data aren't summarized at all...?
Can't still see why you are using this logic, why you can't use an ordinary sum()/group by? What is your final delivery?

/Linus
Data never sleeps
HDSimo
Calcite | Level 5
First of all, sorry but i have 2 problem: SAS and english language.
I don't use idx as group because:
idx 1 =period 1
idx 2 =period 2
Data are summarized selecting period 1 and 2, period 3 and 4.....
With 100 idx values i should have 50 different sums...
LinusH
Tourmaline | Level 20
If you run your program, you should probably see something like this in the log:

NOTE: The query requires remerging summary statistics back with the original data.

Which means that your result will have detail data in your output data.

Still knowing exactly what you want, but I assume you want to sum two periods together in each output record.
Try to have some mechanism that maps two periods together. Since I don't know what kind of periods we're talking about, I can't give a complete solution. One way if you could create a SAS format that maps periods pairwise to a single value, and apply it in the SELECT clause. Then you could do like this:

/* Create format before this step */

proc sql;
create table table&i as
select x , put(idx,MyPeriodFmt.) as id, z, sum(y) as tot_y
from database
group by x, calculated id, z;
quit;

If our periods are dates or months you can probably us some SAS functions to group periods direclty in the SQL.

Conclussion, no need for looping and merging data afterwards, whic will probably save a lot of processing time.

/Linus
Data never sleeps
Peter_C
Rhodochrosite | Level 12
> Hi Linus,
> i split and then merge the data because I don't know
> if with the proc SQL i can calculate n sum with
> different where clause.

although you cannot apply different where clauses, you can achieve the effect with the CASE-WHEN-ELSE structure
This example takes the macro loop inside the select statement to create 30 new columns tot1 to tot30.
proc sql;
create table table_all_30 as
select x
, idx, z
%do i= 1 to 30 ;
, case when idx GE &i and idx LE %eval(&i+1)
then sum(y) else 0 end as tot&i
%end
from database
group by x, idx, z
;
quit; naming the new columns tot1 to tot30 ( not idx1 to idx30)

Message was edited by: Peter.C
DBailey
Lapis Lazuli | Level 10
I think the other commenters are correct..but I believe you would experience significant performance gains if you sorted the table correctly or created appropriate indices.
SPR
Quartz | Level 8 SPR
Quartz | Level 8
Hello HDSimo,

Try to use views instead on your 50 tables. In this case you create a real table only when you merge all views.

Sincerely,
SPR
Ksharp
Super User
Hi.Maybe you write code like this to avoid 50 dataset.


[pre]
proc sql;
create table result as

select x , idx, z, sum(y) as tot&i from database
where idx between 1 and 2
group by x;


%do i=2 %to 100 %by 2;

union all corresponding

select x , idx, z, sum(y) as tot&i from database
where idx between &i and %eval(&i+1)
group by x;


%end;

quit;



Ksharp
HDSimo
Calcite | Level 5
Thanks!
A lot of good advices!

Thank you again

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 3518 views
  • 0 likes
  • 6 in conversation