Hi and sorry if my english isn't good enought,
I have 8 sas programs how takes 9 to 11 hours for each one (manipulate oracle database), so i have to run all this programs in parallel to optimize time.
So what should i do ?
(im working in SAS EG 4.3)
Hi,
Some tips:
1) Do your database admin on the database, it will be quicker and safer.
2) Optimise your process
I doubt you will be able to parallel run them. The reason being, whilst you may be able to get mutiple SAS sessions going, the database will slow it down, imagine if one program is changing a table and so is another, concurrency becomes an issue. What are you doing that takes 9-11 hours each one? Re-assess your process, optimise things etc.
i tried to execute programs in multiple sas sessions and it works very good, but i should to launch every one manually. Now i want to automate that: like launch a macro or something that run programs in parallel.
I think you are fooling yourself if you think that your computer will do the work faster performing several tasks in several SAS windows in parallel, compared to sequential. There is only so much your CPU can do in any period of time!
You can set up batch jobs, depending on your operating system, that can start multiple SAS sessions.
SAS EG is installed on enterprise server, its better in performene then my laptop of course !!
Please, How can i create such a batch job (if you have any example)?
Please state your operating system
8 hours for one Oracle job? There must be something wrong in your logical approach and coding.
No matter how fast hardware will be it is never unlimited. As a result it is always possible to code that badly it will last unacceptable long.
The bad thing with that is running in parallel will make all that even worse as being a thrashing effect. A serial process will be faster than parallel preventing the overload.
Go for the effort on reviewing and optimizing the code.
First of all, have your DBA people look at your Oracle pass-through SQL and check for optimizations. 9-11 hours for a job on Oracle is extreme. Or did you not use SQL pass-through? That would slow any operation down.
Second, parallelizing jobs that use the same infrastructure (in this case, your database) is usually counterproductive; 8 1-hour jobs will consume 10+ hours when run in parallel. One of the duties of datacenter people is the clever schedulung of jobs to avoid this effect.
Have you already run 2 of those jobs concurrently from 2 EG sessions and timed the outcome?
%Include "/SAS92/data/segchurn/sas2/autoexec_dev.sas" ;
Proc Sql ;
Connect To Oracle (User = &UserOracle Pass = &PassOracle Path = &PathOracle) ;
execute (drop table bi_prod.tbl_act_sas) by oracle ;
execute (create table bi_prod.tbl_act_sas as
select cod_linea , fecha_alta from DW_D_LINEA , bi_prod.param_date_act a where cod_sistema='2'
and flag_reutilizado= 0 and fecha_alta >= dd and fecha_alta <df+1 ) By oracle ;
execute(truncate table bi_prod.tbl_rec_sas ) by oracle ;
execute (insert into bi_prod.tbl_rec_sas (select SUM(DW_F_RECARGAS.IMPORTE_RECARGA) mnt_recharge,cod_linea from dw_f_recargas ,
DW_D_ORIGEN_RECARGA,bi_prod.param_date_analyse a where DW_D_ORIGEN_RECARGA.ID_ORIGEN_RECARGA=DW_F_RECARGAS.ID_ORIGEN_RECARGA AND dw_f_recargas.cod_sistema='2' and DW_D_ORIGEN_RECARGA.DES_ORIGEN_RECARGA IN ('Call Center', 'BTU', 'IN', 'ETU', 'E-payment') and cod_linea in (select cod_linea from bi_prod.tbl_act_sas) and id_dia >= dd and id_dia <df+1 group by cod_linea) ) by oracle ;
execute(commit) by oracle;
execute (truncate table bi_prod.tbl_out_sas) by oracle ;
execute (insert into bi_prod.tbl_out_sas (
select SUM(dw_f_prepago.importe) mnt_out,telefono_origen from
dw_f_prepago ,bi_prod.param_date_analyse a
where
telefono_origen in (select cod_linea from bi_prod.tbl_act_sas) and
id_dia >= dd and id_dia <df+1
group by telefono_origen ) ) by oracle ;
execute(commit) by oracle ;
execute (truncate table bi_prod.tbl_nav_sas) by oracle ;
execute (insert into bi_prod.tbl_nav_sas (
select SUM(volume_up+volume_down) vol,num_linea from
dw_f_navigation ,bi_prod.param_date_analyse a
where cod_sistema='2' and num_linea in (select cod_linea from bi_prod.tbl_act_sas) and
id_dia >= dd and id_dia <df+1
group by num_linea ) by oracle ;
execute(commit) ;
execute (truncate table bi_prod.tbl_inter_sas) by oracle ;
execute ( insert into bi_prod.tbl_inter_sas (
select SUM(ingresos) mou_in,telefono_destino from
dw_f_interconexion ,bi_prod.param_date_analyse a
where cod_sistema='2' and telefono_destino in (select cod_linea from bi_prod.tbl_act_sas) and
id_dia >= dd and id_dia <df+1
group by telefono_destino ) ) by oracle ;
execute(commit) ;
Create Table sasuser.analyse_identif As
Select *
From Connection To Oracle
(
select a.cod_linea , a.fecha_alta date_act ,b.vol ,c.mou_in,d.mnt_out mou_out,e.mnt_recharge from
bi_prod.tbl_act_sas a left outer join bi_prod.tbl_nav_sas b on a.cod_linea = b.num_linea
left outer join bi_prod.tbl_inter_sas c on a.cod_linea = c.telefono_destino
left outer join bi_prod.tbl_out_sas d on a.cod_linea = d.telefono_origen
left outer join bi_prod.tbl_rec_sas e on a.cod_linea = e.cod_linea
) ;
Disconnect From Oracle ;
Quit ;
But it takes a lot (2 days and it does not finish) so tried to optomize it, the idea was to segment the code and run those of same step in parallel and it wored. Now i want to automate that. this my new code:
/* création des tables 1,..,4 */
execute (create table bi_prod.tbl_nav_sas_1 as select * from bi_prod.tbl_nav_sas) by oracle;
execute (create table bi_prod.tbl_nav_sas_2 as select * from bi_prod.tbl_nav_sas) by oracle;
execute (create table bi_prod.tbl_nav_sas_3 as select * from bi_prod.tbl_nav_sas) by oracle;
execute (create table bi_prod.tbl_nav_sas_4 as select * from bi_prod.tbl_nav_sas) by oracle;
execute (create table bi_prod.tbl_nav_sas_join_1 as select * from bi_prod.tbl_nav_sas) by oracle;
execute (create table bi_prod.tbl_nav_sas_join_2 as select * from bi_prod.tbl_nav_sas) by oracle;
/* 1 */
execute (truncate table bi_prod.tbl_nav_sas_1) by oracle ;
execute (insert into bi_prod.tbl_nav_sas_1 (
select SUM(volume_up+volume_down) vol, num_linea from
dw_f_navigation ,bi_prod.param_date_analyse a
where cod_sistema='2'
group by num_linea )) by oracle ;
execute(commit) by oracle;
/* 2 */
execute (truncate table bi_prod.tbl_nav_sas_2) by oracle ;
execute (insert into bi_prod.tbl_nav_sas_2 (
select SUM(volume_up+volume_down) vol, num_linea from
dw_f_navigation ,bi_prod.param_date_analyse a
where num_linea in (select cod_linea from bi_prod.tbl_act_sas)
group by num_linea )) by oracle ;
execute(commit) by oracle;
/* 3 */
execute (truncate table bi_prod.tbl_nav_sas_3) by oracle ;
execute (insert into bi_prod.tbl_nav_sas_3 (
select SUM(volume_up+volume_down) vol, num_linea from
dw_f_navigation ,bi_prod.param_date_analyse a
where id_dia >= dd
group by num_linea )) by oracle ;
execute(commit) by oracle;
/* 4 */
execute (truncate table bi_prod.tbl_nav_sas_4) by oracle ;
execute (insert into bi_prod.tbl_nav_sas_4 (
select SUM(volume_up+volume_down) vol, num_linea from
dw_f_navigation ,bi_prod.param_date_analyse a
where id_dia <df+1
group by num_linea )) by oracle ;
execute(commit) by oracle;
/* jointure */
/* 1 */
execute (truncate table bi_prod.tbl_nav_sas_join_1) by oracle ;
execute(insert into bi_prod.tbl_nav_sas_join_1 (SELECT *
FROM bi_prod.tbl_nav_sas_1 INNER JOIN bi_prod.tbl_nav_sas_2
ON bi_prod.tbl_nav_sas_1.num_linea = bi_prod.tbl_nav_sas_2.num_linea)) by oracle;
execute(commit) by oracle;
/* 2 */
execute (truncate table bi_prod.tbl_nav_sas_join_2) by oracle ;
execute(insert into bi_prod.tbl_nav_sas_join_2 (SELECT *
FROM bi_prod.tbl_nav_sas_3 INNER JOIN bi_prod.tbl_nav_sas_4
ON bi_prod.tbl_nav_sas_3.num_linea = bi_prod.tbl_nav_sas_4.num_linea)) by oracle;
execute(commit) by oracle;
/* finale */
execute (truncate table bi_prod.tbl_nav_sas) by oracle ;
execute (insert into bi_prod.tbl_nav_sas (
select * from bi_prod.tbl_nav_sas_join_1 inner join bi_prod.tbl_nav_sas_join_2
on bi_prod.tbl_nav_sas_join_1.num_linea = bi_prod.tbl_nav_sas_join_1.num_linea)
group by num_linea ) by oracle ;
execute(commit) by oracle;
Now i can execute my task in 12 hours earlier.
It looks like your DB people are trying to wipe their hands of the matter. For example, why is the above deleting tables, adn then recreating them each time you startup? Why do they not create views (which are bits of code rather than the data), thus you pass in your query to get the data, the view then executes to get the data together and sends back to you. All processing done on the DB server for speed and effieciency. The table doesn't need to be removed/created and so is reponsive to changes on the DB.
I really don't know about the efficiency of inserts in Oracle, but I would at least try to unload the selects into SAS tables and create the results in SAS using SAS methods, just to test the relative efficiency.
I know from experience with DB/2 that SAS on much "weaker" servers can run circles around (even mainframe) DB systems in certain cases.
You got the code form your DBA, please review the SAS documentation that is intended to have DBA Oracle knowledge.
SAS/ACCESS(R) 9.4 for Relational Databases: Reference, Sixth Edition This one starts with a bufzise to think about is 250 records good, or is it too small and needing more.
SAS/ACCESS(R) 9.4 for Relational Databases: Reference, Sixth Edition Is about insertbuf updatebuff readbuff.
SAS/ACCESS(R) 9.4 for Relational Databases: Reference, Sixth Edition bulk loading is the SQL-loader look at dbsliceparms and more (parallel inside Oracle)
http://support.sas.com/resources/papers/proceedings13/072-2013.pdf Id did not see any of these performance options in your code.
The option to unload a database process it in smart in SAS and reload can be faster as being optimized and tuned for serial batch processing.
The nature of random IO intermediate tables logging en journaling with all commits of an OLTP are adding that much overhead, resulting in a very slow processing
Thanks Karman, i tried to unload datasets in SAS, it takes less time than the first.
Thanks every body.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.