Hi,
I have a process where I do a Cartesian product of 2 tables (one with data and one with parameters) using proc sql and then I use data step to apply the needed filters so that at the end only the lines where the data matches the parameter remain.
As you can imagine this is a very inefficient process and so I'm trying to use a datastep approach to apply the filters during the cross join so that the process becomes more efficient. But unfortunately it doesn't seem to be working since and don't get the same results in both cases. In fact, instead of getting something between 1.000 - 10.000 observations, I get only 25!
I'm using the code mentioned here http://support.sas.com/kb/24/652.html
Here is a sample of what I'm doing:
data campanha_000_testes;
set base
do i=1 to n;
set parameter point=i nobs=n;
if id = 000 and SYSTEM NE 'BBBB' then delete;
if not missing('oferta Origem'n) then
do;
if compress('oferta Origem'n) = compress(pack_dsc);
end;
if find(RP_ID_EXCLUDE,compress(put(pack_cod,10.))) then delete;
end;
if not missing(Desconto_mens_max) then
do;
if Desconto_mens_min <= Beneficio <=Desconto_mens_max;
end;
if not missing(mensalidade_maxima) then
do;
*distinguir a abordagem;
if abordagem in ('Client','Conta') and id ne 303 then
do;
if mensalidade_minima < mens_liq_conta <= mensalidade_maxima;
end;
else if compress(abordagem) in ('Client','Conta') and id = 303 then
do;
if mensalidade_minima < mens_liq_im_conta <= mensalidade_maxima;
end;
else if abordagem = 'NIF' then
do;
if mensalidade_minima < mens_liq_NIF <= mensalidade_maxima;
end;
else
do;
if Mensalidade_Minima < Mensalidade_Liq_Siva <= Mensalidade_maxima;
end;
end;
output;
end;
run;
Any ideas on how to perform this?
Thanks in advance
I've included part of your conditions as a where in the SQL:
proc sql;
create table campaign_&id. as
select *
from work.base
cross join parametro
where
(id = 000 or SYSTEM NE 'BBBB' or not find(RP_ID_EXCLUDE,compress(put(pack_cod,10.))))
and (
(missing('oferta Origem'n) or compress('oferta Origem'n) = compress(pack_dsc))
or
(missing(Desconto_mens_max) or (Desconto_mens_min > Beneficio or Beneficio > Desconto_mens_max))
)
;
quit;
You need to apply DeMorgans law repeatedly when changing excluding conditions to including conditions.
PS your data step has a surplus end; after the second delete statement. Or there's a missing if .... do.
Could you please post the original code with SQL and data step? I think it should be possible to include the condition from the data step into the SQL.
Hi @Kurt_Bremser,
In the original code, the data step part is almost identical I just do a simple Cartesian product with proc sql.
Here is a sample: (I have much more conditions than these)
proc sql;
create table campaign_&id. as
select *
from work.base
cross join parametro;
quit;
data campanha_000_testes;
set ampaign_&id.;
if id = 000 and SYSTEM NE 'BBBB' then delete;
if not missing('oferta Origem'n) then
do;
if compress('oferta Origem'n) = compress(pack_dsc);
end;
if find(RP_ID_EXCLUDE,compress(put(pack_cod,10.))) then delete;
end;
if not missing(Desconto_mens_max) then
do;
if Desconto_mens_min <= Beneficio <=Desconto_mens_max;
end;
if not missing(mensalidade_maxima) then
do;
*distinguir a abordagem;
if abordagem in ('Client','Conta') and id ne 303 then
do;
if mensalidade_minima < mens_liq_conta <= mensalidade_maxima;
end;
else if compress(abordagem) in ('Client','Conta') and id = 303 then
do;
if mensalidade_minima < mens_liq_im_conta <= mensalidade_maxima;
end;
else if abordagem = 'NIF' then
do;
if mensalidade_minima < mens_liq_NIF <= mensalidade_maxima;
end;
else
do;
if Mensalidade_Minima < Mensalidade_Liq_Siva <= Mensalidade_maxima;
end;
end;
run;
I've included part of your conditions as a where in the SQL:
proc sql;
create table campaign_&id. as
select *
from work.base
cross join parametro
where
(id = 000 or SYSTEM NE 'BBBB' or not find(RP_ID_EXCLUDE,compress(put(pack_cod,10.))))
and (
(missing('oferta Origem'n) or compress('oferta Origem'n) = compress(pack_dsc))
or
(missing(Desconto_mens_max) or (Desconto_mens_min > Beneficio or Beneficio > Desconto_mens_max))
)
;
quit;
You need to apply DeMorgans law repeatedly when changing excluding conditions to including conditions.
PS your data step has a surplus end; after the second delete statement. Or there's a missing if .... do.
my problem is that I need to only apply filters if they are not missing, and the data step aproach seemed a better way to avoid adding so many logical conditions but I'll try implementing it that way then...
Another reason I was trying to implement the code this way is because I understand the way data is procesed in data step (as much as one can understand during 4 months of work xD) and thought it would be a very big improvement in performance.
I am not familiar with the way sql processes data, will this augment the performance significantly?
Just by curiosity, do you know why my code isn't working properly?
Your best method is to start simple (with a single condition) and add conditions, so you can watch the progress in terms of growing in/exclusion.
Trying to solve a problem in a "giant leap" fashion usually ends with pain. Use small steps instead.
If tests are taking too much time/too much space, create subsets of your datasets for testing first.
BTW if I think that applying De Morgan's law to
id = 000 and SYSTEM NE 'BBBB' then delete
we should get
(id ^= 320 or SOURCE_SYSTEM_DSC ='BSCS')
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.