DATA Step, Macro, Functions and more

Cartesian Product with filters using Datastep

Accepted Solution Solved
Reply
Contributor
Posts: 25
Accepted Solution

Cartesian Product with filters using Datastep

Hi,

I have a process where I do a Cartesian product of 2 tables  (one with data and one with parameters) using proc sql and then I use data step to apply the needed filters so that at the end only the lines where the data matches the parameter remain.

As you can imagine this is a very inefficient process and so I'm trying to use a datastep approach  to apply the filters during the cross join so that the process becomes more efficient. But unfortunately it doesn't seem to be working since and don't get the same results in both cases. In fact, instead of getting something between 1.000 - 10.000 observations, I get only 25!

 

I'm using the code mentioned here http://support.sas.com/kb/24/652.html

Here is a sample of what I'm doing:

 

    data campanha_000_testes;
    set base
      do i=1 to n;

        set parameter point=i nobs=n;


    if id = 000 and SYSTEM NE 'BBBB' then delete;

        
            if not missing('oferta Origem'n) then
                do;
                    if compress('oferta Origem'n) = compress(pack_dsc);
                end;

                    if find(RP_ID_EXCLUDE,compress(put(pack_cod,10.))) then delete;
                end;

            if not missing(Desconto_mens_max) then
                do;
                    if Desconto_mens_min <= Beneficio <=Desconto_mens_max;
                end;

            if not missing(mensalidade_maxima) then
                do;
                    *distinguir a abordagem;
                    if abordagem in ('Client','Conta') and id ne 303 then
                        do;
                            if mensalidade_minima < mens_liq_conta <= mensalidade_maxima;
                        end;
                    else if compress(abordagem) in ('Client','Conta') and id = 303 then
                        do;
                            if mensalidade_minima < mens_liq_im_conta <= mensalidade_maxima;
                        end;
                    else if abordagem = 'NIF' then
                        do;
                            if mensalidade_minima < mens_liq_NIF <= mensalidade_maxima;
                        end;
                    else
                        do;
                            if Mensalidade_Minima < Mensalidade_Liq_Siva <= Mensalidade_maxima;
                        end;
                end;

            output;

      end;

        run;


Any ideas on how to perform this?


Thanks in advance

 


Accepted Solutions
Solution
‎10-12-2017 12:09 PM
Super User
Posts: 9,941

Re: Cartesian Product with filters using Datastep

Posted in reply to Ricardo_Neves

I've included part of your conditions as a where in the SQL:

proc sql;
create table campaign_&id. as
select *
from work.base 
cross join parametro
where
  (id = 000 or SYSTEM NE 'BBBB' or not find(RP_ID_EXCLUDE,compress(put(pack_cod,10.))))
  and (
    (missing('oferta Origem'n) or compress('oferta Origem'n) = compress(pack_dsc))
    or
    (missing(Desconto_mens_max) or (Desconto_mens_min > Beneficio or Beneficio > Desconto_mens_max))
  )
;
quit;

You need to apply DeMorgans law repeatedly when changing excluding conditions to including conditions.

 

PS your data step has a surplus end; after the second delete statement. Or there's a missing if .... do.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code

View solution in original post


All Replies
Super User
Posts: 9,941

Re: Cartesian Product with filters using Datastep

Posted in reply to Ricardo_Neves

Could you please post the original code with SQL and data step? I think it should be possible to include the condition from the data step into the SQL.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Contributor
Posts: 25

Re: Cartesian Product with filters using Datastep

Posted in reply to KurtBremser

Hi @KurtBremser,

 

In the original code, the data step part is almost identical I just do a simple Cartesian product with proc sql.

 

Here is a sample: (I have much more conditions than these)

 

proc sql;
create table campaign_&id. as
select *
from work.base 
cross join parametro;
quit;

	data campanha_000_testes;
	set ampaign_&id.;
if id = 000 and SYSTEM NE 'BBBB' then delete;
	
		if not missing('oferta Origem'n) then
			do;
				if compress('oferta Origem'n) = compress(pack_dsc);
			end;

				if find(RP_ID_EXCLUDE,compress(put(pack_cod,10.))) then delete;
			end;

		if not missing(Desconto_mens_max) then
			do;
				if Desconto_mens_min <= Beneficio <=Desconto_mens_max;
			end;

		if not missing(mensalidade_maxima) then
			do;
				*distinguir a abordagem;
				if abordagem in ('Client','Conta') and id ne 303 then
					do;
						if mensalidade_minima < mens_liq_conta <= mensalidade_maxima;
					end;
				else if compress(abordagem) in ('Client','Conta') and id = 303 then
					do;
						if mensalidade_minima < mens_liq_im_conta <= mensalidade_maxima;
					end;
				else if abordagem = 'NIF' then
					do;
						if mensalidade_minima < mens_liq_NIF <= mensalidade_maxima;
					end;
				else
					do;
						if Mensalidade_Minima < Mensalidade_Liq_Siva <= Mensalidade_maxima;
					end;
			end;



	run;
Solution
‎10-12-2017 12:09 PM
Super User
Posts: 9,941

Re: Cartesian Product with filters using Datastep

Posted in reply to Ricardo_Neves

I've included part of your conditions as a where in the SQL:

proc sql;
create table campaign_&id. as
select *
from work.base 
cross join parametro
where
  (id = 000 or SYSTEM NE 'BBBB' or not find(RP_ID_EXCLUDE,compress(put(pack_cod,10.))))
  and (
    (missing('oferta Origem'n) or compress('oferta Origem'n) = compress(pack_dsc))
    or
    (missing(Desconto_mens_max) or (Desconto_mens_min > Beneficio or Beneficio > Desconto_mens_max))
  )
;
quit;

You need to apply DeMorgans law repeatedly when changing excluding conditions to including conditions.

 

PS your data step has a surplus end; after the second delete statement. Or there's a missing if .... do.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Contributor
Posts: 25

Re: Cartesian Product with filters using Datastep

[ Edited ]
Posted in reply to KurtBremser

my problem is that I need to only apply filters if they are not missing, and the data step aproach seemed a better way to avoid adding so many logical conditions but I'll try implementing it that way then...

 

Another reason I was trying to implement the code this way is because I understand the way data is procesed in data step (as much as one can understand during 4 months of work xD) and thought it would be a very big improvement in performance.

I am not familiar with the way sql processes data, will this augment the performance significantly?

 

Just by curiosity, do you know why my code isn't working properly?

Super User
Posts: 9,941

Re: Cartesian Product with filters using Datastep

Posted in reply to Ricardo_Neves

Your best method is to start simple (with a single condition) and add conditions, so you can watch the progress in terms of growing in/exclusion.

Trying to solve a problem in a "giant leap" fashion usually ends with pain. Use small steps instead.

 

If tests are taking too much time/too much space, create subsets of your datasets for testing first.

 

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Contributor
Posts: 25

Re: Cartesian Product with filters using Datastep

Posted in reply to KurtBremser
thanks for the advice, I'll do just that Smiley Happy
Contributor
Posts: 25

Re: Cartesian Product with filters using Datastep

Posted in reply to KurtBremser

BTW if I think that applying De Morgan's law to

 id = 000 and SYSTEM NE 'BBBB' then delete

we should get

(id ^= 320 or SOURCE_SYSTEM_DSC ='BSCS')
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 214 views
  • 0 likes
  • 2 in conversation