BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Ricardo_Neves
Obsidian | Level 7

Hi,

I have a process where I do a Cartesian product of 2 tables  (one with data and one with parameters) using proc sql and then I use data step to apply the needed filters so that at the end only the lines where the data matches the parameter remain.

As you can imagine this is a very inefficient process and so I'm trying to use a datastep approach  to apply the filters during the cross join so that the process becomes more efficient. But unfortunately it doesn't seem to be working since and don't get the same results in both cases. In fact, instead of getting something between 1.000 - 10.000 observations, I get only 25!

 

I'm using the code mentioned here http://support.sas.com/kb/24/652.html

Here is a sample of what I'm doing:

 

    data campanha_000_testes;
    set base
      do i=1 to n;

        set parameter point=i nobs=n;


    if id = 000 and SYSTEM NE 'BBBB' then delete;

        
            if not missing('oferta Origem'n) then
                do;
                    if compress('oferta Origem'n) = compress(pack_dsc);
                end;

                    if find(RP_ID_EXCLUDE,compress(put(pack_cod,10.))) then delete;
                end;

            if not missing(Desconto_mens_max) then
                do;
                    if Desconto_mens_min <= Beneficio <=Desconto_mens_max;
                end;

            if not missing(mensalidade_maxima) then
                do;
                    *distinguir a abordagem;
                    if abordagem in ('Client','Conta') and id ne 303 then
                        do;
                            if mensalidade_minima < mens_liq_conta <= mensalidade_maxima;
                        end;
                    else if compress(abordagem) in ('Client','Conta') and id = 303 then
                        do;
                            if mensalidade_minima < mens_liq_im_conta <= mensalidade_maxima;
                        end;
                    else if abordagem = 'NIF' then
                        do;
                            if mensalidade_minima < mens_liq_NIF <= mensalidade_maxima;
                        end;
                    else
                        do;
                            if Mensalidade_Minima < Mensalidade_Liq_Siva <= Mensalidade_maxima;
                        end;
                end;

            output;

      end;

        run;


Any ideas on how to perform this?


Thanks in advance

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

I've included part of your conditions as a where in the SQL:

proc sql;
create table campaign_&id. as
select *
from work.base 
cross join parametro
where
  (id = 000 or SYSTEM NE 'BBBB' or not find(RP_ID_EXCLUDE,compress(put(pack_cod,10.))))
  and (
    (missing('oferta Origem'n) or compress('oferta Origem'n) = compress(pack_dsc))
    or
    (missing(Desconto_mens_max) or (Desconto_mens_min > Beneficio or Beneficio > Desconto_mens_max))
  )
;
quit;

You need to apply DeMorgans law repeatedly when changing excluding conditions to including conditions.

 

PS your data step has a surplus end; after the second delete statement. Or there's a missing if .... do.

View solution in original post

7 REPLIES 7
Ricardo_Neves
Obsidian | Level 7

Hi @Kurt_Bremser,

 

In the original code, the data step part is almost identical I just do a simple Cartesian product with proc sql.

 

Here is a sample: (I have much more conditions than these)

 

proc sql;
create table campaign_&id. as
select *
from work.base 
cross join parametro;
quit;

	data campanha_000_testes;
	set ampaign_&id.;
if id = 000 and SYSTEM NE 'BBBB' then delete;
	
		if not missing('oferta Origem'n) then
			do;
				if compress('oferta Origem'n) = compress(pack_dsc);
			end;

				if find(RP_ID_EXCLUDE,compress(put(pack_cod,10.))) then delete;
			end;

		if not missing(Desconto_mens_max) then
			do;
				if Desconto_mens_min <= Beneficio <=Desconto_mens_max;
			end;

		if not missing(mensalidade_maxima) then
			do;
				*distinguir a abordagem;
				if abordagem in ('Client','Conta') and id ne 303 then
					do;
						if mensalidade_minima < mens_liq_conta <= mensalidade_maxima;
					end;
				else if compress(abordagem) in ('Client','Conta') and id = 303 then
					do;
						if mensalidade_minima < mens_liq_im_conta <= mensalidade_maxima;
					end;
				else if abordagem = 'NIF' then
					do;
						if mensalidade_minima < mens_liq_NIF <= mensalidade_maxima;
					end;
				else
					do;
						if Mensalidade_Minima < Mensalidade_Liq_Siva <= Mensalidade_maxima;
					end;
			end;



	run;
Kurt_Bremser
Super User

I've included part of your conditions as a where in the SQL:

proc sql;
create table campaign_&id. as
select *
from work.base 
cross join parametro
where
  (id = 000 or SYSTEM NE 'BBBB' or not find(RP_ID_EXCLUDE,compress(put(pack_cod,10.))))
  and (
    (missing('oferta Origem'n) or compress('oferta Origem'n) = compress(pack_dsc))
    or
    (missing(Desconto_mens_max) or (Desconto_mens_min > Beneficio or Beneficio > Desconto_mens_max))
  )
;
quit;

You need to apply DeMorgans law repeatedly when changing excluding conditions to including conditions.

 

PS your data step has a surplus end; after the second delete statement. Or there's a missing if .... do.

Ricardo_Neves
Obsidian | Level 7

my problem is that I need to only apply filters if they are not missing, and the data step aproach seemed a better way to avoid adding so many logical conditions but I'll try implementing it that way then...

 

Another reason I was trying to implement the code this way is because I understand the way data is procesed in data step (as much as one can understand during 4 months of work xD) and thought it would be a very big improvement in performance.

I am not familiar with the way sql processes data, will this augment the performance significantly?

 

Just by curiosity, do you know why my code isn't working properly?

Kurt_Bremser
Super User

Your best method is to start simple (with a single condition) and add conditions, so you can watch the progress in terms of growing in/exclusion.

Trying to solve a problem in a "giant leap" fashion usually ends with pain. Use small steps instead.

 

If tests are taking too much time/too much space, create subsets of your datasets for testing first.

 

Ricardo_Neves
Obsidian | Level 7
thanks for the advice, I'll do just that 🙂
Ricardo_Neves
Obsidian | Level 7

BTW if I think that applying De Morgan's law to

 id = 000 and SYSTEM NE 'BBBB' then delete

we should get

(id ^= 320 or SOURCE_SYSTEM_DSC ='BSCS')

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1964 views
  • 0 likes
  • 2 in conversation