Solved: Re: PROC SQL WHERE IN clause with a variable / array

MartinNovak · Posted 09-26-2013 04:31 AM

Dear experts,

I would like to modify following code in the way that the list of item in IN clause is not explicitly written in code but is only a reference to a column in data-set or array. Is there a similar data-structure as a "vector" in R that can be refered to?

PROC SQL;

CREATE TABLE Tab2 AS

SELECT * FROM Tab1

WHERE Var1 IN (1, 3, 4, 7);

QUIT;

Thank you for help.

Martin

Vince28_Statcan · Posted 09-26-2013 07:57 AM

The in operator isn't very efficient for very large subqueries (think super long list of values) but you can also go as follow:

proc sql;

create table tab2 (drop=rn) as

select *

from tab1

where var1 in (select colx from somedataset) /* could be a sub query on another column of tab1 itself, you can also use where conditions on the subquery. The only restriction is that the subquery tied to the in operator has to return a single column */

;

quit;

With very large subqueries, it seems there is a gain in efficiency by doing an inner join instead. You can achieve similar results in a data step using either array _temporary_ or hash tables. The difference being that array/hash tables are all in memory and winds up faster than the in operator but also that you may need to add steps to your code to create the desired subquery as a dataset to populate the array or hash.

Vince

View solution in original post

Puneeta_SAS · Posted 09-26-2013 05:32 AM

Hi,

If you want to reference to the row num instead of its value you can you the below code:

Proc sql;

create table tab2 (drop=rn) as

( select * from

(select *,monotonic() as rn from Tab1 )

where rn in (1,3,4,7));

Quit;

I hope this will help you.

Thanks

Puneeta

LinusH · Posted 09-26-2013 05:52 AM

I interpet it is the relative column number, not row number.

You can refer to columns by position in order and group by clauses, but not in the where clause.

So, what is the requirement to do this?

You can write a pre-process or an in-line macro that gets the column name by it's position.

Data never sleeps

MartinNovak · Posted 09-26-2013 08:58 AM

Dear Puneeta, thank you for your fast answer but my problem was that I wanted to have the list of numbers (in this example 1,3,4,7) in a column of the table, NOT in the code. The problem was not about rows. Nevertheless you answer is also interesting. I am sure I would ask it in the future anyway Thank you. Martin

Vince28_Statcan · Posted 09-26-2013 07:57 AM

The in operator isn't very efficient for very large subqueries (think super long list of values) but you can also go as follow:

proc sql;

create table tab2 (drop=rn) as

select *

from tab1

where var1 in (select colx from somedataset) /* could be a sub query on another column of tab1 itself, you can also use where conditions on the subquery. The only restriction is that the subquery tied to the in operator has to return a single column */

;

quit;

With very large subqueries, it seems there is a gain in efficiency by doing an inner join instead. You can achieve similar results in a data step using either array _temporary_ or hash tables. The difference being that array/hash tables are all in memory and winds up faster than the in operator but also that you may need to add steps to your code to create the desired subquery as a dataset to populate the array or hash.

Vince

MartinNovak · Posted 09-26-2013 08:53 AM

Excellent! The is an answer I needed. Thank you for fast and precise answer. Martin

PROC SQL WHERE IN clause with a variable / array

Re: PROC SQL WHERE IN clause with a variable / array

Re: PROC SQL WHERE IN clause with a variable / array

Re: PROC SQL WHERE IN clause with a variable / array

Re: PROC SQL WHERE IN clause with a variable / array

Re: PROC SQL WHERE IN clause with a variable / array

Re: PROC SQL WHERE IN clause with a variable / array

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away