Dear experts,
I would like to modify following code in the way that the list of item in IN clause is not explicitly written in code but is only a reference to a column in data-set or array. Is there a similar data-structure as a "vector" in R that can be refered to?
PROC SQL;
CREATE TABLE Tab2 AS
SELECT * FROM Tab1
WHERE Var1 IN (1, 3, 4, 7);
QUIT;
Thank you for help.
Martin
The in operator isn't very efficient for very large subqueries (think super long list of values) but you can also go as follow:
proc sql;
create table tab2 (drop=rn) as
select *
from tab1
where var1 in (select colx from somedataset) /* could be a sub query on another column of tab1 itself, you can also use where conditions on the subquery. The only restriction is that the subquery tied to the in operator has to return a single column */
;
quit;
With very large subqueries, it seems there is a gain in efficiency by doing an inner join instead. You can achieve similar results in a data step using either array _temporary_ or hash tables. The difference being that array/hash tables are all in memory and winds up faster than the in operator but also that you may need to add steps to your code to create the desired subquery as a dataset to populate the array or hash.
Vince
Hi,
If you want to reference to the row num instead of its value you can you the below code:
Proc sql;
create table tab2 (drop=rn) as
( select * from
(select *,monotonic() as rn from Tab1 )
where rn in (1,3,4,7));
Quit;
I hope this will help you.
Thanks
Puneeta
I interpet it is the relative column number, not row number.
You can refer to columns by position in order and group by clauses, but not in the where clause.
So, what is the requirement to do this?
You can write a pre-process or an in-line macro that gets the column name by it's position.
Dear Puneeta, thank you for your fast answer but my problem was that I wanted to have the list of numbers (in this example 1,3,4,7) in a column of the table, NOT in the code. The problem was not about rows. Nevertheless you answer is also interesting. I am sure I would ask it in the future anyway Thank you. Martin
The in operator isn't very efficient for very large subqueries (think super long list of values) but you can also go as follow:
proc sql;
create table tab2 (drop=rn) as
select *
from tab1
where var1 in (select colx from somedataset) /* could be a sub query on another column of tab1 itself, you can also use where conditions on the subquery. The only restriction is that the subquery tied to the in operator has to return a single column */
;
quit;
With very large subqueries, it seems there is a gain in efficiency by doing an inner join instead. You can achieve similar results in a data step using either array _temporary_ or hash tables. The difference being that array/hash tables are all in memory and winds up faster than the in operator but also that you may need to add steps to your code to create the desired subquery as a dataset to populate the array or hash.
Vince
Excellent! The is an answer I needed. Thank you for fast and precise answer. Martin
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.