Hello!
I do have a code:
proc sql; create table t1 (id int, name char(30)); insert into t1 values (1, 'Peter') values (2, 'Piper') values (3, 'picked') values (4, 'a peck) values (5, 'of pickled') values (6, 'peppers') ; quit; proc sql; create table t2 as select * from t1 where (t1.name contains 'Pet') OR (t1.name contains 'pic'); quit;
If I put the text that which I want to search in sourse in a separate dataset like
data inf;
length text $3; /*all values have the same length, this restriction is correct*/
input text;
datalines;
Pet
pic
;
run;
what will be the most appropriate way to find it in the t1 table?
(I want to get table equal t2 using inf.text)
I've found this way: (also using find(,,))
proc sql; create table t2 as select * from t1 where exists (select * from inf where index(t1.name, inf.text)<>0); quit;
But I can't say - does it work in the right way?
It seems that this code is rather slow in my queries and I cannot understand - is there something wrong with it?
What are the relative sizes of the actual tables involved. Are they all SAS datasets? Or are some in external database? If so are both tables in the same external database? Can you push the query into the database using explicit pass thru SQL?
It looks like you want to do an INNER JOIN which should be pretty fast.
Make sure the include DISTINCT keyword to avoid replicating observations with multiple "hits".
proc sql;
create table t2 as
select distinct t1.*
from t1
inner join INF
on t1.name contains inf.text
;
quit;
If values of INF.TEXT are not always exactly three characters long then you might need to worry about the trailing spaces that SAS will add.
I've found also one more choise:
proc sql; create table t2 as select * from t1; inner join inf on t1.name like '%'||inf.text||'%'; quit;
still can't choose which one is preferred..
@Ivan555 this code below is slower because of he enter selection process
proc sql;
create table t2 as
select * from t1
where exists (select * from inf where index(t1.name, inf.text)<>0);
quit;
the manner of using a data step or a SQL statement can be mainly based on preformance, user needs and controll, and sometimes using a data step is easier to explain to management whom may have no clue how code works but are required to signoff on the process.
What are the relative sizes of the actual tables involved. Are they all SAS datasets? Or are some in external database? If so are both tables in the same external database? Can you push the query into the database using explicit pass thru SQL?
It looks like you want to do an INNER JOIN which should be pretty fast.
Make sure the include DISTINCT keyword to avoid replicating observations with multiple "hits".
proc sql;
create table t2 as
select distinct t1.*
from t1
inner join INF
on t1.name contains inf.text
;
quit;
If values of INF.TEXT are not always exactly three characters long then you might need to worry about the trailing spaces that SAS will add.
@VDD thank you much, understood
> approximate size of t1-table is close to 10 mio rows, t2-table is a few dozens rows
> both tables are sas datasets
> about "distinct" - I agree, understood
> about "characters long" - I agree, understood
> about "the query into the database using explicit " I think I can't. Due to the fact that I am representing the side of the business, I can't see databases. Maybe in future I'll be able to ask our IT-division, but in moment the only people which I can ask for help with my code are here - that is the SAS Community 🙂
>your code seems works better than mine, thank you
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.