BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Ivan555
Quartz | Level 8

Hello!

 

I do have a code:

 

proc sql;
create table t1 (id int, name char(30));
insert into t1 
values (1, 'Peter')
values (2, 'Piper')
values (3, 'picked')
values (4, 'a peck)
values (5, 'of pickled')
values (6, 'peppers')
;
quit; 

proc sql;
create table t2 as select * from t1
where (t1.name contains 'Pet') OR (t1.name contains 'pic');
quit;

If I put the text that which I want to search in sourse in a separate dataset like

data inf;
length text $3; /*all values have the same length, this restriction is correct*/
input text;
datalines;
Pet
pic
;
run;

what will be the most appropriate way to find it in the t1 table?

(I want to get table equal t2 using inf.text)

 

I've found this way: (also using find(,,))

proc sql;
create table t2 as
select * from t1
where exists (select * from inf where index(t1.name, inf.text)<>0);
quit;

But I can't say - does it work in the right way?

 

It seems that this code is rather slow in my queries and I cannot understand - is there something wrong with it?

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

What are the relative sizes of the actual tables involved.  Are they all SAS datasets?  Or are some in external database?  If so are both tables in the same external database?  Can you push the query into the database using explicit pass thru SQL?

 

It looks like you want to do an INNER JOIN which should be pretty fast. 

Make sure the include DISTINCT keyword to avoid replicating observations with multiple "hits".

proc sql;
create table t2 as 
  select distinct t1.* 
  from t1
  inner join INF 
    on t1.name contains inf.text
;
quit;

If values of INF.TEXT are not always exactly three characters long then you might need to worry about the trailing spaces that SAS will add.

View solution in original post

4 REPLIES 4
Ivan555
Quartz | Level 8

I've found also one more choise:

proc sql;
create table t2 as
select * from t1;
inner join inf on t1.name like '%'||inf.text||'%';
quit;

still can't choose which one is preferred..

VDD
Ammonite | Level 13 VDD
Ammonite | Level 13

@Ivan555 this code below is slower because of he enter selection process

proc sql;
create table t2 as
select * from t1
where exists (select * from inf where index(t1.name, inf.text)<>0);
quit;

the manner of using a data step or a SQL statement can be mainly based on preformance, user needs and controll, and sometimes using a data step is easier to explain to management whom may have no clue how code works but are required to signoff on the process.

 

 

 

Tom
Super User Tom
Super User

What are the relative sizes of the actual tables involved.  Are they all SAS datasets?  Or are some in external database?  If so are both tables in the same external database?  Can you push the query into the database using explicit pass thru SQL?

 

It looks like you want to do an INNER JOIN which should be pretty fast. 

Make sure the include DISTINCT keyword to avoid replicating observations with multiple "hits".

proc sql;
create table t2 as 
  select distinct t1.* 
  from t1
  inner join INF 
    on t1.name contains inf.text
;
quit;

If values of INF.TEXT are not always exactly three characters long then you might need to worry about the trailing spaces that SAS will add.

Ivan555
Quartz | Level 8

@VDD thank you much, understood

 

@Tom 

> approximate size of t1-table is close to 10 mio rows, t2-table is a few dozens rows

> both tables are sas datasets

> about "distinct" - I agree, understood

> about  "characters long" - I agree, understood

> about "the query into the database using explicit " I think I can't. Due to the fact that I am representing the side of the business, I can't see databases. Maybe in future I'll be able to ask our IT-division, but in moment the only people which I can ask for help with my code are here - that is the SAS Community 🙂

>your code seems works better than mine, thank you

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1022 views
  • 3 likes
  • 3 in conversation