BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Ivan555
Quartz | Level 8

Hello!

 

I have 2 tables.
Suppose both of them have one column named ID, and that the first table has 2,000 values, and the second 20,000,000.

I want to take the values ​​from the second table according to the conditions of the 1st table.

 

If I am starting writing constructions like these
>> inner join table t1 as t2.id = t1.id
>> where t2.id = t1.id
and so on, SAS calculations take a lot of time (about 15-30 minutes).

 

But if I write a list of 2000 values ​​in the SAS EG client in a text of the code, like
>> where t2.id in (
1000
1001,
....
3000)
SAS calculations would finish in seconds.

 

Could you tell me how to use SAS SQL correctly?  -manually copying data from one table to the text of the code does not seem right thing to do..

 

Sorry if this question is too trivial .. But I'm stuck on it ..

1 ACCEPTED SOLUTION

Accepted Solutions
Shmuel
Garnet | Level 18

Depending on how long is an ID (if numeric then probably 8 bytes) and on amount of required IDs, I propose to upload the DBMS data in portions.

Adapt numbers to your data:

data p001;
 set required_ids(obs=1000);
run;
data p002;
 set required_ids(first_obs=1001, obs=1000);
run;
/* etc. up to max required IDs */

then use any available method (like sql intersect) to create

want1, want2, ... and finally concatenate them all into one:

data want;
 set want1 want2 ...; /* or want: */
run;

 

View solution in original post

12 REPLIES 12
Shmuel
Garnet | Level 18

I quote: "I want to take the values ​​from the second table according to the conditions of the 1st table."

 

Does the first (small) table contains only the required IDs? or you need to select from them those who fit the conditions?

Suppose you selected the required IDs only by:

proc sql;
  create table temp as select ID from have
  where <conditions>;
quit;

then next step can look like:

proc sql;
  create table want as select * from <table2>
  where ID in (select ID from temp);
quit;

 

Ivan555
Quartz | Level 8

@Shmuel 

Thank you, I've done this thing with copying ID's in one table(and of course checked it), but query still works long time. (not even finished yet)

I wonder what is the difference between this subquery and the data written in the code in EG..

Ksharp
Super User
If two tables only have one column, INTERSECT operator would be the fastest.

proc sql;
create table want as
select id from table1
intersect
select id from table2;
quit;
Kurt_Bremser
Super User
data want;
set have;
if _n_ = 1
then do;
  declare hash t(dataset:"table (where=(condition))");
  t.definekey("id");
  t.definedone();
end;
if t.check() = 0;
run;

should be fastest (small table is "sorted" in memory, large table processed sequentially).

BrunoMueller
SAS Super FREQ

Question: are both tables stored as SAS Data Set, or is the big table coming from a DBMS?

Ivan555
Quartz | Level 8

@Ksharp Thank you, this procedure works fast, I think I have to somehow attach the other columns of the main table to it.

 

@Kurt_Bremser Thank you, I will think how to adapt your solution to my task, I will return with feedback.

 

@BrunoMueller I think that is my fault, sorry, I should have said this at the beginning. The small table is local, the large table is coming from the external Oracle DBMS - it is not being held in SAS, tables in Library are links.

BrunoMueller
SAS Super FREQ

@Ivan555 the big table being in the DBMS, explains the long runtime, as all the rows have to be passed from the DBMS to SAS to then do the join.

 

Do avoid the transfer of all the rows from the DBMS to SAS, you have several possibilities:

  • copy the small table to the DBMS, so the join can be made within the DBMS, might no be an option depending on your rights
  • create the list of IDs using the Proc SQL select id into :idList separated by "," and use the macro var in the IN operator, similar to how you typed it in, but this time all automatic
  • If you have control over the LIBNAME statement pointing to the DBMS, have a look at the MULTI_DATASRC_OPT, it will do something similar, just all automatic
Kurt_Bremser
Super User

When the large table is in a remote DBMS, forget the data step/hash, as you do not want to transfer the whole table over the network. The proper method is to load your lookup data into a temporary table and do the join in the DBMS. Alternatively, try the macro variable approach, but this might fail if the number of lookup ID's grows, as you might run over the maximum size of a single query code in Oracle.

Ivan555
Quartz | Level 8

@Kurt_Bremser @BrunoMueller 

 

Thank you, understood.

I do not have access to the Oracle DBMS, I think that I should use macro variables, but it seems that 65,534 characters will not be enough.

Therefore, I will have to make an array of macro variables, the length of which will depend on the number of elements and characters in the table 'temp' from the example above.

I'm right?

 

proc sql;
  create table temp as select ID from have
  where <conditions>;
quit;
 
Shmuel
Garnet | Level 18

Depending on how long is an ID (if numeric then probably 8 bytes) and on amount of required IDs, I propose to upload the DBMS data in portions.

Adapt numbers to your data:

data p001;
 set required_ids(obs=1000);
run;
data p002;
 set required_ids(first_obs=1001, obs=1000);
run;
/* etc. up to max required IDs */

then use any available method (like sql intersect) to create

want1, want2, ... and finally concatenate them all into one:

data want;
 set want1 want2 ...; /* or want: */
run;

 

Ksharp
Super User
Once you got these IDs , using Shmule's code to bring other columns in.

proc sql;
create table want as select * from <table2>
where ID in (select ID from temp);
quit;
Ivan555
Quartz | Level 8

Sorry for the absence, thanks for all, many answers are significant for me.

Big THX!

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 1310 views
  • 8 likes
  • 5 in conversation