Hello!
I have 2 tables.
Suppose both of them have one column named ID, and that the first table has 2,000 values, and the second 20,000,000.
I want to take the values from the second table according to the conditions of the 1st table.
If I am starting writing constructions like these
>> inner join table t1 as t2.id = t1.id
>> where t2.id = t1.id
and so on, SAS calculations take a lot of time (about 15-30 minutes).
But if I write a list of 2000 values in the SAS EG client in a text of the code, like
>> where t2.id in (
1000
1001,
....
3000)
SAS calculations would finish in seconds.
Could you tell me how to use SAS SQL correctly? -manually copying data from one table to the text of the code does not seem right thing to do..
Sorry if this question is too trivial .. But I'm stuck on it ..
Depending on how long is an ID (if numeric then probably 8 bytes) and on amount of required IDs, I propose to upload the DBMS data in portions.
Adapt numbers to your data:
data p001;
set required_ids(obs=1000);
run;
data p002;
set required_ids(first_obs=1001, obs=1000);
run;
/* etc. up to max required IDs */
then use any available method (like sql intersect) to create
want1, want2, ... and finally concatenate them all into one:
data want;
set want1 want2 ...; /* or want: */
run;
I quote: "I want to take the values from the second table according to the conditions of the 1st table."
Does the first (small) table contains only the required IDs? or you need to select from them those who fit the conditions?
Suppose you selected the required IDs only by:
proc sql;
create table temp as select ID from have
where <conditions>;
quit;
then next step can look like:
proc sql;
create table want as select * from <table2>
where ID in (select ID from temp);
quit;
Thank you, I've done this thing with copying ID's in one table(and of course checked it), but query still works long time. (not even finished yet)
I wonder what is the difference between this subquery and the data written in the code in EG..
data want;
set have;
if _n_ = 1
then do;
declare hash t(dataset:"table (where=(condition))");
t.definekey("id");
t.definedone();
end;
if t.check() = 0;
run;
should be fastest (small table is "sorted" in memory, large table processed sequentially).
Question: are both tables stored as SAS Data Set, or is the big table coming from a DBMS?
@Ksharp Thank you, this procedure works fast, I think I have to somehow attach the other columns of the main table to it.
@Kurt_Bremser Thank you, I will think how to adapt your solution to my task, I will return with feedback.
@BrunoMueller I think that is my fault, sorry, I should have said this at the beginning. The small table is local, the large table is coming from the external Oracle DBMS - it is not being held in SAS, tables in Library are links.
@Ivan555 the big table being in the DBMS, explains the long runtime, as all the rows have to be passed from the DBMS to SAS to then do the join.
Do avoid the transfer of all the rows from the DBMS to SAS, you have several possibilities:
When the large table is in a remote DBMS, forget the data step/hash, as you do not want to transfer the whole table over the network. The proper method is to load your lookup data into a temporary table and do the join in the DBMS. Alternatively, try the macro variable approach, but this might fail if the number of lookup ID's grows, as you might run over the maximum size of a single query code in Oracle.
Thank you, understood.
I do not have access to the Oracle DBMS, I think that I should use macro variables, but it seems that 65,534 characters will not be enough.
Therefore, I will have to make an array of macro variables, the length of which will depend on the number of elements and characters in the table 'temp' from the example above.
I'm right?
proc sql;
create table temp as select ID from have
where <conditions>;
quit;
Depending on how long is an ID (if numeric then probably 8 bytes) and on amount of required IDs, I propose to upload the DBMS data in portions.
Adapt numbers to your data:
data p001;
set required_ids(obs=1000);
run;
data p002;
set required_ids(first_obs=1001, obs=1000);
run;
/* etc. up to max required IDs */
then use any available method (like sql intersect) to create
want1, want2, ... and finally concatenate them all into one:
data want;
set want1 want2 ...; /* or want: */
run;
Sorry for the absence, thanks for all, many answers are significant for me.
Big THX!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.