Hi All:
I need to get the record-id (var: seq_number) of the last record in a large Oracle table.
I use:
data work01 (keep=seq_number);
set oracle.table end=eof;
if eof then output;
else delete;
run;
but this runs for 20 minutes (there are > 14 million rows and the records are huge) and used 9 minutes of CPU alone.
Any ideas on how to do this faster? The Oracle table is not mine so I can't make any changes on that side.
Thanks in advance.
Hi,
Since you are using a data step i hope its is correct to assume that you are using a libname statment to connect to the oracle database.
In such a case what i believe happens is the entire table from the oracle is copied into sas and then the selected query is run.
If you were to use a Pass through query instead of the libname statement then you could use
select * from schema.table where rowid = (select max(rowid) from schema.table)
Please do correct me if i am wrong
Have you tried the firstobs=MAX dataset option. I can't test it, but it should work.
Hi,
Since you are using a data step i hope its is correct to assume that you are using a libname statment to connect to the oracle database.
In such a case what i believe happens is the entire table from the oracle is copied into sas and then the selected query is run.
If you were to use a Pass through query instead of the libname statement then you could use
select * from schema.table where rowid = (select max(rowid) from schema.table)
Please do correct me if i am wrong
Art297's method will not save your time,because it is also sequential access dataset method, you need directly access dataset method.But From your code, it looks like dataset is not in local SAS, so the best way is to consult with someone who is familiar with ORACLE's sql. use ORACLE's sql to get that.
Ksharp
Ksharp, I've never tried it with ORACLE, thus you may be right. However, I would think that the possible IO savings might still be significant.
Of course, as is usually the case with databases like ORACLE, usually the best time savings are derived from making friends with those who are maintaining those databases. They can probably easily set up a view that only contains the last record.
Art.T. Actually your way is the same with OP's way (i.e. must firstly load the whole oracle table into sas, then you can get the last obs).
In this case, the best way is to use Oracle's SQL , such as NN 's pass-through way(Oracel's SQL) ,let oracle get it.
Ksharp
Thanks to all for the help.
I checked with the Oracle DBA (she's cute!) and found out the field I was looking at was defined as an index variable for the Oracle table.
I used the SQL suggested by NN and the CPU time is down to nearly nothing!
I'm assuming that Oracle provides the index variable to the SQL rather than reading the actual data from the table.
Hi ... for future reference, if ever in the situation where you have a data set, you can read the last
observation directly ...
data test;
input seq @@;
datalines;
1 2 3 4 5 6 7 8 9 999
;
run;
data _null_;
set test nobs=obs point=obs;
put "last seq: " seq;
stop;
run;
last seq: 999
MikeZdeb:
Just wondering how efficient this would be with > 14 million records ?
OS2Rules wrote:
MikeZdeb:
Just wondering how efficient this would be with > 14 million records ?
It would be very efficient no matter how many records there because only ONE record is read. The POINTing method would need a bit of extra logic if the last record happened to have been REMOVED.
data_null:
I think the problem here is that I don't have the key for the last record or the relative record number. Basically, I need to find the key for the last record so I can use it the next time I read the file to pull any records that have been added - rather than reading the entire file.
_null_, maybe you miss something in this case.
OP's dataset is not in local sas, it is in remote Oracle database.
So SAS need to pull over the whole table into local sas ,then get the last obs.
Mike's code is only suited for the dataset has already in local sas.
So as I said using Pass-through(Oracle's SQL code) mentioned by NN is the best way to get that.
Ksharp
Ksharp:
Funny thing is - I really didn't need to do a pass-through SQL either. I have the database referenced in a Libname and just SQL'ed to the table that way. Still only takes a fraction of a second. I don't think a pass-through in this case would provide any additional benefits.
Hi,
Again i may not be a 100 % correct here Would request the experts to correct me if i am worng.
But i believe when you use the libname statement for connecting to oracle and your SQL where condition uses an Indexed or Key variable then sas doesnot transfer the entire table into sas but reads the query directly from Oracle itself.
However if you were to use any other varible which is not indexed then Passthrough should be faster.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.