BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
OS2Rules
Obsidian | Level 7

Hi All:

I need to get the record-id (var: seq_number) of the last record in a large Oracle table.

I use:

    data work01 (keep=seq_number);                                                                                                   

    set oracle.table end=eof;                                                                                                           

    if eof then output;                                                                                                                

    else delete;                                                                                                                       

    run;       

but this runs for 20 minutes (there are > 14 million rows and the records are huge) and used 9 minutes of CPU alone.

Any ideas on how to do this faster?  The Oracle table is not mine so I can't make any changes on that side.

Thanks in advance.               

1 ACCEPTED SOLUTION

Accepted Solutions
NN
Quartz | Level 8 NN
Quartz | Level 8

Hi,

Since you are using a data step i hope its is correct to assume that you are using a libname statment to connect to the oracle database.

In such a case what i believe happens is the entire table from the oracle is copied into sas and then the selected query is run.

If you were to use a Pass through query instead of the libname statement then you could use

select * from schema.table where rowid = (select max(rowid) from schema.table)

Please do correct me if i am wrong

View solution in original post

13 REPLIES 13
art297
Opal | Level 21

Have you tried the firstobs=MAX dataset option.  I can't test it, but it should work.

NN
Quartz | Level 8 NN
Quartz | Level 8

Hi,

Since you are using a data step i hope its is correct to assume that you are using a libname statment to connect to the oracle database.

In such a case what i believe happens is the entire table from the oracle is copied into sas and then the selected query is run.

If you were to use a Pass through query instead of the libname statement then you could use

select * from schema.table where rowid = (select max(rowid) from schema.table)

Please do correct me if i am wrong

Ksharp
Super User

Art297's method will not save your time,because it is also sequential access dataset method, you need directly access dataset method.But From your code, it looks like dataset is not in local SAS, so the best way is to consult with someone who is familiar with ORACLE's sql. use ORACLE's sql to get that.

Ksharp

art297
Opal | Level 21

Ksharp, I've never tried it with ORACLE, thus you may be right.  However, I would think that the possible IO savings might still be significant.

Of course, as is usually the case with databases like ORACLE, usually the best time savings are derived from making friends with those who are maintaining those databases.  They can probably easily set up a view that only contains the last record.

Ksharp
Super User

Art.T. Actually your way is the same with OP's way (i.e. must firstly load the whole oracle table into sas, then you can get the last obs).

In this case, the best way is to use Oracle's SQL , such as NN 's pass-through way(Oracel's SQL) ,let oracle get it.

Ksharp

OS2Rules
Obsidian | Level 7


Thanks to all for the help.

I checked with the Oracle DBA (she's cute!) and found out the field I was looking at was defined as an index variable for the Oracle table.

I used the SQL suggested by NN and the CPU time is down to nearly nothing!

I'm assuming that Oracle provides the index variable to the SQL rather than reading the actual data from the table.

MikeZdeb
Rhodochrosite | Level 12

Hi ... for future reference, if ever in the situation where you have a data set, you can read the last

observation directly ...

data test;

input seq @@;

datalines;

1 2 3 4 5 6 7 8 9 999

;

run;

data _null_;

set test nobs=obs point=obs;

put "last seq: " seq;

stop;

run;

last seq: 999

OS2Rules
Obsidian | Level 7

MikeZdeb:

Just wondering how efficient this would be with > 14 million records ?

data_null__
Jade | Level 19

OS2Rules wrote:

MikeZdeb:

Just wondering how efficient this would be with > 14 million records ?


It would be very efficient no matter how many records there because only ONE record is read.  The POINTing method would need a bit of extra logic if the last record happened to have been REMOVED.

OS2Rules
Obsidian | Level 7

data_null:

I think the problem here is that I don't have the key for the last record or the relative record number.  Basically, I need to find the key for the last record so I can use it the next time I read the file to pull any records that have been added - rather than reading the entire file.

Ksharp
Super User

_null_, maybe you miss something in this case.

OP's dataset is not in local sas, it is in remote Oracle database.

So SAS need to pull over the whole table into local sas ,then get the last obs.

Mike's code is only suited for the dataset has already in local sas.

So as I said using Pass-through(Oracle's SQL code) mentioned by NN is the best way to get that.

Ksharp

OS2Rules
Obsidian | Level 7

Ksharp:

Funny thing is - I really didn't need to do a pass-through SQL either.  I have the database referenced in a Libname and just SQL'ed to the table that way.  Still only takes a fraction of a second.  I don't think a pass-through in this case would provide any additional benefits.

NN
Quartz | Level 8 NN
Quartz | Level 8

Hi,

Again i may not be a 100 % correct here Would request the experts to correct me if i am worng.

But i believe when you use the libname statement for connecting to oracle and your SQL where condition uses an Indexed or Key variable then sas doesnot transfer the entire table into sas but reads the query directly from Oracle itself.

However if you were to use any other varible which is not indexed then Passthrough should be faster.

sas-innovate-2024.png

Today is the last day to save with the early bird rate! Register today for just $695 - $100 off the standard rate.

 

Plus, pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 6975 views
  • 1 like
  • 6 in conversation