Solved: Re: extract last obs from Oracle dataset

acordes · Posted 01-19-2023 06:11 AM

Hi @Kurt_Bremser, Do you know how to circumvent this error?

ERROR: The POINT= data set option is not valid for the data set MYORA.AUD_BIWESTIM, the data set must be randomly not sequentially 
       accessible for POINT= processing.

Its your solution applied to an oracle load.

https://communities.sas.com/t5/forums/replypage/board-id/programming/message-id/217421


LIBNAME myora ORACLE PATH=XXXXX.vwfsag.de USER=SAS PASSWORD='XXXXXXXX';
data MKT.aki(promote=no);
set myora.AUD_BIWESTIM(obs=100000);
run;

data MKT.aki(promote=no);
do pt = nobs - 10000 to nobs;
  set myora.AUD_BIWESTIM nobs=nobs point=pt;
  output;
end;
stop;
run;

Patrick · Posted 01-20-2023 07:23 PM

Where does the extra semi-colon come from. It's not in the code as posted.

And because I "stumbled" over the code while doing something else here the logic to derive a DB schema name from a SAS library definition (code for SAS libref myora as per your sample code).

proc sql noprint;
  select sysvalue into :ora_schema trimmed
  from dictionary.libnames
  where
    libname="%upcase(myora)"
    and sysname='Schema/Owner'
  ;
quit;

proc sql;
  connect using myora;
  create table want as
  select * from connection to myora
  (
    SELECT * 
    FROM &ora_schema..fact_ope_form  
    limit 10000;
  )
  ;
  disconnect from myora;
quit;

IF you've got already a SAS library defined that points to your Oracle schema then I'd be using the connect using syntax as it avoids passing explicit connection info in code - especially credentials. If there isn't such a pre-assigned SAS library then ideally use an authentication domain instead of explicit credentials.

If developing explicit pass-through SQL I normally use a two step approach:
1. First develop the DB specific pass-through SQL syntax using a DB client (i.e. SQL Developer).

2. Once DB SQL fully working "wrap" SAS around it.

View solution in original post

russt_sas · Posted 01-19-2023 10:07 AM

This error normally indicates that the input data set has one or more of these options specified when it was created, as either a data set option or a system option:

  COMPRESS=YES, POINTOBS=NO or REUSE=YES.

You cannot use the POINT= on a compressed data set.

Tom · Posted 01-20-2023 11:34 AM

@russt_sas wrote:

This error normally indicates that the input data set has one or more of these options specified when it was created, as either a data set option or a system option:

  COMPRESS=YES, POINTOBS=NO or REUSE=YES.

You cannot use the POINT= on a compressed data set.

Since the question seems to be about access ORACLE tables this comment does not really apply.

But I wanted to make sure to let you know that POINT= works fine with compressed SAS datasets (at least for the last 20 years or so).

1992  data class(compress=yes);
1993    set sashelp.class;
1994  run;

NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.CLASS has 19 observations and 5 variables.
NOTE: Compressing data set WORK.CLASS increased size by 100.00 percent.
      Compressed is 2 pages; un-compressed would require 1 pages.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds


1995
1996  data want;
1997    do point=1,5,3,7 ;
1998      set class point=point;
1999      put point= name= ;
2000      output;
2001    end;
2002    stop;
2003  run;

point=1 Name=Alfred
point=5 Name=Henry
point=3 Name=Barbara
point=7 Name=Jane
NOTE: The data set WORK.WANT has 4 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

Kurt_Bremser · Posted 01-19-2023 10:08 AM

See if you can do the selection in Oracle with Oracle means.

Otherwise, since you can't directly address a single observation (POINT=), you will have to pull the whole table to SAS and do the subset there:

data want;
set oracle.have nobs=nobs;
if _n_ > nobs - 10000;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

acordes · Posted 01-19-2023 02:40 PM

Thanks @Kurt_Bremser , but no success.

84   data MKT.miki(promote=no) ;
85   set myora.FACT_OPE_FORM nobs=nobs;
86   if _n_ > nobs - 10000;
87   _fecformo=datepart(fecformo);
88   run;
NOTE: There were 1630486 observations read from the data set MYORA.FACT_OPE_FORM.
NOTE: The data set MKT.MIKI has 0 observations and 191 variables.
NOTE: DATA statement used (Total process time):
      real time           40.79 seconds
      cpu time            39.81 seconds

SASKiwi · Posted 01-19-2023 02:56 PM

@acordes - Please be aware that reading rows from an Oracle table won't necessarily be in the order they were added. You have to define what order you want your result set in. See my other post.

Kurt_Bremser · Posted 01-20-2023 03:04 AM

It seems that while the NOBS= option is available, it returns an incorrect (too large) number. Retrieve the correct number from the Oracle equivalent of DICTIONARY.TABLES (use explicit pass-through).

You may also have an issue where the table size is reduced (records deleted) on the Oracle side while the data step runs.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

SASKiwi · Posted 01-19-2023 02:11 PM

What's your definition of "last obs from Oracle dataset"? To be able to read the most recently added rows from the Oracle table you will need row update date as a column in the table. If there isn't one you will be stuck. Another possible solution might be available if the table contains a sequential ID or row count.

acordes · Posted 01-19-2023 02:48 PM

What best works for the moment relies on certain knowledge of the table putting me in the shoes to apply an order criteria.

But that's not always the case. I use the loading for data exploration and pre-processing.

Some tables have 20 million rows and I don't know their context.

Reading all these rows to viya takes a long time.

So I suppose they are ordered already in some regard, like first-in/last-out meaning that the last observations are the most recent ones.

It's about portfolio of contracts and I'm much more interested in analyzing the last 5% of the whole table or let it be 100.000 rows.

1    %studio_hide_wrapper;
82   data MKT.miki(promote=no) ;
83   set myora.FACT_OPE_FORM (obs=10000);
84   by descending fecformo;
85   _fecformo=datepart(fecformo);
86   run;
NOTE: There were 10000 observations read from the data set MYORA.FACT_OPE_FORM.
NOTE: The data set MKT.MIKI has 10000 observations and 191 variables.
NOTE: DATA statement used (Total process time):
      real time           13.73 seconds
      cpu time            0.39 seconds
      
87   
88   proc summary data=mkt.miki print n;
89   class _fecformo;
90   format _fecformo year.;
91   var fk_obj;
92   run;
NOTE: The CAS aggregation.aggregate action will be used to perform the initial summarization.
NOTE: The PROCEDURE SUMMARY printed page 2.
NOTE: PROCEDURE SUMMARY used (Total process time):
      real time           0.06 seconds
      cpu time            0.03 seconds
      
93   
94   %studio_hide_wrapper;
105  
106

SASKiwi · Posted 01-19-2023 03:10 PM

You could in that case use a WHERE clause on FECFORMO to get the last few weeks or months of data.

Sajid01 · Posted 01-19-2023 11:09 PM

Hello @acordes
The message is clear that oracle table is not available for random access.
I suggest that you try what @Kurt_Bremser has suggested with a minor change.
This is because it appears that your intention is to read observations after the 10000th observation.

data want;
set oracle.have nobs=nobs;
if _n_ >  10000;
run;

acordes · Posted 01-20-2023 04:02 AM

@Sajid01 @Kurt_Bremser I share with you the time it takes doing different approaches.

The by descending performs much faster.

However the pre-requisite is that you know a sorting criteria that makes sense.

In some case this is not given, therefore for me a valid choice is to sort by descending _all_ assuming that the oracle table has more recent observations at the end.

The summary screenshot has more weight on recent years of contract signing date and somehow (in this case) mimics a random selection. That's not a very scientific approach, but for my case at hand it's acceptable as I'm using all the oracle tables for cas actions in the context of sample data, cardinality, robust univariate statistics, outliers, distinct count, ...

I've have tried as well using fedsql for accessing the oracle table, but it takes longer.

Thanks to all for your support.

Patrick · Posted 01-20-2023 07:31 AM

As you already found out Oracle tables are not sequential. Data is stored arbitrary and there is no concept of first row or last row in a table. If you don't request a result set to be returned in a specific order (order by ...) then the row order returned can vary between executions for the exactly same code.

The bottleneck is the data transfer between Oracle and SAS. If you've got an Oracle table with 10 million rows but only need 10000 on the SAS side then you should reduce the data volumes on the Oracle side and only transfer what you need to the SAS side.

If the only thing you care about is the number of rows you get on the SAS side then using explicit pass-through SQL with the LIMIT clause will likely outperform any other approach.

proc sql;
  connect using myora;
  create table want as
  select * from connection to myora
  (
    SELECT * 
    FROM <schema name>.fact_ope_form  
    limit 10000;
  )
  ;
  disconnect from myora;
quit;

If you don't know the Oracle schema name then one way to get it easily is to execute below which will print the library definition with the schema name to the SAS log.

libname myora list;

You could of course also query the SAS dictionary tables to retrieve the schema name dynamically.

....and in case this needs to execute in CAS then look-up how explicit pass-through SQL works for CAS. You could certainly not use Proc SQL. Proc FedSQL should work or there are eventually also some CAS actions sets for this.

What makes me think you're using SAS Viya CAS is your usage of a by statement without a Proc Sort.

If that's what you're doing then add the following options right before your data step.

OPTIONS SASTRACE=',,,d' SASTRACELOC=SASLOG NOSTSUFFIX;

These options will print into the SAS log the SQL that SAS generates and sends to Oracle for execution. If you're lucky then CAS uses the data set OBS= option and translates it into a Oracle SQL limit clause. If that's the case then just use the data step you've got without a by statement.

acordes · Posted 01-20-2023 09:15 AM

Yes, I'm using CAS.

I a different project I use the proc sql to insert new rows into an existing oracle table. So I made it work.

OPTIONS SASTRACE=',,,d' SASTRACELOC=SASLOG NOSTSUFFIX;
proc sql;
   connect to oracle as dbcon
       (user=SAS password='xxxxxxx' buffsize=100
        PATH=xxxxx.vwfsag.de);
create table PUBLIC.FACT_OPE_FORM as select *
   from connection to dbcon
     (select *
      from SAS.FACT_OPE_FORM
    limit 10000;
  )
  ;
  disconnect from myora;
quit;

Running your code gives an error:

ORACLE_2: Prepared: on connection 1
select FECFORMO from SAS.FACT_OPE_FORM; limit 10000;
 
ERROR: ORACLE prepare error: ORA-00933: SQL command not properly ended. SQL statement: select FECFORMO from SAS.FACT_OPE_FORM; 
       limit 10000;.
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
94     disconnect from myora;
NOTE: Statement not executed due to NOEXEC option.
95   quit;

Tom · Posted 01-20-2023 11:30 AM

@acordes wrote:

Yes, I'm using CAS.

I a different project I use the proc sql to insert new rows into an existing oracle table. So I made it work.

OPTIONS SASTRACE=',,,d' SASTRACELOC=SASLOG NOSTSUFFIX;
proc sql;
   connect to oracle as dbcon
       (user=SAS password='xxxxxxx' buffsize=100
        PATH=xxxxx.vwfsag.de);
create table PUBLIC.FACT_OPE_FORM as select *
   from connection to dbcon
     (select *
      from SAS.FACT_OPE_FORM
    limit 10000;
  )
  ;
  disconnect from myora;
quit;

Running your code gives an error:

ORACLE_2: Prepared: on connection 1
select FECFORMO from SAS.FACT_OPE_FORM; limit 10000;
 
ERROR: ORACLE prepare error: ORA-00933: SQL command not properly ended. SQL statement: select FECFORMO from SAS.FACT_OPE_FORM; 
       limit 10000;.
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
94     disconnect from myora;
NOTE: Statement not executed due to NOEXEC option.
95   quit;

Did you add the extra semi-colon? Or did VIYA insert it?

Shouldn't you use PROC FEDSQL with VIYA instead of PROC SQL?

Registration is open

SAS Training: Just a Click Away