Solved: PROQ SQL - Left joining on a sas dataset to a table pulled from a tera...

Sas_Act_114 · Posted 02-19-2020 02:51 PM

Hello all,

I'll try to explain this as best as possible. I have a SAS Dataset with just one column, that has a list of about 3 to 4 million unique IDs. On the teradata server, there is a table whose data I want to pull for all the IDs in the SAS dataset. The issue I am encountering is that when I use PROC SQL, it does not recognize or seem to know about the SAS Dataset, thus I can't do a left (or right) join on the pull.

I also can not do a pull on just the table from the teradata server as it is extremely large and the only condition I have to limit the pull are these IDs. Here is an edited sample of the program I am running in what I am trying to do, with the SAS Dataset named SAS_TABLE and the teradata table named TERA_TABLE:

PROC SQL;
  CONNECT TO ODBC(DATAsrc='SERVER NAME');
  CREATE TABLE PULL1 AS
  SELECT *
  FROM CONNECTION TO ODBC
  (
SELECT  *

FROM SERVER.TERA_TABLE B

RIGHT JOIN SAS_TABLE A
  ON A.ID = B.ID
);

QUIT;

I've tried it this way and also a left join version, but each time, I get an error stating the SAS_TABLE does not exist:

ERROR: CLI prepare error: [Teradata][ODBC Teradata Driver][Teradata Database] Object 'SAS_TABLE' does not exist

I am assuming what the program is trying to do is find "SAS_TABLE" in the actual database, which is what is causing this. Any help on this would really be appreciated, thanks!

Patrick · Posted 02-19-2020 11:15 PM

Something along the line of below should work.

libname mytera odbc <connection info> insertbuff=100000 readbuff=10000 dbcommit=0;

proc append base=mytera.tmp_tbl data=sas_table;
run;

proc sql;
  connect using mytera;
  create table pull1 as
    select *
      from connection to mytera
        (
          select  *
          from 
            server.tera_table a
            inner join
            server.tmp_tbl b
            on a.id=b.id
        );
  disconnect from mytera;
quit;

proc datasets lib=mytera nolist nowarn;
  delete tmp_tbl;
  run;
quit;

If you just want to pick rows with matching ID's from the big Teradata table then an inner join should do (and also perform better).

If you also want all non-matching ID's from your SAS table in the final table on the SAS side and you expect quite a few ID's not to match then I'd add these ID's later on in a SAS data step. If you go for a left join then you will create rows on the Teradata side with all variables missing (except the ID) which you then have to load over the network into SAS.

View solution in original post

utrocketeng · Posted 02-19-2020 03:00 PM

i do not use TERA data, but it seems that you should do something like this:

PROC SQL;
  CONNECT TO ODBC(DATAsrc='SERVER NAME');
  /*Create table from remote datasource*/
  CREATE TABLE PULL1 AS
  SELECT *
  FROM CONNECTION TO ODBC
  (
  SELECT  * FROM SERVER.TERA_TABLE B
  )
;
/*join remote data extraction with table in sas*/
CREATE TABLE JOIN1 AS
  SELECT *
  FROM
  PULL1 A
  RIGHT JOIN SAS_TABLE B
  ON A.ID = B.ID
;

QUIT;

Sas_Act_114 · Posted 02-19-2020 03:04 PM

The issue with this, is I can't just do a pull on the table from the Teradata server. This table is massive, I'm talking several 100 millions of rows of data and the pull times out due to how long it takes. Otherwise yes, your approach is exactly what I would be attempting

Patrick · Posted 02-20-2020 12:26 AM

Another option is to give DBMASTER a try. Documented here.

libname mytera odbc <connection info> insertbuff=100000 readbuff=10000 dbcommit=0;

proc sql;
  create table pull1 as
    select a.*
    from 
      mytera.tera_table(dbmaster=yes) a
      inner join 
      sas_table b
      on a.id=b.id
    ;
quit;

Shmuel · Posted 02-19-2020 03:08 PM

Try next code

PROC SQL;
  CONNECT TO ODBC(DATAsrc='SERVER NAME')
    ??? 
  (
     CREATE TABLE PULL1 AS
       SELECT *
       FROM SERVER.TERA_TABLE B
       RIGHT JOIN SAS_TABLE A
    ON A.ID = B.ID
);
QUIT;

I never used ODBC nor Terradata, and I have the feeling that code miss something in the ??? line, assigning SERVER library

Patrick · Posted 02-19-2020 11:15 PM

Something along the line of below should work.

libname mytera odbc <connection info> insertbuff=100000 readbuff=10000 dbcommit=0;

proc append base=mytera.tmp_tbl data=sas_table;
run;

proc sql;
  connect using mytera;
  create table pull1 as
    select *
      from connection to mytera
        (
          select  *
          from 
            server.tera_table a
            inner join
            server.tmp_tbl b
            on a.id=b.id
        );
  disconnect from mytera;
quit;

proc datasets lib=mytera nolist nowarn;
  delete tmp_tbl;
  run;
quit;

If you just want to pick rows with matching ID's from the big Teradata table then an inner join should do (and also perform better).

If you also want all non-matching ID's from your SAS table in the final table on the SAS side and you expect quite a few ID's not to match then I'd add these ID's later on in a SAS data step. If you go for a left join then you will create rows on the Teradata side with all variables missing (except the ID) which you then have to load over the network into SAS.

ChrisNZ · Posted 02-20-2020 04:42 AM

It's unlikely that teradata can be wriitten to. Worth a try of course

High-Performance SAS Coding - Third Edition

Sas_Act_114 · Posted 02-20-2020 12:56 PM

This is what I was looking for, thank you very much!

ChrisNZ · Posted 02-19-2020 11:40 PM

Does this work (assuming a numeric join key here) ?

data _null_;
  set A end=LASTOBS;
  if _N_=1 then call execute("
    PROC SQL;
      CONNECT TO ODBC(DATAsrc='SERVER NAME');
      CREATE TABLE PULL1 AS
      SELECT *
      FROM CONNECTION TO ODBC
      ( SELECT  * FROM SERVER.TERA_TABLE WHERE ID IN(-999999 ");
  call execute(cats(',',ID));
  if LASTOBS then call execute('));quit;');
run;

Try with a few observations only, then when all is fine, run the whole list of values after redirecting the log using proc printto.

Some databases limit the number of values in a IN() list (Oracle's limit is 1000). in which case you need to insert OR operators between each reduced IN() clause.

Teradata doesn't seem to have this limit, but its query size may be be limited to 1 MB. In that case you must send successive queries using subsets of your table A.

High-Performance SAS Coding - Third Edition

PROQ SQL - Left joining on a sas dataset to a table pulled from a teradata server

Re: PROQ SQL - Left joining on a sas dataset to a table pulled from a teradata server

Re: PROQ SQL - Left joining on a sas dataset to a table pulled from a teradata server

Re: PROQ SQL - Left joining on a sas dataset to a table pulled from a teradata server

Re: PROQ SQL - Left joining on a sas dataset to a table pulled from a teradata server

Re: PROQ SQL - Left joining on a sas dataset to a table pulled from a teradata server

Re: PROQ SQL - Left joining on a sas dataset to a table pulled from a teradata server

Re: PROQ SQL - Left joining on a sas dataset to a table pulled from a teradata server

Re: PROQ SQL - Left joining on a sas dataset to a table pulled from a teradata server

Re: PROQ SQL - Left joining on a sas dataset to a table pulled from a teradata server

Register Today!

SAS Training: Just a Click Away