BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Sas_Act_114
Fluorite | Level 6

Hello all,

 

I'll try to explain this as best as possible. I have a SAS Dataset with just one column, that has a list of about 3 to 4 million unique IDs. On the teradata server, there is a table whose data I want to pull for all the IDs in the SAS dataset. The issue I am encountering is that when I use PROC SQL, it does not recognize or seem to know about the SAS Dataset, thus I can't do a left (or right) join on the pull.

 

I also can not do a pull on just the table from the teradata server as it is extremely large and the only condition I have to limit the pull are these IDs. Here is an edited sample of the program I am running in what I am trying to do, with the SAS Dataset named SAS_TABLE and the teradata table named TERA_TABLE:

 

PROC SQL;
  CONNECT TO ODBC(DATAsrc='SERVER NAME');
  CREATE TABLE PULL1 AS
  SELECT *
  FROM CONNECTION TO ODBC
  (
SELECT  *

FROM SERVER.TERA_TABLE B

RIGHT JOIN SAS_TABLE A
  ON A.ID = B.ID
);

QUIT;

I've tried it this way and also a left join version, but each time, I get an error stating the SAS_TABLE does not exist:

 

ERROR: CLI prepare error: [Teradata][ODBC Teradata Driver][Teradata Database] Object 'SAS_TABLE' does not exist

 

I am assuming what the program is trying to do is find "SAS_TABLE" in the actual database, which is what is causing this. Any help on this would really be appreciated, thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

Something along the line of below should work.

libname mytera odbc <connection info> insertbuff=100000 readbuff=10000 dbcommit=0;

proc append base=mytera.tmp_tbl data=sas_table;
run;

proc sql;
  connect using mytera;
  create table pull1 as
    select *
      from connection to mytera
        (
          select  *
          from 
            server.tera_table a
            inner join
            server.tmp_tbl b
            on a.id=b.id
        );
  disconnect from mytera;
quit;

proc datasets lib=mytera nolist nowarn;
  delete tmp_tbl;
  run;
quit;

If you just want to pick rows with matching ID's from the big Teradata table then an inner join should do (and also perform better).

 

If you also want all non-matching ID's from your SAS table in the final table on the SAS side and you expect quite a few ID's not to match then I'd add these ID's later on in a SAS data step. If you go for a left join then you will create rows on the Teradata side with all variables missing (except the ID) which you then have to load over the network into SAS.

View solution in original post

8 REPLIES 8
utrocketeng
Quartz | Level 8

i do not use TERA data, but it seems that you should do something like this:

PROC SQL;
  CONNECT TO ODBC(DATAsrc='SERVER NAME');
  /*Create table from remote datasource*/
  CREATE TABLE PULL1 AS
  SELECT *
  FROM CONNECTION TO ODBC
  (
  SELECT  * FROM SERVER.TERA_TABLE B
  )
;
/*join remote data extraction with table in sas*/
CREATE TABLE JOIN1 AS
  SELECT *
  FROM
  PULL1 A
  RIGHT JOIN SAS_TABLE B
  ON A.ID = B.ID
;

QUIT;
Sas_Act_114
Fluorite | Level 6

The issue with this, is I can't just do a pull on the table from the Teradata server. This table is massive, I'm talking several 100 millions of rows of data and the pull times out due to how long it takes. Otherwise yes, your approach is exactly what I would be attempting Smiley Happy

Patrick
Opal | Level 21

Another option is to give DBMASTER a try. Documented here.

 

libname mytera odbc <connection info> insertbuff=100000 readbuff=10000 dbcommit=0;

proc sql;
  create table pull1 as
    select a.*
    from 
      mytera.tera_table(dbmaster=yes) a
      inner join 
      sas_table b
      on a.id=b.id
    ;
quit;
Shmuel
Garnet | Level 18

Try next code

PROC SQL;
  CONNECT TO ODBC(DATAsrc='SERVER NAME')
??? ( CREATE TABLE PULL1 AS SELECT * FROM SERVER.TERA_TABLE B RIGHT JOIN SAS_TABLE A ON A.ID = B.ID ); QUIT;

I never used ODBC nor Terradata, and I have the feeling that code miss something in the ??? line, assigning SERVER library

Patrick
Opal | Level 21

Something along the line of below should work.

libname mytera odbc <connection info> insertbuff=100000 readbuff=10000 dbcommit=0;

proc append base=mytera.tmp_tbl data=sas_table;
run;

proc sql;
  connect using mytera;
  create table pull1 as
    select *
      from connection to mytera
        (
          select  *
          from 
            server.tera_table a
            inner join
            server.tmp_tbl b
            on a.id=b.id
        );
  disconnect from mytera;
quit;

proc datasets lib=mytera nolist nowarn;
  delete tmp_tbl;
  run;
quit;

If you just want to pick rows with matching ID's from the big Teradata table then an inner join should do (and also perform better).

 

If you also want all non-matching ID's from your SAS table in the final table on the SAS side and you expect quite a few ID's not to match then I'd add these ID's later on in a SAS data step. If you go for a left join then you will create rows on the Teradata side with all variables missing (except the ID) which you then have to load over the network into SAS.

ChrisNZ
Tourmaline | Level 20

It's unlikely that teradata can be wriitten to. Worth a try of course

Sas_Act_114
Fluorite | Level 6

This is what I was looking for, thank you very much!

ChrisNZ
Tourmaline | Level 20

Does this work (assuming a numeric join key here) ?

data _null_;
  set A end=LASTOBS;
  if _N_=1 then call execute("
    PROC SQL;
      CONNECT TO ODBC(DATAsrc='SERVER NAME');
      CREATE TABLE PULL1 AS
      SELECT *
      FROM CONNECTION TO ODBC
      ( SELECT  * FROM SERVER.TERA_TABLE WHERE ID IN(-999999 ");
  call execute(cats(',',ID));
  if LASTOBS then call execute('));quit;');
run;

 

Try with a few observations only, then when all is fine, run the whole list of values after redirecting the log using proc printto.

 

Some databases limit the number of values in a IN() list (Oracle's limit is 1000). in which case you need to insert OR operators between each reduced IN() clause.

Teradata doesn't seem to have this limit, but its query size may be be limited to 1 MB. In that case you must send successive queries using subsets of your table A.

 

 

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1563 views
  • 1 like
  • 5 in conversation