Hello all,
I'll try to explain this as best as possible. I have a SAS Dataset with just one column, that has a list of about 3 to 4 million unique IDs. On the teradata server, there is a table whose data I want to pull for all the IDs in the SAS dataset. The issue I am encountering is that when I use PROC SQL, it does not recognize or seem to know about the SAS Dataset, thus I can't do a left (or right) join on the pull.
I also can not do a pull on just the table from the teradata server as it is extremely large and the only condition I have to limit the pull are these IDs. Here is an edited sample of the program I am running in what I am trying to do, with the SAS Dataset named SAS_TABLE and the teradata table named TERA_TABLE:
PROC SQL;
CONNECT TO ODBC(DATAsrc='SERVER NAME');
CREATE TABLE PULL1 AS
SELECT *
FROM CONNECTION TO ODBC
(
SELECT *
FROM SERVER.TERA_TABLE B
RIGHT JOIN SAS_TABLE A
ON A.ID = B.ID
);
QUIT;
I've tried it this way and also a left join version, but each time, I get an error stating the SAS_TABLE does not exist:
ERROR: CLI prepare error: [Teradata][ODBC Teradata Driver][Teradata Database] Object 'SAS_TABLE' does not exist
I am assuming what the program is trying to do is find "SAS_TABLE" in the actual database, which is what is causing this. Any help on this would really be appreciated, thanks!
Something along the line of below should work.
libname mytera odbc <connection info> insertbuff=100000 readbuff=10000 dbcommit=0;
proc append base=mytera.tmp_tbl data=sas_table;
run;
proc sql;
connect using mytera;
create table pull1 as
select *
from connection to mytera
(
select *
from
server.tera_table a
inner join
server.tmp_tbl b
on a.id=b.id
);
disconnect from mytera;
quit;
proc datasets lib=mytera nolist nowarn;
delete tmp_tbl;
run;
quit;
If you just want to pick rows with matching ID's from the big Teradata table then an inner join should do (and also perform better).
If you also want all non-matching ID's from your SAS table in the final table on the SAS side and you expect quite a few ID's not to match then I'd add these ID's later on in a SAS data step. If you go for a left join then you will create rows on the Teradata side with all variables missing (except the ID) which you then have to load over the network into SAS.
i do not use TERA data, but it seems that you should do something like this:
PROC SQL;
CONNECT TO ODBC(DATAsrc='SERVER NAME');
/*Create table from remote datasource*/
CREATE TABLE PULL1 AS
SELECT *
FROM CONNECTION TO ODBC
(
SELECT * FROM SERVER.TERA_TABLE B
)
;
/*join remote data extraction with table in sas*/
CREATE TABLE JOIN1 AS
SELECT *
FROM
PULL1 A
RIGHT JOIN SAS_TABLE B
ON A.ID = B.ID
;
QUIT;
The issue with this, is I can't just do a pull on the table from the Teradata server. This table is massive, I'm talking several 100 millions of rows of data and the pull times out due to how long it takes. Otherwise yes, your approach is exactly what I would be attempting
Another option is to give DBMASTER a try. Documented here.
libname mytera odbc <connection info> insertbuff=100000 readbuff=10000 dbcommit=0;
proc sql;
create table pull1 as
select a.*
from
mytera.tera_table(dbmaster=yes) a
inner join
sas_table b
on a.id=b.id
;
quit;
Try next code
PROC SQL;
CONNECT TO ODBC(DATAsrc='SERVER NAME')
???
(
CREATE TABLE PULL1 AS
SELECT *
FROM SERVER.TERA_TABLE B
RIGHT JOIN SAS_TABLE A
ON A.ID = B.ID
);
QUIT;
I never used ODBC nor Terradata, and I have the feeling that code miss something in the ??? line, assigning SERVER library
Something along the line of below should work.
libname mytera odbc <connection info> insertbuff=100000 readbuff=10000 dbcommit=0;
proc append base=mytera.tmp_tbl data=sas_table;
run;
proc sql;
connect using mytera;
create table pull1 as
select *
from connection to mytera
(
select *
from
server.tera_table a
inner join
server.tmp_tbl b
on a.id=b.id
);
disconnect from mytera;
quit;
proc datasets lib=mytera nolist nowarn;
delete tmp_tbl;
run;
quit;
If you just want to pick rows with matching ID's from the big Teradata table then an inner join should do (and also perform better).
If you also want all non-matching ID's from your SAS table in the final table on the SAS side and you expect quite a few ID's not to match then I'd add these ID's later on in a SAS data step. If you go for a left join then you will create rows on the Teradata side with all variables missing (except the ID) which you then have to load over the network into SAS.
It's unlikely that teradata can be wriitten to. Worth a try of course
This is what I was looking for, thank you very much!
Does this work (assuming a numeric join key here) ?
data _null_;
set A end=LASTOBS;
if _N_=1 then call execute("
PROC SQL;
CONNECT TO ODBC(DATAsrc='SERVER NAME');
CREATE TABLE PULL1 AS
SELECT *
FROM CONNECTION TO ODBC
( SELECT * FROM SERVER.TERA_TABLE WHERE ID IN(-999999 ");
call execute(cats(',',ID));
if LASTOBS then call execute('));quit;');
run;
Try with a few observations only, then when all is fine, run the whole list of values after redirecting the log using proc printto.
Some databases limit the number of values in a IN() list (Oracle's limit is 1000). in which case you need to insert OR operators between each reduced IN() clause.
Teradata doesn't seem to have this limit, but its query size may be be limited to 1 MB. In that case you must send successive queries using subsets of your table A.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.