Hi Guys,
I am trying to connect to a Cloudera cluster and run a query there by using an execute statement:
libname sas2 hadoop
uri='.........'
schema="data_analyst"
SERVER="1***.**.**.***"
TRANSCODE_FAIL=SILENT
DBMAX_TEXT=50;
libname vantage hadoop
uri='.........'
schema="vantage"
SERVER="***.**.**.***"
TRANSCODE_FAIL=SILENT
DBMAX_TEXT=50;
proc sql;
connect to HADOOP
(
SERVER="***.**.**.***"
uri='......'
);
execute(
create table sas.T0 as
SELECT
A.var1, A.var2,
B.var3,B.var4,
FROM
sas2.data_set_1 as A left join vantage.data_set_2 as B on A.column1=B.column2) by hadoop;
disconnect from hadoop;
quit;
The problem is that (since I want to "execute" in Cloudera) I cannot use the name sas2 as a Libref for the first schema. I must give it the name "data_analyst" which is not a valid SAS Libname... . So I cannot access/use the dataset sas2.data_set_1 in the query above. Any solution other than renaming the Cloudera schema "data_analyst" to something less than 9 characters ?
Hi LinusH and thank you for your help!
if I let "sas2" as the name of the "data_analyst" schema found in Cloudera, then upon using a dataset of that schema (sas2.dataset1) in an execute statement the following error is produced: "[Error 10072]: Database does not exist: sas2" or "["Error 10001]: Dataset does not exist: dataset1"
But these problems disappear when the name of the libref is the same as the name of the schema..
thanks again
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.