Hi Guys,
I am trying to connect to a Cloudera cluster and run a query there by using an execute statement:
libname sas2 hadoop
uri='.........'
schema="data_analyst"
SERVER="1***.**.**.***"
TRANSCODE_FAIL=SILENT
DBMAX_TEXT=50;
libname vantage hadoop
uri='.........'
schema="vantage"
SERVER="***.**.**.***"
TRANSCODE_FAIL=SILENT
DBMAX_TEXT=50;
proc sql;
connect to HADOOP
(
SERVER="***.**.**.***"
uri='......'
);
execute(
create table sas.T0 as
SELECT
A.var1, A.var2,
B.var3,B.var4,
FROM
sas2.data_set_1 as A left join vantage.data_set_2 as B on A.column1=B.column2) by hadoop;
disconnect from hadoop;
quit;
The problem is that (since I want to "execute" in Cloudera) I cannot use the name sas2 as a Libref for the first schema. I must give it the name "data_analyst" which is not a valid SAS Libname... . So I cannot access/use the dataset sas2.data_set_1 in the query above. Any solution other than renaming the Cloudera schema "data_analyst" to something less than 9 characters ?
Hi LinusH and thank you for your help!
if I let "sas2" as the name of the "data_analyst" schema found in Cloudera, then upon using a dataset of that schema (sas2.dataset1) in an execute statement the following error is produced: "[Error 10072]: Database does not exist: sas2" or "["Error 10001]: Dataset does not exist: dataset1"
But these problems disappear when the name of the libref is the same as the name of the schema..
thanks again
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.