I have a dataset in Hadoop that contains nested JSON, and I have to parse the JSON and use the values to create a SAS dataset.
I have used LATERAL VIEW OUTER EXPLODE to parse the nested JSON values, but when I use this Hive code in SAS there is a syntax issue.
Can somebody help me with this?
%LET ESQUEMA = gcgdlkmxusg_ecrm_db;
%LET HDFS = hv_u_stt_general_emmrm;
%LET amb = xyz

PROC SQL;
CONNECT TO HADOOP(
   host = "bigdataplatform-gcg-&amb..cal.nsroot.net"
   schema = &ESQUEMA.
   HDFS_TEMPDIR = "/data/&HDFS./work/hive/&HDFS._work/"
   DBMAX_TEXT = 32768
   uri = "jdbc:hive2://bigdataplatform-gcg-&amb..lac.nsroot.net:10000/&ESQUEMA.;principal=hive/bigdataplatform-gcg-&amb..lac.nsroot.net@LAC.NSROOT.NET;ssl=true"
   scratch_db = "&HDFS._work"
   subchar=questionmark
);
CREATE TABLE Applicant AS SELECT * FROM CONNECTION TO HADOOP(
   create table product as(
      select(
         xml_data, application_number, interface_id,
         get_json_object(xml_data,'$.application.applicationid') as applicationid,
         cast(get_json_object(xml_data,'$.application.applicants[0].nationality') as INT) as Nationality,
         regexp_replace(regexp_replace(get_json_object(xml_data, '$.application.products[]'), '\\[|\\]', ''), '\\}\\,\\{', '\\}\\;\\{') AS phone_JSON)
      from &ESQUEMA..ci_rpdm_t_d
      where application_number='202829000002900'
   )
   select(
      application_number, INTERFACE_ID, applicationid, nationality,
      get_json_object(phone_item,'$.type') as phonetype,
      get_json_object(phone_item,'$.productCode') as phoneAvailabilityFlag)
   from product
   LATERAL VIEW OUTER EXPLODE(split(phone_JSON, ';')) p AS phone_item
);
disconnect from hadoop;
quit;
Accepted Solutions
Thanks a lot for your help. I figured out the missing parenthesis location.
As for the alias, it is needed for the lateral explode to reference the variables.
The working code is below:
%LET ESQUEMA = gcgdlkusg_ecrm_db;
%LET HDFS = hv_u_stg_ral_ecrm;
%LET amb = ut;

/*APPLICANT*/
PROC SQL;
CONNECT TO HADOOP(
   host = "bigdataplatform-gcg-&amb..lac.nsroot.net"
   schema = &ESQUEMA.
   HDFS_TEMPDIR = "/data/&HDFS./work/hive/&HDFS._work/"
   DBMAX_TEXT = 32768
   uri = "jdbc:hive2://bigdataplatform-gcg-&amb..lac.nsroot.net:10000/&ESQUEMA.;principal=hive/bigdataplatform-gcg-&amb..lac.nsroot.net@LAC.NSROOT.NET;ssl=true"
   scratch_db = "&HDFS._work"
   subchar=questionmark
);
CREATE TABLE Phone AS SELECT * FROM CONNECTION TO HADOOP(
   with phn as (
      select xml_data, application_number, interface_id,
         get_json_object(xml_data,'$.application.applicationid') as applicationid,
         cast(get_json_object(xml_data,'$.application.applicants[0].nationality') as INT) as Nationality,
         get_json_object(xml_data,'$.application.applicants[0].cin') as cin,
         regexp_replace(regexp_replace(get_json_object(xml_data,'$.application.applicants[].phones'), '\\[|\\]', ''), '\\}\\,\\{', '\\}\\;\\{') AS phone_JSON
      from &ESQUEMA..ci_rp_t_d
      where application_number='2024123' AND INTERFACE_ID=1023
   )
   select application_number, INTERFACE_ID, applicationid, nationality, cin,
      get_json_object(phone_item,'$.phoneType') as phonetype,
      get_json_object(phone_item,'$.phoneAvailabilityFlag') as phoneAvailabilityFlag
   from phn
   LATERAL VIEW OUTER EXPLODE(split(phone_JSON, ';')) p AS phone_item
);
/* disconnect from hadoop; */
quit;
Do NOT tell HADOOP to make a TABLE. If you do, it will not return any observations to SAS for the SAS SELECT statement to retrieve.
Just ask it to return the observations you want to store in the SAS dataset.
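A minimal sketch of that shape, reusing the connection and names already shown in the thread (this is an illustration of the pattern, not a tested query against that table):

```
PROC SQL;
CONNECT TO HADOOP( /* same connection options as in the original code */ );
CREATE TABLE Applicant AS
SELECT * FROM CONNECTION TO HADOOP(
   /* a plain SELECT only - no "create table product as" on the Hive side, */
   /* so the result rows stream back to SAS */
   select application_number, interface_id,
      get_json_object(xml_data,'$.application.applicationid') as applicationid
   from &ESQUEMA..ci_rpdm_t_d
   where application_number='202829000002900'
);
disconnect from hadoop;
quit;
```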
If I don't use CREATE TABLE here, how do I use it below, where I am referencing this product table?
In Hive this syntax works, and I am able to understand your view, but I am not able to work out the change of syntax for SAS.
@RichaM wrote:
If I don't use CREATE TABLE here, how do I use it below, where I am referencing this product table?
In Hive this syntax works, and I am able to understand your view, but I am not able to work out the change of syntax for SAS.
That does not look like valid SQL to me. Are you sure it works? Modern versions of SQL use WITH to specify a query that you can then reference, as if it were an actual table, later in the same query.
with product as (
select xml_data, application_number,interface_id
, get_json_object(xml_data,'$.application.applicationid') as applicationid
, cast(get_json_object(xml_data,'$.application.applicants[0].nationality') as INT) as Nationality
, regexp_replace(regexp_replace(get_json_object(xml_data, '$.application.products[]'), '\\[|\\]', ''), '\\}\\,\\{', '\\}\\;\\{') AS phone_JSON
from &ESQUEMA..ci_rpdm_t_d
where application_number='202829000002900'
)
select application_number
, INTERFACE_ID
, applicationid
, nationality
, get_json_object(phone_item,'$.type') as phonetype
, get_json_object(phone_item,'$.productCode') as phoneAvailabilityFlag
from product
LATERAL VIEW OUTER EXPLODE(split(phone_JSON, ';')) p AS phone_item
If you really did want to make a table named PRODUCT and then QUERY that table, you should do that in TWO steps.
execute by hadoop
(create table product .....
);
create table sasdataset as select * from connection to hadoop
(select var1,var2, .... from product .....
);
This query is working fine for me in HIVE
@RichaM wrote:
This query is working fine for me in HIVE
Define "working fine". Do you mean it can be used to create a table? Or can it also be used to stream its results to some viewer tool? It is the latter that the FROM CONNECTION TO syntax in SAS needs.
What are the types of the variables you are returning to SAS?
select application_number
, INTERFACE_ID
, applicationid
, nationality
, get_json_object(phone_item,'$.type') as phonetype
, get_json_object(phone_item,'$.productCode') as phoneAvailabilityFlag
Although I would expect a different error message if it were an unknown variable type.
Also your original code had a macro variable in it to make it dynamic.
Does it work when you hard code the value instead of using the macro variable reference?
This looks like documentation that HIVE supports the WITH clause (also known as a Common Table Expression, or CTE).
https://docs.cloudera.com/runtime/7.3.1/using-hiveql/topics/hive_create_a_table_using_a_cte.html
To work with PROC SQL's FROM CONNECTION TO syntax you will want to use example number 3:
WITH q1 AS (SELECT key from src where key = '5')
SELECT * from q1
The name Q1 that appears after the WITH keyword is the name you use in the FROM clause of the actual SELECT statement that follows the WITH clause.
Make sure that all of the variables you are selecting are valid for porting to SAS. You might need to cast() some of the stranger variable types that HIVE supports into either fixed-length character strings or floating-point numbers so that SAS can use them.
WITH q1 AS (SELECT key from src where key = '5')
SELECT cast(key as float) as key from q1
I used WITH as suggested, but I am still getting the same error.
I think the issue is the Lateral View function, which works fine in HIVE but has some syntax issue in SAS. Attaching both the code and the error log.
%LET ESQUEMA = gcgdlsg_ecrm_db;
%LET HDFS = hv_u_stg_general_ecrm;
%LET amb = uat;

/*APPlCANT*/
PROC SQL;
CONNECT TO HADOOP(
   host = "bigdataplatform-gcg-&ab..lac.nsroot.net"
   schema = &ESQUEMA.
   HDFS_TEMPDIR = "/data/&HDFS./work/hive/&HDFS._work/"
   DBMAX_TEXT = 32768
   uri = "jdbc:hive2://bigdataplatform-gcg-&amb..lac.nsroot.net:10000/&ESQUEMA.;principal=hive/bigdataplatform-gcg-&amb..lac.nsroot.net@LAC.NSROOT.NET;ssl=true"
   scratch_db = "&HDFS._work"
   subchar=questionmark
);
CREATE TABLE Applicant AS SELECT * FROM CONNECTION TO HADOOP(
   with product as(
      select(
         xml_data, application_number, interface_id,
         get_json_object(xml_data,'$.application.applicationid') as applicationid,
         cast(get_json_object(xml_data,'$.application.applicants[0].nationality') as INT) as Nationality,
         regexp_replace(regexp_replace(get_json_object(xml_data, '$.application.products[]'), '\\[|\\]', ''), '\\}\\,\\{', '\\}\\;\\{') AS phone_JSON)
      from &ESQUEMA..ci_rpdm_t_d
      where application_number='20289898' AND INTERFACE_ID=102
   )
   select(
      application_number, INTERFACE_ID, applicationid, nationality,
      get_json_object(phone_item,'$.type') as phonetype,
      get_json_object(phone_item,'$.productCode') as phoneAvailabilityFlag)
   from product
   LATERAL VIEW OUTER EXPLODE(split(phone_JSON, ';')) p AS phone_item
);
disconnect from hadoop;
quit;
ERROR: Prepare error: Method not supportedError while compiling statement: FAILED: ParseException line 1:123 missing ) at 'as' near
'ci_rpdm_t_d' in statement
SQL statement: with product as( select( xml_data, application_number,interface_id,
get_json_object(xml_data,'$.application.applicationid')as applicationid,
cast(get_json_object(xml_data,'$.application.applicants[0].nationality') as INT) as Nationality,
regexp_replace(regexp_replace(get_json_object(xml_data, '$.application.products[]'), '\\[|\\]', ''), '\\}\\,\\{',
'\\}\\;\\{') AS phone_JSON) from gcgdlkmxusg_ecrm_db.ci_rpdm_t_d where application_number='20240829000002810' AND
INTERFACE_ID=102 ) select( application_number, INTERFACE_ID, applicationid, nationality,
get_json_object(phone_item,'$.type') as phonetype, get_json_object(phone_item,'$.productCode') as phoneAvailabilityFlag)
from product LATERAL VIEW OUTER EXPLODE(split(phone_JSON, ';')) p AS phone_item
That error message seems to be pointing at this line of code:
from &ESQUEMA..ci_rpdm_t_d
What is the value of the macro variable ESQUEMA?
What happens if you replace the macro variable reference with the actual value? So if &ESQUEMA. resolves to FRED, what happens if you modify the code to use:
from FRED.ci_rpdm_t_d
Thanks for your reply. I found out that this was because of an extra round bracket, and I have corrected it. But now I am getting a similar error again. I figured out that it is because of a missing parenthesis for the 'CONNECTION TO HADOOP' statement, but I cannot figure out where to place it, as it gives me an error every time.
Error: ERROR: Prepare error: Method not supportedError while compiling statement: FAILED: ParseException line 1:760 cannot recognize input
near ';' '<EOF>' '<EOF>' in lateral view
SQL statement: with ecs as ( select xml_data, application_number,interface_id,
get_json_object(xml_data,'$.application.applicationid')as applicationid,
regexp_replace(regexp_replace(get_json_object(xml_data,'$.application.applicants[0].internalSystem.ecs[]'), '\\[|\\]', ''),
'\\}\\,\\{', '\\}\\;\\{') AS ecs_JSOn from gcgdlkmxusg_ecrm_db.ci_rpdm_t_d where application_number='20240829000002810' AND
INTERFACE_ID=102 ) select application_number, INTERFACE_ID, applicationid, get_json_object(ecs_item,'$.productLine') as
productLine, get_json_object(ecs_item,'$.productCode') as productCode, get_json_object(ecs_item,'$.creditLineAmtCards') as
creditLineAmtCards, get_json_object(ecs_item,'$.cin') as cin from ecs LATERAL VIEW OUTER EXPLODE(split(ecs_json, ';')) p AS
ecs_item;
PROC SQL;
CONNECT TO HADOOP(
   host = "bigdataplatform-gcg-&ab..lac.nsroot.net"
   schema = &ESQUEMA.
   HDFS_TEMPDIR = "/data/&HDFS./work/hive/&HDFS._work/"
   DBMAX_TEXT = 32768
   uri = "jdbc:hive2://bigdataplatform-gcg-&amb..lac.nsroot.net:10000/&ESQUEMA.;principal=hive/bigdataplatform-gcg-&amb..lac.nsroot.net@LAC.NSROOT.NET;ssl=true"
   scratch_db = "&HDFS._work"
   subchar=questionmark
);
CREATE TABLE Applicant AS SELECT * FROM CONNECTION TO HADOOP(
   with ecs as (
      select xml_data, application_number, interface_id,
         get_json_object(xml_data,'$.application.applicationid') as applicationid,
         regexp_replace(regexp_replace(get_json_object(xml_data,'$.application.applicants[0].internalSystem.ecs[]'), '\\[|\\]', ''), '\\}\\,\\{', '\\}\\;\\{') AS ecs_JSOn
      from gcgdlkmxsg_ecrm_db.ci_rpdm_t_d
      where application_number='202410' AND INTERFACE_ID=1089
   )
   select application_number, INTERFACE_ID, applicationid,
      get_json_object(ecs_item,'$.productLine') as productLine,
      get_json_object(ecs_item,'$.productCode') as productCode,
      get_json_object(ecs_item,'$.creditLineAmtCards') as creditLineAmtCards,
      get_json_object(ecs_item,'$.cin') as cin
   from ecs
   LATERAL VIEW OUTER EXPLODE(split(ecs_json, ';')) p AS ecs_item;
)
/* disconnect from hadoop;*/
quit;
Why do you have that extra
p AS ecs_item;
stuff after the LATERAL VIEW clause?
The examples I see online look like:
FROM mytable LATERAL VIEW explode(myarray) exploded_array_table
So the end of your query should probably be just one of these:
from ecs LATERAL VIEW OUTER EXPLODE(split(ecs_json, ';')) p
from ecs LATERAL VIEW OUTER EXPLODE(split(ecs_json, ';')) ecs_item
It does not look to me like it matters what ALIAS you code for the "exploded_array_table" since your SELECT statement is not referencing any table aliases.
Hint: to avoid losing track of the closing parentheses in your multi-line code, put each ) at the beginning of a NEW LINE instead of hiding it at the end of a (possibly very long) line of code.
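For example, the pass-through shell can be laid out so the parentheses pair up visibly (a formatting sketch only, with the query body elided):

```
CREATE TABLE Applicant AS
SELECT * FROM CONNECTION TO HADOOP
(  /* opening paren of the pass-through block */
   with ecs as ( ... )
   select ...
   from ecs
   LATERAL VIEW OUTER EXPLODE(split(ecs_json, ';')) p AS ecs_item
   /* no semicolon here - the ; after the closing paren ends the SAS statement */
)
;
disconnect from hadoop;
quit;
```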