SAS Programming

DATA Step, Macro, Functions and more
☑ This topic is solved.
RichaM
Obsidian | Level 7

I have a dataset in Hadoop which contains nested JSON. I have to parse the JSON and use the values to create a SAS dataset.

I have used LATERAL VIEW OUTER EXPLODE to parse the nested JSON values, but when I use this HIVE code in SAS there is a syntax issue.

Can somebody help me with this?

%LET ESQUEMA = gcgdlkmxusg_ecrm_db;
%LET HDFS = hv_u_stt_general_emmrm; 
%LET amb = xyz;


PROC SQL;
    CONNECT TO HADOOP(
    host = "bigdataplatform-gcg-&amb..cal.nsroot.net"
    schema = &ESQUEMA.
	    HDFS_TEMPDIR = "/data/&HDFS./work/hive/&HDFS._work/"
    DBMAX_TEXT = 32768
    uri = "jdbc:hive2://bigdataplatform-gcg-&amb..lac.nsroot.net:10000/&ESQUEMA.;principal=hive/bigdataplatform-gcg-&amb..lac.nsroot.net@LAC.NSROOT.NET;ssl=true" 
    scratch_db = "&HDFS._work"
	subchar=questionmark
    ); 
	    CREATE TABLE Applicant AS
    SELECT * 
FROM CONNECTION TO HADOOP(
create table  product as(
select( 
  xml_data, application_number,interface_id,
  get_json_object(xml_data,'$.application.applicationid')as applicationid,
  cast(get_json_object(xml_data,'$.application.applicants[0].nationality') as INT) as Nationality,
 regexp_replace(regexp_replace(get_json_object(xml_data, '$.application.products[]'), '\\[|\\]', ''), '\\}\\,\\{', '\\}\\;\\{') AS phone_JSON)
  from &ESQUEMA..ci_rpdm_t_d
  where application_number='202829000002900' 
  )
  select(
  application_number,
  INTERFACE_ID,
  applicationid,
  nationality,
  get_json_object(phone_item,'$.type') as phonetype,
  get_json_object(phone_item,'$.productCode') as phoneAvailabilityFlag)
  from product
  LATERAL VIEW OUTER EXPLODE(split(phone_JSON, ';')) p AS phone_item
);
disconnect from hadoop;
quit;


13 REPLIES
Tom
Super User

Do NOT tell HADOOP to make a TABLE.  That means it will not return any observations to SAS for the SAS select statement to retrieve.

 

Just ask it to return the observations you want to store into the SAS dataset.
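
Something along these lines should do it (a minimal sketch; the connection options and the source table name are placeholders, so use your real ones):

proc sql;
  /* use the same connection options as in your CONNECT TO HADOOP statement */
  connect to hadoop(host="myhost" schema=mydb);
  /* no CREATE TABLE on the Hive side: the inner SELECT streams rows back
     and the outer CREATE TABLE stores them as a SAS dataset */
  create table work.applicant as
  select *
  from connection to hadoop(
    select application_number
         , interface_id
         , get_json_object(xml_data,'$.application.applicationid') as applicationid
    from mydb.my_source_table
    where application_number='202829000002900'
  );
  disconnect from hadoop;
quit;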


 

RichaM
Obsidian | Level 7

If I don't use CREATE TABLE here, how do I use it below where I am referencing this product table?

In Hive this syntax works, and I understand your point, but I am not able to work out the change of syntax needed for SAS.

Tom
Super User

@RichaM wrote:

If I don't use CREATE TABLE here, how do I use it below where I am referencing this product table?

In Hive this syntax works, and I understand your point, but I am not able to work out the change of syntax needed for SAS.


That does not look like valid SQL to me. Are you sure it works? Modern versions of SQL would use WITH to specify a query that you want to reference as if it were an actual table later in the same query.

 

 with product as (
  select xml_data, application_number,interface_id
       , get_json_object(xml_data,'$.application.applicationid') as applicationid
       , cast(get_json_object(xml_data,'$.application.applicants[0].nationality') as INT) as Nationality
       , regexp_replace(regexp_replace(get_json_object(xml_data, '$.application.products[]'), '\\[|\\]', ''), '\\}\\,\\{', '\\}\\;\\{') AS phone_JSON
  from &ESQUEMA..ci_rpdm_t_d
  where application_number='202829000002900' 
  )

  select application_number
       , INTERFACE_ID
       , applicationid
       , nationality
       , get_json_object(phone_item,'$.type') as phonetype
       , get_json_object(phone_item,'$.productCode') as phoneAvailabilityFlag
    from product
    LATERAL VIEW OUTER EXPLODE(split(phone_JSON, ';')) p AS phone_item

If you really did want to make a table named PRODUCT and then QUERY that table you should do that in TWO steps.

execute by hadoop
(create table product .....
);

create table sasdataset as select * from connection to hadoop
(select var1,var2, .... from product ..... 
);
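
Fleshed out a little, using the classic EXECUTE ( ... ) BY form (the schema and temporary table names here are only illustrative, not your real ones):

execute (
  create table mydb_work.product_tmp as
  select application_number
       , interface_id
       , get_json_object(xml_data,'$.application.applicationid') as applicationid
  from mydb.ci_rpdm_t_d
  where application_number='202829000002900'
) by hadoop;

create table work.product as
select *
from connection to hadoop(
  select application_number, interface_id, applicationid
  from mydb_work.product_tmp
);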

 

RichaM
Obsidian | Level 7

This query is working fine for me in HIVE 

Tom
Super User

@RichaM wrote:

This query is working fine for me in HIVE 


Define "working fine". Do you mean it can be used to create a table? Or can it also be used to stream its results to some viewer tool? It is the latter that the FROM CONNECTION TO syntax in SAS needs.

RichaM
Obsidian | Level 7
Working fine in Hive for me. It displays the final result to me. But in SAS, when I use the query, it gives an EOF parse exception.
Tom
Super User

What are the types of the variables you are returning to SAS?

select application_number
       , INTERFACE_ID
       , applicationid
       , nationality
       , get_json_object(phone_item,'$.type') as phonetype
       , get_json_object(phone_item,'$.productCode') as phoneAvailabilityFlag

Although I would expect a different error message if it was an unknown variable type.

 

Also your original code had a macro variable in it to make it dynamic. 

Does it work when you hard code the value instead of using the macro variable reference?

Tom
Super User

This looks like documentation that HIVE supports the WITH clause (also known as a Common Table Expression, or CTE).

https://docs.cloudera.com/runtime/7.3.1/using-hiveql/topics/hive_create_a_table_using_a_cte.html

 

To work with PROC SQL's FROM CONNECTION TO syntax you will want to use example number 3:

WITH q1 AS (SELECT key from src where key = '5') 
  SELECT * from q1

The name Q1 that appears after the WITH keyword is the name you use in the FROM clause of the actual SELECT statement that follows the WITH clause.

 

Make sure that all of the variables you are selecting are valid for porting to SAS. You might need to cast() some of the unusual variable types that HIVE supports into either fixed-length character strings or floating-point numbers so that SAS can use them.

WITH q1 AS (SELECT key from src where key = '5') 
  SELECT cast(key as float) as key from q1
RichaM
Obsidian | Level 7

I used WITH as suggested, but I am still getting the same error.

I think the issue is with the LATERAL VIEW function, which works fine in HIVE but seems to cause a syntax issue in SAS. Attaching both the code and the error log.

%LET ESQUEMA = gcgdlsg_ecrm_db;
%LET HDFS = hv_u_stg_general_ecrm;
%LET amb = uat; 

/*APPlCANT*/

PROC SQL;
    CONNECT TO HADOOP(
    host = "bigdataplatform-gcg-&ab..lac.nsroot.net"
    schema = &ESQUEMA.
	    HDFS_TEMPDIR = "/data/&HDFS./work/hive/&HDFS._work/"
    DBMAX_TEXT = 32768
    uri = "jdbc:hive2://bigdataplatform-gcg-&amb..lac.nsroot.net:10000/&ESQUEMA.;principal=hive/bigdataplatform-gcg-&amb..lac.nsroot.net@LAC.NSROOT.NET;ssl=true" 
    scratch_db = "&HDFS._work"
	subchar=questionmark
    ); 
	    CREATE TABLE Applicant AS
    SELECT * 
FROM CONNECTION TO HADOOP(
with product as(
select( 
  xml_data, application_number,interface_id,
  get_json_object(xml_data,'$.application.applicationid')as applicationid,
  cast(get_json_object(xml_data,'$.application.applicants[0].nationality') as INT) as Nationality,
 regexp_replace(regexp_replace(get_json_object(xml_data, '$.application.products[]'), '\\[|\\]', ''), '\\}\\,\\{', '\\}\\;\\{') AS phone_JSON)
  from &ESQUEMA..ci_rpdm_t_d
  where application_number='20289898' AND INTERFACE_ID=102
  )
  select(
  application_number,
  INTERFACE_ID,
  applicationid,
  nationality,
  get_json_object(phone_item,'$.type') as phonetype,
  get_json_object(phone_item,'$.productCode') as phoneAvailabilityFlag)
  from product
  LATERAL VIEW OUTER EXPLODE(split(phone_JSON, ';')) p AS phone_item
);
disconnect from hadoop;
quit;


ERROR: Prepare error: Method not supportedError while compiling statement: FAILED: ParseException line 1:123 missing ) at 'as' near
'ci_rpdm_t_d' in statement
SQL statement: with product as( select( xml_data, application_number,interface_id,
get_json_object(xml_data,'$.application.applicationid')as applicationid,
cast(get_json_object(xml_data,'$.application.applicants[0].nationality') as INT) as Nationality,
regexp_replace(regexp_replace(get_json_object(xml_data, '$.application.products[]'), '\\[|\\]', ''), '\\}\\,\\{',
'\\}\\;\\{') AS phone_JSON) from gcgdlkmxusg_ecrm_db.ci_rpdm_t_d where application_number='20240829000002810' AND
INTERFACE_ID=102 ) select( application_number, INTERFACE_ID, applicationid, nationality,
get_json_object(phone_item,'$.type') as phonetype, get_json_object(phone_item,'$.productCode') as phoneAvailabilityFlag)
from product LATERAL VIEW OUTER EXPLODE(split(phone_JSON, ';')) p AS phone_item

Tom
Super User

That error message seems to be pointing at this line of code:

  from &ESQUEMA..ci_rpdm_t_d

What is the value of the macro variable ESQUEMA?

What happens if you replace the macro variable reference with the actual value? So if &ESQUEMA. resolves to FRED, what happens if you modify the code to use:

  from FRED.ci_rpdm_t_d
RichaM
Obsidian | Level 7

Thanks for your reply. I found out that this was because of an additional round bracket. I have corrected it. But now I am getting a similar error again. I figured out that this is because of a missing bracket for the 'CONNECTION TO HADOOP' statement, but I cannot figure out where to place it, as it gives me an error every time.

 

Error: ERROR: Prepare error: Method not supportedError while compiling statement: FAILED: ParseException line 1:760 cannot recognize input
near ';' '<EOF>' '<EOF>' in lateral view
SQL statement: with ecs as ( select xml_data, application_number,interface_id,
get_json_object(xml_data,'$.application.applicationid')as applicationid,
regexp_replace(regexp_replace(get_json_object(xml_data,'$.application.applicants[0].internalSystem.ecs[]'), '\\[|\\]', ''),
'\\}\\,\\{', '\\}\\;\\{') AS ecs_JSOn from gcgdlkmxusg_ecrm_db.ci_rpdm_t_d where application_number='20240829000002810' AND
INTERFACE_ID=102 ) select application_number, INTERFACE_ID, applicationid, get_json_object(ecs_item,'$.productLine') as
productLine, get_json_object(ecs_item,'$.productCode') as productCode, get_json_object(ecs_item,'$.creditLineAmtCards') as
creditLineAmtCards, get_json_object(ecs_item,'$.cin') as cin from ecs LATERAL VIEW OUTER EXPLODE(split(ecs_json, ';')) p AS
ecs_item;

PROC SQL;
    CONNECT TO HADOOP(
    host = "bigdataplatform-gcg-&ab..lac.nsroot.net"
    schema = &ESQUEMA.
	    HDFS_TEMPDIR = "/data/&HDFS./work/hive/&HDFS._work/"
    DBMAX_TEXT = 32768
    uri = "jdbc:hive2://bigdataplatform-gcg-&amb..lac.nsroot.net:10000/&ESQUEMA.;principal=hive/bigdataplatform-gcg-&amb..lac.nsroot.net@LAC.NSROOT.NET;ssl=true" 
    scratch_db = "&HDFS._work"
	subchar=questionmark
    ); 
	    CREATE TABLE Applicant AS
    SELECT * 
FROM CONNECTION TO HADOOP(
with ecs as (
select 
  xml_data, application_number,interface_id,
  get_json_object(xml_data,'$.application.applicationid')as applicationid,
 
   regexp_replace(regexp_replace(get_json_object(xml_data,'$.application.applicants[0].internalSystem.ecs[]'), '\\[|\\]', ''), '\\}\\,\\{', '\\}\\;\\{') AS ecs_JSOn
  from gcgdlkmxsg_ecrm_db.ci_rpdm_t_d
  where application_number='202410' AND INTERFACE_ID=1089
  )
  select 
  application_number,
  INTERFACE_ID,
  applicationid,
   get_json_object(ecs_item,'$.productLine') as productLine,
  get_json_object(ecs_item,'$.productCode') as productCode,
  get_json_object(ecs_item,'$.creditLineAmtCards') as creditLineAmtCards,
  get_json_object(ecs_item,'$.cin') as cin
  from ecs
  LATERAL VIEW OUTER EXPLODE(split(ecs_json, ';')) p AS ecs_item;)
/*  disconnect from hadoop;*/
quit;

 

Tom
Super User

Why do you have that extra

 p AS ecs_item;

stuff after the LATERAL VIEW clause?

 

The examples I see online look like:

FROM mytable LATERAL VIEW explode(myarray) exploded_array_table

 

So the end of your query should probably just be one of these:

from ecs LATERAL VIEW OUTER EXPLODE(split(ecs_json, ';')) p
from ecs LATERAL VIEW OUTER EXPLODE(split(ecs_json, ';')) ecs_item

It does not look to me like it matters what ALIAS you code for the "exploded_array_table" since your SELECT statement is not referencing any table aliases.

 

Hint: to avoid losing track of the closing parentheses in your multi-line code, put the ) at the beginning of a NEW LINE instead of hiding it at the end of (a possibly very long) line of code.
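
For example, taking the small WITH query from above (assuming the connection has already been made earlier in the PROC SQL step), the closing parentheses are easy to keep track of when each one starts its own line:

create table want as
select *
from connection to hadoop
(
  with q1 as
  (
    select key from src where key = '5'
  )
  select * from q1
)
;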

RichaM
Obsidian | Level 7

Thanks a lot for your help. I figured out the missing parenthesis location.

As for the alias, it is needed for the LATERAL VIEW EXPLODE to reference the variables.

Writing the working code below:

%LET ESQUEMA = gcgdlkusg_ecrm_db;
%LET HDFS = hv_u_stg_ral_ecrm; 
%LET amb = ut;

/*APPlCANT*/

PROC SQL;
    CONNECT TO HADOOP(
    host = "bigdataplatform-gcg-&amb..lac.nsroot.net"
    schema = &ESQUEMA.
	    HDFS_TEMPDIR = "/data/&HDFS./work/hive/&HDFS._work/"
    DBMAX_TEXT = 32768
    uri = "jdbc:hive2://bigdataplatform-gcg-&amb..lac.nsroot.net:10000/&ESQUEMA.;principal=hive/bigdataplatform-gcg-&amb..lac.nsroot.net@LAC.NSROOT.NET;ssl=true" 
    scratch_db = "&HDFS._work"
	subchar=questionmark
    ); 
	    CREATE TABLE Phone AS
    SELECT * 
FROM CONNECTION TO HADOOP(
with phn as (
select 
  xml_data, application_number,interface_id,
    get_json_object(xml_data,'$.application.applicationid')as applicationid,
  cast(get_json_object(xml_data,'$.application.applicants[0].nationality') as INT) as Nationality,
get_json_object(xml_data,'$.application.applicants[0].cin') as cin,
 
   regexp_replace(regexp_replace(get_json_object(xml_data,'$.application.applicants[].phones'), '\\[|\\]', ''), '\\}\\,\\{', '\\}\\;\\{') AS phone_JSON
  from &ESQUEMA..ci_rp_t_d
  where application_number='2024123' AND INTERFACE_ID=1023
  )
  select 
  application_number,
  INTERFACE_ID,
  applicationid,
  nationality,
  cin,
  get_json_object(phone_item,'$.phoneType') as phonetype,
  get_json_object(phone_item,'$.phoneAvailabilityFlag') as phoneAvailabilityFlag
  from phn
  LATERAL VIEW OUTER EXPLODE(split(phone_JSON, ';')) p AS phone_item
);
/*  disconnect from hadoop;*/
quit;

