miller3b
Calcite | Level 5

Hi,

I am trying to fill in the blanks on a use case, and I'm facing some constraints:

 

> can't introduce any UDFs that aren't native to the Hortonworks Hadoop offering

> can't touch the permanent data persistence solution within Hadoop

 

We are currently storing some JSON objects in HBase, mapped through Hive so that the SAS connector can work directly with the data (reliant on HiveServer2). No problems with data access itself. However, the SAS limitations on Hive tables and PROC SQL mean that anything returned has to be below 32,767 characters (the SAS character-variable limit), which is not acceptable for this solution. The SAS JSON reader can handle the data but will need to consume it from a file. The SAS software sits outside of the cluster and not directly on any nodes.

 

The code I have been using is pasted below. It executes fine and, as far as the query goes, creates a comma-separated file to work with. However, the reader engine I am using needs the entire file to conform to the JSON standard, which means adding an opening/closing [ ] to make it parse as an array type (SAS JSON reader, 9.4.4).

 

Is there a way to specify adding these characters in the initial write? If writing the file to HDFS, how would I go about appending (or rather copy-merging) them in the process through the SAS Hadoop connector?
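
One possibility for getting the brackets in during the initial write, staying inside Hive's built-in functions (no new UDFs): collect_list() and concat_ws() are native Hive (collect_list since 0.13), so the query itself can assemble the entire array as a single string. This is only a sketch, reusing the connection placeholder and table names from the code further down; the caveat is that collect_list() funnels every row through a single reducer and holds the whole array in memory, so it may not scale to very large result sets.

proc sql;
   connect to hadoop (XXXXXX - connection string information);
   /* concat_ws joins the collected JSON rows with commas, and  */
   /* concat wraps the result in [ ] so the output file is one  */
   /* valid JSON array written as a single value                */
   execute (
      INSERT OVERWRITE DIRECTORY '/home/bmiller'
      SELECT concat('[', concat_ws(',', collect_list(r.JSON)), ']')
      FROM results r
      INNER JOIN hbaseresults hb ON r.confirmationnumber = hb.key
      WHERE clientcode = &clientcode
        AND artifacttype = 'CrfResult'
   ) by hadoop;
   disconnect from hadoop;
quit;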

 

If not, is there a good approach to bringing the file local and adding the characters as needed for the reader? (One possible sketch follows the code below.)

Current:

{JSON Data},

{JSON Data}

Desired (not sensitive to line feeds from reader perspective):

[{JSON Data},

{JSON Data}]

 

/* Pass-through to Hive: write the joined result out to an HDFS directory */
proc sql;
   connect to hadoop (XXXXXX - connection string information);
   execute (
      INSERT OVERWRITE DIRECTORY '/home/bmiller'
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      SELECT JSON
      FROM results r
      INNER JOIN hbaseresults hb ON r.confirmationnumber = hb.key
      WHERE clientcode = &clientcode
        AND artifacttype = 'CrfResult'
   ) by hadoop;
   disconnect from hadoop;
quit;
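
On the bring-it-local question, here is a sketch of one approach, assuming the FILENAME HADOOP access method is available from the SAS machine; the config file path, credentials, and local paths are placeholders to substitute. The CONCAT option treats the HDFS directory as one logical file, and the byte-stream DATA step (RECFM=N) sidesteps the 32,767-byte record limit that a line-based read of _INFILE_ would hit on long JSON values.

/* Read every part file in the HDFS output directory as one stream. */
/* cfg=, user=, and pass= values are placeholders.                  */
filename hdfsout hadoop '/home/bmiller'
   cfg='/local/path/hadoop-config.xml'
   user='myuser' pass='XXXX'
   concat;

/* Local file for the SAS JSON engine to read */
filename jsonloc '/local/path/results.json';

/* Byte-for-byte copy, adding the opening and closing brackets */
data _null_;
   infile hdfsout recfm=n eof=done;
   file jsonloc recfm=n;
   if _n_ = 1 then put '[';
   input byte $char1.;
   put byte $char1.;
   return;
done:
   put ']';
   stop;
run;

/* Point the JSON libname engine at the repaired file */
libname j json '/local/path/results.json';

If FILENAME HADOOP isn't an option, PROC HADOOP's HDFS COPYTOLOCAL statement can pull the part files down first, and the same DATA step can do the bracketing against the local copies.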

 

Any help or suggestions would be greatly appreciated.

 

Thanks,

 

Ben

4 REPLIES
JBailey
Barite | Level 11

Hi @miller3b

 

I have no experience with Hive mapped onto HBase. The 32k issue is a known problem with Hive. This GitHub repo may help you.

 

https://github.com/Jeff-Bailey/SGF2016_SAS3880_Insiders_Guide_Hadoop_HOW

 

Take a look at the chapter on the 32k String Thing; it's Exercise 5.
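
For anyone landing here later: as I understand it, the SAS-side knob in that territory is the DBMAX_TEXT= option on the Hadoop LIBNAME statement. A minimal sketch, assuming a HiveServer2 connection (server, port, user, and password are placeholders):

/* DBMAX_TEXT= sets how many bytes SAS reserves for Hive STRING  */
/* columns (default 1024). It caps at 32767, the SAS character-  */
/* variable limit, which is why anything longer has to travel by */
/* file rather than through the connector.                       */
libname hdp hadoop
   server='hiveserver2.example.com'
   port=10000
   user=myuser password='XXXX'
   dbmax_text=32767;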

 

Best wishes,

Jeff

miller3b
Calcite | Level 5

Excellent. Thank you Jeff. I'm sure there are some other tricks in here that I need as well.

JBailey
Barite | Level 11

Hi @miller3b

 

You are quite welcome! 

 

Best wishes,

Jeff

AnnaBrown
Community Manager

Hi miller3b,

 

I'm glad you found some useful info! If one of the replies was the exact solution to your problem, can you "Accept it as a solution"? Or if one was particularly helpful, feel free to "Like" it. This will help other community members who may run into the same issue know what worked.

Thanks!
Anna

