SASJedi
Ammonite | Level 13

Base SAS, including the DATA step, has an absolute length limit of 32,767 bytes for any character value. To process longer text values, you'll need to use DS2 or FedSQL. You can parse JSON in DS2 with the JSON package, but it involves a lot of coding. 

I think your best bet would be to extract the JSON value to a text file in HDFS, then use the SAS JSON LIBNAME engine to read the text file. It's been a VERY long time since I worked with Hadoop, so this code probably won't work as-is, but it might get you pointed in the right direction:

 

proc sql;
	connect to hadoop (server=my.hadoop.server.com port=10000 user='myUserID' pass='myPassword');
	execute(INSERT OVERWRITE DIRECTORY '/myHadoopUserID/output/myJSON.json'
				ROW FORMAT DELIMITED
				FIELDS TERMINATED BY '\n'
				SELECT jsonColName 
				FROM hive_table) by Hadoop;
	disconnect from hadoop; 
quit;

filename JSONin hadoop '/myHadoopUserID/output/myJSON.json' user='myUserID' pass='myPassword' concat;
libname  JSONin JSON;
proc contents data=JSONin._all_;
run;
Check out my Jedi SAS Tricks for SAS Users
RichaM
Obsidian | Level 7
The column xml_data, where my JSON file is stored, shows the full file. Only when I transfer it to a text file does it get truncated.
Can I use something like LRECL or anything else to get the full text file? Actually, I cannot import the JSON files into SAS, as I have hundreds of JSONs in Hadoop from which I need to parse values. Importing all the JSONs into SAS is not feasible.
Tom
Super User Tom
Super User

Please explain what you mean by:

The column xml_data where my json file is stored shows the full file.

How did it SHOW it to you? How did you look at the content of the variable XML_DATA?  Did you view it in some browser? Which one?  Did you have the database write to a file so you could look at it?

 

What database are you using? What is the type of the variable in the dataset?

 

How did you write it to a file?

Did you use a SAS data step?  What did the FILE statement look like? What was the setting for RECFM= option? What was the setting for LRECL= option?  What did the PUT statement look like? How did you get the data from the database to the SAS dataset so you could run a data step?  What was the LENGTH of the variable in the SAS dataset?
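
The questions above matter because the defaults on the FILE statement can themselves truncate output. As a minimal sketch (dataset name HAVE, variable XML_DATA, and the output path are placeholders), a data step that avoids the usual truncation pitfalls might look like this:

```sas
/* Hypothetical sketch: write a long character variable to a text file.
   HAVE, XML_DATA, and the path are placeholders.
   A large LRECL= prevents truncation at the default record length;
   $VARYING. writes only the actual bytes, avoiding trailing-blank padding. */
data _null_;
   set have;
   len = lengthn(xml_data);              /* actual length of the value   */
   file '/path/to/myJSON.json' lrecl=1000000;
   put xml_data $varying32767. len;      /* write exactly LEN bytes      */
run;
```

If LRECL= is left at its default and the value is longer, the PUT statement will silently split or truncate the record, which is one common cause of "half the JSON" in the output file.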

 

How do you know it was truncated?  How did you LOOK at the generated file?

What was the length of the value in the database? What was the size of the generated file?

 

And of course, does your database have more than a single observation?  If each observation is a separate JSON file then you will need to write each observation to a separate text file if you want to process them with the JSON libref engine in SAS.  Or is each observation just a JSONL (NDJSON) value?  If so then you need to use the methods discussed in the original thread to write the lines of JSONL text into an actual JSON file so that the JSON engine will read it.
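
For the JSONL case, the wrapping step can be sketched with a simple data step (file paths are placeholders) that turns one-object-per-line text into a single JSON array the JSON engine will accept:

```sas
/* Hypothetical sketch: wrap JSONL (one JSON object per line) into a
   single JSON array readable by the JSON LIBNAME engine.
   Both paths are placeholders. */
data _null_;
   infile '/path/to/data.jsonl' end=last;
   file   '/path/to/data.json' lrecl=1000000;
   input;
   if _n_ = 1 then put '[';          /* open the array               */
   if not last then put _infile_ ',';/* comma after all but the last */
   else do;
      put _infile_;                  /* last object, no comma        */
      put ']';                       /* close the array              */
   end;
run;
```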

 

 

RichaM
Obsidian | Level 7
My data is in Hadoop. I have accessed it in Hive and saw that the column xml_data has the closing braces, and I could see the last JSON objects.
The application data present in the code I posted is taken from Hive into SAS through a pass-through query.
But when I extract the JSON to text using the code mentioned, the JSON in the text file is truncated.
As for the observations, there is a JSON associated with each ID. So if I reference 5 IDs in the code, I will get 5 separate JSONs.
I hope I was able to explain.
Tom
Super User Tom
Super User

So you posted code that was reading from a SAS dataset named APPLICATION.

What does that dataset look like?

What is the defined storage length of the variable XML_DATA?

proc contents data=application;
run;

What is maximum length of the values of XML_DATA?

proc sql;
select max(length(xml_data)) as max_length
from application;
quit;

If the maximum length is almost the same as the storage length, then you probably already truncated the data when you made the dataset APPLICATION.

 

How many different values do you have for the variables used to make the filenames?

How many observations are you going to write to each file?

proc freq data=application;
  tables application_number*interface_id / list missing;
run;

If you have more than one observation per file then most likely you need to use the methods shown in this thread for wrapping the JSON text into a new object so that the file generated is a fully formed JSON file and not a JSONL file.

RichaM
Obsidian | Level 7
Variable  Type  Len    Format   Informat  Label
xml_data  Char  32767  $32767.  $32767.   xml_data

The xml_data variable has length 32767 and holds the full JSON value, but when I export it to a txt file, only half the JSON is exported.
As per the frequency, there is only 1 record.
RichaM
Obsidian | Level 7

Moreover, I have hundreds of JSON files to work on, so it is not feasible for me to separately keep everything in SAS and use the SAS LIBNAME engine. Can you help me? Because of this, the JSONs were put in a Hadoop table for easy access.

Tom
Super User Tom
Super User

@RichaM wrote:
Variable  Type  Len    Format   Informat  Label
xml_data  Char  32767  $32767.  $32767.   xml_data

The xml_data variable has length 32767 and holds the full JSON value, but when I export it to a txt file, only half the JSON is exported.
As per the frequency, there is only 1 record.

So you have a variable that can store a maximum of 32K bytes.  When you say it is exported as "half", what exactly does that mean?  If the value was 2,000 bytes, was only the first 1,000 bytes written to the file?  Or does it mean that the original file actually had 64K bytes and only 32K bytes made it into the SAS dataset (and hence was all that was available for SAS to write to a file)?  Or did you mean something else?

 

Could you explain more what this data is? How is the JSON structured?  One reason people use JSON text instead of simple structures like CSV files is that the data is hard to represent in a simple rectangular structure.  Other times JSON is used just because that is what they use, and the data could just as easily be saved as a simple file.  What is your situation?

 

What were the results of the other two tests I suggested?  What is the maximum length of the actual values?   Do any of the groups have more than one observation?

RichaM
Obsidian | Level 7

The results of the queries :

[screenshots: PROC CONTENTS and PROC SQL query results]

 

When I say half, I just meant that the value of xml_data when I transfer from Hive is not complete, i.e. the full JSON is not exported. I don't get the ending braces; only part of the JSON is coming into SAS. I agree that the JSON length might be greater than 32,767, but how do I get the full JSON into SAS?

As for your second question, my JSON is a complex nested JSON with a lot of classes and sub-classes. It cannot be put in a CSV file. You can consider it a high-level form which has a lot of sub-parts and arrays.

Tom
Super User Tom
Super User

So your problem has nothing to do with JSON text.

Your problem is that you want to transfer a text string that is longer than 32K bytes from a foreign database into a SAS dataset.  You will need to find out the Hive/Hadoop syntax for how to cut a large string into multiple pieces.

 

You could run a query in Hive to find out how many pieces you will need to store the longest string.  Say that is 3.  Then in your Hive query you will generate 3 variables.  And on the SAS side, when you go to write the file, you will write all three variables.
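
As a rough sketch of that idea (connection options, table, and column names are all placeholders, and the exact Hive syntax should be verified), a pass-through query could split the column with SUBSTR:

```sas
/* Hypothetical sketch: split a long Hive string column into 32,767-byte
   pieces via SQL pass-through. Server, table, and column names are
   placeholders; verify the SUBSTR behavior against your Hive version. */
proc sql;
   connect to hadoop (server=my.hadoop.server.com port=10000);
   create table work.application as
   select * from connection to hadoop
      (select id,
              substr(xml_data, 1,     32767) as xml_part1,
              substr(xml_data, 32768, 32767) as xml_part2,
              substr(xml_data, 65535, 32767) as xml_part3
       from hive_table);
   disconnect from hadoop;
quit;
```

On the SAS side, a data step would then PUT xml_part1 through xml_part3 back-to-back (e.g. with $VARYING. and +(-1) pointer control) so the reassembled file contains the original uncut string.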

 

It might also be possible in HIVE to have it "transpose" the data so that instead of 3 variables you will have 3 observations.

 

You might be able to use DS2 in SAS to pull the long text into a VARCHAR variable in the DS2 step and then use SAS DS2 code to split the long VARCHAR value into either multiple variables or multiple observations.
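
A minimal sketch of the DS2 approach (assuming a pre-assigned Hadoop libref named HIVELIB; the table, column, and VARCHAR size are placeholders):

```sas
/* Hypothetical sketch: read a long Hive string into a DS2 VARCHAR,
   then split it into 32,767-byte pieces as multiple observations.
   HIVELIB.HIVE_TABLE and XML_DATA are placeholders. */
proc ds2;
   data work.json_pieces (overwrite=yes);
      dcl varchar(5000000) xml_data;   /* holds the full, uncut value */
      dcl char(32767) piece;           /* one Base-SAS-sized chunk    */
      dcl int start;
      method run();
         set hivelib.hive_table;
         start = 1;
         do while (start <= length(xml_data));
            piece = substr(xml_data, start, 32767);
            output;                    /* one observation per chunk   */
            start = start + 32767;
         end;
      end;
   enddata;
   run;
quit;
```

The resulting WORK.JSON_PIECES observations could then be written back-to-back to a text file with a regular data step, reassembling the complete JSON.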

RichaM
Obsidian | Level 7
Let me research it. Thanks!
SASJedi
Ammonite | Level 13

The problem is that the maximum length for a character variable in traditional Base SAS processing is 32,767 bytes (you can see that here in your code):

length name $256 file_name $1000 xml_data $32767;

The JSON you are retrieving from Hadoop apparently exceeds that limit.  

You could try retrieving and parsing the JSON in DS2 into a VARCHAR variable. I've used VARCHAR in DS2 to hold text values >5M characters long. If you use DS2, you'll be able to read in the entire JSON string, but you'll have to write your own code using the DS2 JSON package to parse the JSON. DS2 can't read from or write to text files and does not support the JSON LIBNAME engine. 

Check out my Jedi SAS Tricks for SAS Users


Discussion stats
  • 26 replies
  • 16388 views
  • 7 likes
  • 6 in conversation