A bit of history
Yesterday I asked this question about reading a JSON file, which had arbitrary carriage returns in it, into a single variable. The random new lines were breaking my INPUT statement.
I'm having a separate issue with the same file which I thought I'd raise separately.
My new problem
Now that I've sanitised the file to remove the random new lines, I have a text file containing JSON with a single line of text whose length exceeds 32767 characters. (It's worth pointing out that even before I removed the random new lines, there were already lines longer than 32767 characters!)
When I try to read this line in, my INPUT step only processes records up to the 32767th character and then stops, so I'm left with a fraction of the observations I should have.
I assume this is due either to the maximum length of the input buffer, or to the fact that I've set LRECL to what I believed was its maximum value (32767)?
What I'd like to do
Ideally I'd like to specify a record delimiter string (e.g. "},{") that would split my massive, massive line so that each JSON object (each wrapped in curly brackets) is treated individually. Is this possible? That said, any solution that lets me read the whole file would work.
Any help appreciated.
<<Sample file attached>>
Aha! My assertion that 32767 was the maximum length of LRECL was incorrect. Thanks to this post, I've realised I can bump the size of _infile_ to whatever I want, and this has fixed my issue.
Thank you @Tom!
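For anyone hitting the same wall, a minimal sketch of the fix. The file path and LRECL value below are placeholders, and note that while LRECL (and hence _infile_) can exceed 32767, an individual character variable still cannot:

```
data want;
   /* LRECL well beyond the old 32767 assumption; TRUNCOVER keeps  */
   /* a short final record from being lost. Path is an example.    */
   infile 'c:\temp\test.json' lrecl=1000000 truncover;
   input;
   /* Parse directly from the _infile_ buffer, since a single      */
   /* character variable still maxes out at 32767 bytes.           */
   length id $40;
   id = scan(_infile_, 2, '":');   /* illustrative parsing only */
run;
```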
Could you use PROC JSON to get the data, or just use Java code to get it?
Ksharp - As I understand it, PROC JSON is only for outputting JSON, not consuming it. To be honest, it's looking more and more like I might have to use Java.
I take it it's just not possible to specify a "record delimiter" then?
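For what it's worth, INFILE has no true record-delimiter option, but its DLMSTR= option (a multi-character field delimiter) combined with RECFM=N may get close: with the whole file treated as one stream, "},{"-delimited fields effectively become one JSON object each. A rough sketch, assuming a placeholder path and that each object fits in 32767 characters:

```
/* Sketch: read the file as a byte stream (RECFM=N) and let the   */
/* multi-character delimiter "},{" split it into one field per    */
/* JSON object. Path and lengths are placeholders.                */
data objs;
   infile 'c:\temp\test.json' recfm=n dlmstr='},{';
   length obj $32767;
   input obj : $32767. @@;
run;
```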
I did post some links on reading JSON files in your first thread.
They were papers presented at forum events.
Jaap - Thanks, I did check those out but couldn't immediately get them working, and wanted to see if I could tweak the parsing code I already had before scrapping it. I do appreciate you linking them, and will probably have to use that approach the more I look into it!
I did include some links on reading JSON files in your first thread; they were papers presented at forum events.
The 32k limit I mentioned applies to a single character variable, not to LRECL.
With string handling you should be able to manage that: keep a pointer to the part that has already been scanned, and trim that part off.
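The pointer-and-trim idea above could be sketched roughly as follows, assuming the JSON is one long array of small objects and a placeholder path; this reads the file as a stream in fixed-size chunks and carves off one object at a time (it glosses over edge cases such as the final partial chunk and "},{"" appearing inside a string value):

```
data objects;
   infile 'c:\temp\test.json' recfm=n;
   length buf obj $32767 chunk $200;
   retain buf;
   input chunk $char200.;            /* pull the next 200 bytes      */
   buf = cats(buf, chunk);           /* append to the working buffer */
   p = index(buf, '},{');            /* delimiter between objects    */
   do while (p > 0);
      obj = substr(buf, 1, p);       /* everything up to the '}'     */
      output;
      buf = substr(buf, p + 2);      /* trim off the scanned part    */
      p = index(buf, '},{');
   end;
   keep obj;
run;
```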
I understand what Tom means, but you can also define this file as a stream file with RECFM=N. Sometimes you need to consider how to handle the JSON objects, but for your example it seems easy to extract the values.
filename x 'c:\temp\test.json';

data want;
   infile x lrecl=3456677 dsd;
   input @'"ID":'        id        : $40.
         @'"startTime":' startTime
         @'"endTime":'   endTime
         @'"name":'      name      : $40.
         @'"content":'   content   : $40. @@;
run;

data want1;
   infile x recfm=n dlm='{[",:';
   input x : $100. @@;
run;
Xia Keshan