Folks,
I have a large json file which I need to eventually read into SAS. I built a loading program which carried this out for 500k records.
However, the json file I received yesterday has circa 6.7million records.
Therefore, I would like to split this json file out into n smaller datasets (as the number of records will change from delivery to delivery I will need my code to be flexible when it is spillting out files.
Attach is a sample piece of code which edits the current json file, but basically what I would like to do is read in the file and create a new json file for every n records (lets say 500k).
Any help would be most welcome.
Kind regards,
Sean
data _null_;
infile json2 end=eof; * json2 refers to file I'm reading in
file json3 ; * file i'm spitting out and the bits and pieces I'm doing below
input ;
if _n_=1 then put '[' @; else put ',' @;
put _infile_;
if eof then put ']';
run;
Use the FILEVAR= option to use dynamic filenames. Use a DO loop to control how many lines per output file.
%let target=/folder/basename;
%let n=500;
data _null_;
if eof then stop;
length filename $200;
filename=cats("&target",_n_,'.json');
file out filevar=filename ;
sep = '[' ;
do i=1 to &n until (eof);
infile json2 end=eof;
input ;
put sep $1. _infile_;
sep=',';
end;
put ']';
run;
The IF statement is to prevent an extra empty file. Move to the end of the data step if you want to make one empty file if the source file is empty.
Note both this version and your original program will only work if the source file is a JSON Lines file. Link: http://jsonlines.org/
The FILEVAR option allows you to change the output file during data step iteration.
....but: Given your current code you can't just write to a new output file solely based on _n_ as this will most likely create invalid Json files. You will need to add some additional logic to your code to "cut" the source Json at a valid place and then also create the correct "header" and "footer" for the new Json file.
Use the FILEVAR= option to use dynamic filenames. Use a DO loop to control how many lines per output file.
%let target=/folder/basename;
%let n=500;
data _null_;
if eof then stop;
length filename $200;
filename=cats("&target",_n_,'.json');
file out filevar=filename ;
sep = '[' ;
do i=1 to &n until (eof);
infile json2 end=eof;
input ;
put sep $1. _infile_;
sep=',';
end;
put ']';
run;
The IF statement is to prevent an extra empty file. Move to the end of the data step if you want to make one empty file if the source file is empty.
Note both this version and your original program will only work if the source file is a JSON Lines file. Link: http://jsonlines.org/
Hi,
I'm running into problems here as one of the json records when read into SAS exceeds the maximum number of column bytes. It cuts off after32,767 so when the json engine does it think it fails at the following line as it is no longer a valid json.
Can anyone suggest a work around?
Try this.
Let's make a test file, content doesn't matter.
filename json2 temp;
data _null_;
file json2 ;
put 'line one' / 'line two' / 'line three' ;
run;
Now let's split it into max of 2 records per output file.
%let n=2;
%let target=c:\downloads\json_split;
data _null_;
if eof then stop;
length filename $200;
filename=cats("&target",_n_,'.json');
file out filevar=filename recfm=n;
sep = '[' ;
do i=1 to &n until (eof);
infile json2 recfm=f lrecl=1 end=eof;
put sep $1. @;
do until(char='0A'x or eof);
input char $char1. ;
put char $char1. @;
end;
sep=',';
end;
put ']';
run;
Then let's check the results.
data _null_;
infile "&target*";
input;
list;
run;
486 data _null_; 487 infile "&target*"; 488 input; 489 list; 490 run; NOTE: The infile "c:\downloads\json_split*" is: Filename=c:\downloads\json_split1.json, File List=c:\downloads\json_split*,RECFM=V, LRECL=32767 RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+-- 1 [line one 9 2 ,line two 9 3 ] 1 NOTE: The infile "c:\downloads\json_split*" is: Filename=c:\downloads\json_split2.json, File List=c:\downloads\json_split*,RECFM=V, LRECL=32767 4 [line three 11 5 ] 1 NOTE: 3 records were read from the infile "c:\downloads\json_split*". The minimum record length was 1. The maximum record length was 9. NOTE: 2 records were read from the infile "c:\downloads\json_split*". The minimum record length was 1. The maximum record length was 11. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.00 seconds
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Check out this tutorial series to learn how to build your own steps in SAS Studio.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.