BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Sean_OConnor
Obsidian | Level 7

Folks,

 

I have a large json file which I need to eventually read into SAS. I built a loading program which carried this out for 500k records.

 

However, the json file I received yesterday has circa 6.7million records. 

 

Therefore, I would like to split this json file out into n smaller datasets (as the number of records will change from delivery to delivery I will need my code to be flexible when it is spillting out files.

 

Attach is a sample piece of code which edits the current json file, but basically what I would like to do is read in the file and create a new json file for every n records (lets say 500k).

 

Any help would be most welcome.

 

Kind regards,

 

Sean 

 

data _null_;
  infile json2 end=eof; * json2 refers to file I'm reading in
  file json3 ; * file i'm spitting out and the bits and pieces I'm doing below 
  input ;
  if _n_=1 then put '[' @; else put ',' @;
  put _infile_;
  if eof then put ']';
run;
1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Use the FILEVAR= option to use dynamic filenames. Use a DO loop to control how many lines per output file.

%let target=/folder/basename;
%let n=500;

data _null_;
  if eof then stop;
  length filename $200;
  filename=cats("&target",_n_,'.json');
  file out filevar=filename ;
  sep = '[' ;
  do i=1 to &n until (eof);
    infile json2 end=eof; 
    input ;
    put sep $1. _infile_;
    sep=',';
  end;
  put ']';
run;

The IF statement is to prevent an extra empty file. Move to the end of the data step if you want to make one empty file if the source file is empty.

Note both this version and your original program will only work if the source file is a JSON Lines file. Link: http://jsonlines.org/ 

View solution in original post

6 REPLIES 6
Patrick
Opal | Level 21

@Sean_OConnor 

The FILEVAR option allows you to change the output file during data step iteration.

https://go.documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.4&docsetId=lestmtsref&docsetTarge... 

 

....but: Given your current code you can't just write to a new output file solely based on _n_ as this will most likely create invalid Json files. You will need to add some additional logic to your code to "cut" the source Json at a valid place and then also create the correct "header" and "footer" for the new Json file.

Tom
Super User Tom
Super User

Use the FILEVAR= option to use dynamic filenames. Use a DO loop to control how many lines per output file.

%let target=/folder/basename;
%let n=500;

data _null_;
  if eof then stop;
  length filename $200;
  filename=cats("&target",_n_,'.json');
  file out filevar=filename ;
  sep = '[' ;
  do i=1 to &n until (eof);
    infile json2 end=eof; 
    input ;
    put sep $1. _infile_;
    sep=',';
  end;
  put ']';
run;

The IF statement is to prevent an extra empty file. Move to the end of the data step if you want to make one empty file if the source file is empty.

Note both this version and your original program will only work if the source file is a JSON Lines file. Link: http://jsonlines.org/ 

Sean_OConnor
Obsidian | Level 7

Hi,

 

I'm running into problems here as one of the json records when read into SAS exceeds the maximum number of column bytes. It cuts off after32,767 so when the json engine does it think it fails at the following line as it is no longer a valid json.

 

Can anyone suggest a work around?

Tom
Super User Tom
Super User
read/write the file byte by byte (use RECFM=N on INFILE) when you see a LF ('0A'x) then write ']' '0A'x ',['.
Sean_OConnor
Obsidian | Level 7
Hi Tom,

Could expand on your second piece, please?
Tom
Super User Tom
Super User

Try this.

Let's make a test file, content doesn't matter.

filename json2 temp;
data _null_;
  file json2 ;
  put 'line one' / 'line two' / 'line three' ;
run;

Now let's split it into max of 2 records per output file.


%let n=2;
%let target=c:\downloads\json_split;
data _null_;
  if eof then stop;
  length filename $200;
  filename=cats("&target",_n_,'.json');
  file out filevar=filename recfm=n;
  sep = '[' ;
  do i=1 to &n until (eof);
    infile json2 recfm=f lrecl=1 end=eof; 
    put sep $1. @;
    do until(char='0A'x or eof);
      input char $char1. ;
      put char $char1. @;
    end;
    sep=',';
  end;
  put ']';
run;

Then let's check the results.

data _null_;
  infile "&target*";
  input;
  list;
run;
486   data _null_;
487     infile "&target*";
488     input;
489     list;
490   run;

NOTE: The infile "c:\downloads\json_split*" is:
      Filename=c:\downloads\json_split1.json,
      File List=c:\downloads\json_split*,RECFM=V,
      LRECL=32767

RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+--
1         [line one 9
2         ,line two 9
3         ] 1
NOTE: The infile "c:\downloads\json_split*" is:
      Filename=c:\downloads\json_split2.json,
      File List=c:\downloads\json_split*,RECFM=V,
      LRECL=32767

4         [line three 11
5         ] 1
NOTE: 3 records were read from the infile "c:\downloads\json_split*".
      The minimum record length was 1.
      The maximum record length was 9.
NOTE: 2 records were read from the infile "c:\downloads\json_split*".
      The minimum record length was 1.
      The maximum record length was 11.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

Creating Custom Steps in SAS Studio

Check out this tutorial series to learn how to build your own steps in SAS Studio.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 2322 views
  • 0 likes
  • 3 in conversation