I am trying to export a SAS data set as a .JSON file. The data set has many empty cells, and cells that are zero's. In an attempt to reduce the size of the .JSON file (it is presently more than 1GB in size), I wish to avoid printing these values. However, it does not seem like PROC JSON has this feature included. Therefore, can anyone help me with a workaround?
Thanks in advance for any tips or help!
Can you give us some sample data - that'll make it easier to give you a comprehensive response?
Hi Chris
Sorry for not adding any programs or datasets. I created a small dataset that hopefully illustrates the issue. Let's say I have a dataset defined as follows
data testdata;
input class $ var1 var2 var3;
datalines;
class1 3 0 1
class2 2 2 .
class3 1 1 1
;
I have a number of class variables, each with a set of (in this case numeric) values. Now, I wish to export the data as a .JSON file that is to be read elsewhere, where it doesn't matter if a given value is empty, 0 or missing. Therefore, I wish to remove these values from the exported .JSON file. Using a normal proc json,
proc json out = 'Y:\jsonout.json' pretty nosastags ;
export testdata;
run;
I get the following:
[
{
"class": "class1",
"var1": 3,
"var2": 0,
"var3": 1
},
{
"class": "class2",
"var1": 2,
"var2": 2,
"var3": null
},
{
"class": "class3",
"var1": 1,
"var2": 1,
"var3": 1
}
]
So basically, I wish to create a .JSON file that does not include "var2" for "class1" and "var3" for "class2", but every other variable. What I want is the following:
[
{
"class": "class1",
"var1": 3,
"var3": 1
},
{
"class": "class2",
"var1": 2,
"var2": 2,
},
{
"class": "class3",
"var1": 1,
"var2": 1,
"var3": 1
}
]
I hope this example helps.
I think you'll have to post-process the JSON file that you create. It's simple with DATA step. Read the file lines, and output only those that have a value other than 0 or null.
data testdata;
input class $ var1 var2 var3;
datalines;
class1 3 0 1
class2 2 2 .
class3 1 1 1
;
proc json out = 'c:\temp\jsonout.json' pretty nosastags ;
export testdata;
run;
data _null_;
infile 'c:\temp\jsonout.json';
file 'c:\temp\jsonout_light.json';
input;
length val $ 10;
/* get value token from the JSON line */
val = compress( scan(_infile_,-1,':'), ',','s');
if (val ^= '0' and val ^= 'null') then
put _infile_;
run;
The one problem with this (well, maybe there are more problems that I haven't thought of) -- this could leave some lines with a terminating comma before the end brace, when JSON convention is to leave the comma off for the last value in a list. So you'll probably need to add in some more conditions with a RETAIN to strip the comma if the next token is an end brace.
Hi Chris
Thank you for your answer. I will try to experiment a bit with your scan solution. Do you know if any plans for implementing such a feature directly to proc JSON is on its way?
Since you are worried about the size of the file, another possibility is to turn off the field keys. This will probably cut the size of the file by at least 50%. Without the keys, the file is slightly harder for a human to read. I do not know how the JSON file is to be used. But, if the JSON file is going to be used as input to another computer program are the keys really needed? The first value in the array is always the "class" variable, the second variable, "var1", etc. Note that without keys, the JSON containers holding the data change from objects to arrays.
proc json out='smallJson.json' pretty nosastags nokeys;
export testdata;
run;
[
[
"class1",
3,
0,
1
],
[
"class2",
2,
2,
null
],
[
"class3",
1,
1,
1
]
]
Hi BIIM_SAS
Thank you for your answer!
While your solution does decreases the size of my file significantly (it just about cuts the size in half), I'm afraid your solution isn't viable as I see it. Yes, the JSON file is to be read in a different program, but the data set has many columns (right now it is 168 columns), and some are added and removed continuously as we still are in a development phase. Therefore, simply removing the keys from the JSON file is not something I am comfortable with.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.