Hi Reeza,
Wouldn't the JSON libname convert individual files and then append? In that case the main problem of varying variable length will exist.
I am struggling to run your other code which is giving me many error messages.
@BayzidurRahman wrote:
Hi Reeza,
Wouldn't the JSON libname convert individual files and then append? In that case the main problem of varying variable length will exist.
I am struggling to run your other code which is giving me many error messages.
You need to construct a MAP file that generates the right data structure and use it with every JSON engine libref you create. Then the individual datasets will be compatible.
A sample code would be very helpful.
So if you have a simple SAS dataset that list every variable and has the other fields needed to make a map file, like this for example:
DSNAME is the name you want for the dataset. TABLEPATH is how to find where to start a new table. For you simple single table JSON files it should always just be /root.
NAME is the variable name.
TYPE is the variable type. You might not need that first ORDINAL variable that automap always makes. You only need a LENGTH for the character variables. For the character variables set it to maximum value you should have for that field in any of the JSON files.
PATH is the where to find that variable in the JSON. For you simple files it should just be /root/variable name. Note that the PATH and TABLEPATH are values are case sensitive.
Then you can use it to write out a map file using a data step like this.
filename map 'Myfiles.map';
data _null_;
file map ;
if _n_=1 then put '{"DATASETS":' / '[{' @ ;
if eof then put '}]'/ '}' ;
set variables end=eof;
by dsname ;
if first.dsname then do;
if _n_ > 1 then put ',' @;
put '"DSNAME":' dsname :$quote. ',"TABLEPATH":' tablepath :$quote.
/ ' ,"VARIABLES":'
/ ' [' @
;
end;
else put ' ,' @;
put '{"NAME":' name :$quote. ',"TYPE":' type :$quote. ',"PATH":' path :$quote. @ ;
if length then put ',"LENGTH":' length @;
put '}' ;
if last.dsname then put ' ]' ;
run;
So once you have that MAP file created and the MAP fileref defined to point at it you can use it with the JSON libref to point to various JSON files.
Example:
filename json 'myfile1.json';
libname json json fileref=json map=map;
proc append base=all_files data=json.root force;
run;
filename json 'myfile2.json';
libname json json fileref=json map=map;
proc append base=all_files data=json.root force;
run;
...
I found the following error message
1856 filename map 'Myfiles.map';
1857 data _null_;
1858 file map ;
1859 if _n_=1 then put '{"DATASETS":' / '[{' @ ;
1860 if eof then put '}]'/ '}' ;
1861
1862 set variables end=eof;
ERROR: File WORK.VARIABLES.DATA does not exist.
1863 by dsname ;
1864 if first.dsname then do;
1865 if _n_ > 1 then put ',' @;
1866 put '"DSNAME":' dsname :$quote. ',"TABLEPATH":' tablepath :$quote.
1867 / ' ,"VARIABLES":'
1868 / ' [' @
1869 ;
1870 end;
1871 else put ' ,' @;
1872 put '{"NAME":' name :$quote. ',"TYPE":' type :$quote. ',"PATH":' path :$quote. @ ;
1873 if length then put ',"LENGTH":' length @;
1874 put '}' ;
1875 if last.dsname then put ' ]' ;
1876 run;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.03 seconds
You cannot use a dataset named VARIABLES if you have not created it.
To use a dataset to drive the creation of the MAP file you need to create the dataset first.
So, how should I modify the code?
Create the dataset I described before (and that the data step you tried to run expected as input) with the NAME, TYPE, LENGTH and PATH for the variables you want to read from the JSON files.
If you don't know what the names of the variables are then try reading one of the JSON files and running PROC CONTENTS on it. But I would recommend then making your own guesses about what type and length to use to define the variables. Because that is the trouble with using the JSON engine and letting it just GUESS how to to define the variables. When it is only reading one of the many example files it can only use the data in that file to base its guesses on how to define the variables. If the variable happens to be empty or happens to only include digit strings then it might make the variable with the wrong type. If the example file you are reading only has short strings for one of the variables it might define the variable as too short.
By taking the time to build your own map file you can get control over how the variables are defined by the JSON engine so that each file being read will result in a dataset with the same variables defined as the same types and same storage lengths.
If you have no idea what to use then make every variable as character with some long length (say 200 or 400 bytes). Then read all of the json files and check the maximum length of each variable. Or check if the all of the non-empty values could be converted into numbers then perhaps you can define the variable as a number.
And I posted a response to your error messages, try adding the filelockwait option to the libname statement so it waits for access.
https://support.sas.com/resources/papers/proceedings18/1734-2018.pdf
I am not sure where I am making the mistake in the following code. The error message is attached.
filename dirtree url
'https://raw.githubusercontent.com/sasutils/macros/master/dirtree.sas';
%include dirtree /source2;
%dirtree(C:\Users\bayzid\Macquarie University\MRFF Aged care Data - Documents\json_test /* Pipe delimited directory list (default=.) */
, out=filelist /* Output dataset name */
, maxdepth=1 /* Maximum tree depth */);
%macro read_json(myfile=);
filename mydata &myfile.;
libname myjson JSON fileref=mydata filelockwait=600;;
proc datasets lib=myjson;
quit;
*how to parse the file has not been specified - looks like you need to specify the lengths;
data file2save;
length source $200. p1 $256. Value $256.;
set myjson.alldata;
source=&myfile.;
run;
proc append base=master data=file2save force;
run;
proc sql;
drop table file2save;
quit;
filename mydata;
libname myjson;
%mend read_json;
proc sql;
drop table master;
quit;
data run_list;
set filelist;
where type='F';
str=catt('%read_json(myfile="C:\Users\bayzid\Macquarie University\MRFF Aged care Data - Documents\json_test', filename, '");');
call execute(str);
run;
Actually that was just a note. Your error is because you forgot a slash at the end of your path.
ERROR: Physical file does not exist, C:\Users\mq10004085\Macquarie University\MRFF Aged care
Data - Documents\json_test1\MyDrug_e64e0687-7c55-423e-acf0-e0b64387db89.json.
Note your previous log used json_test1 but your code below referencens json_test.
filename dirtree url
'https://raw.githubusercontent.com/sasutils/macros/master/dirtree.sas';
%include dirtree /source2;
%dirtree(C:\Users\bayzid\Macquarie University\MRFF Aged care Data - Documents\json_test /* Pipe delimited directory list (default=.) */
, out=filelist /* Output dataset name */
, maxdepth=1 /* Maximum tree depth */);
%macro read_json(myfile=);
filename mydata &myfile.;
libname myjson JSON fileref=mydata nrm;;
proc datasets lib=myjson;
quit;
*how to parse the file has not been specified - looks like you need to specify the lengths;
data file2save;
length source $200. p1 $256. Value $256.;
set myjson.alldata;
source=&myfile.;
run;
proc append base=master data=file2save force;
run;
proc sql;
drop table file2save;
quit;
filename mydata;
libname myjson;
%mend read_json;
proc sql;
drop table master;
quit;
data run_list;
set filelist;
where type='F';
str=catt('%read_json(myfile="C:\Users\bayzid\Macquarie University\MRFF Aged care Data - Documents\json_test\', filename, '");');
call execute(str);
run;
Thanks Reez. It seems to work. I have two issues: 1. how to add wildcard with JSON file name
2. How to transpose the master dataset into wide format.
Why a wildcard? The %dirtree should generate the full list of JSON files and then the data step calls the macro to read each file. To transpose use PROC TRANSPOSE.
I have ~84,000 JSON files that needs to be appended into 23 tables. All the files are named with an index after the group name. I need to run a loop over the group and use the group name in the wildcard.
I am struggling to figure out correct id variables to transpose the "master" data file.
@bayzid wrote:
I have ~84,000 JSON files that needs to be appended into 23 tables. All the files are named with an index after the group name. I need to run a loop over the group and use the group name in the wildcard.
The %dirtree generates the list of the files in the folder and reads all the files in. Why do you need to manually control the file read?
I am struggling to figure out correct id variables to transpose the "master" data file.
Unfortunately you do need to understand your data to work with it, there's typically no workaround for that but to explore the data or read any documentation.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.