Hello
Is anyone able to provide some advice on data transformation possibilities and limitations within a customised JSON MAP? I have found some blog posts on amending the output in terms of joining data sets but not a lot on data transformation.
Some examples are listed below:
Thank you.
Hi @RB1Kenobi
A JSON Map works like a XML Map. A JSON structure can be split into several SAS Data sets, and variables can be read with informats, so e.g. string dates kan be read as SAS dates. Labels can also be added to the variables i a map. Other types of data manipulation, like contatenation and assigning default values, are not possible, but in my opinion it is preferable to do that in a seperate data step even if it was possible with the map.
As fas as I know SAS hasn't provided a tool like the XML mapper, so it is hard work in a text editor. The documentation for the JSON libname engine is not very detailed, but gives the overall picture.
Here is a paper that contains sample code that I have used as a starting point to play with map creation.
https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2018/1734-2018.pdf
Good luck
Erik
Hi @RB1Kenobi
You don't get any data with the map, just a libname that contains views as defined in the map. So it takes a SAS step (data step or proc sql) to read data from the JSON file through the view and get it into a SAS data set. All processing is done in the step that reads data, including the processing defined in the map, so it will be the same work for the step.
I am aware that this doesn't seem very clear, so please write again if you need a better explanation.
Wow, this discussion comes on the PERFECT day for me, as I just started playing with JSON engine yesterday.... : )
@ErikLund_Jensen from the notes in the log (and confirmed by testing) , the LIBNAME statement is actually reading all the data from the JSON file when the libname statement executes. But I can't tell at all where it is storing that data (in memory perhaps, I don't see anything added to WORK library) or what format (definitely it's not a SAS datasets). Any idea as to where the data is stored and how? I'm wondering if I'm likely to run into memory problems if I try to read a huge JSON file (guess I'll start simulating some big files next).
Below is my testing on Windows 9.4M4
If I make a JSON file and assign a library (stealing code from Tom in https://communities.sas.com/t5/SAS-Enterprise-Guide/Parsing-a-JSON-file-in-SAS/td-p/531076😞
filename myjson temp; data _null_ ; file myjson ; put '[' ; put '{"firstName": "John", "lastName": "Smith", "age": 25}'; put ',' ; put '{"firstName": "Joho", "lastName": "Smiti", "age": 26}'; put ']'; run; libname myj JSON fileref=myjson;
I can see the tables listed in dictionary.members, with member type=DATA :
proc sql ; select libname,memname,memtype,engine from dictionary.members where libname='MYJ'; quit ;
Library Member Engine Name Member Name Type Name MYJ ALLDATA DATA JSON MYJ ROOT DATA JSON
I don't see anything listed in dictionary.views or dictionary.tables:
*Both these queries return 0 obs;
proc sql ;
select * from dictionary.views
where libname='MYJ';
quit ;
proc sql ;
select * from dictionary.tables
where libname='MYJ' ;
quit ;
But here's a surprise: if I only select a few columns from dictionary.tables, it returns the rows:
proc sql ;
select libname,memname,memtype
from dictionary.tables
where libname='MYJ' ;
quit ;
Library Member
Name Member Name Type
MYJ ALLDATA DATA
MYJ ROOT DATA
Not sure what that could be, but feels like a bug. I've never really understood the magic behind these dictionary tables. But looks like they must be dynamically generated (?) and data in dictionary.tables is incomplete for these tables, maybe.
I doubt that it is converting the JSON into SAS datasets when the LIBNAME runs. I would think that it just generates the logic it needs to convert the JSON into datasets. It appears to store that it memory. It does not appear to be writing anything to disk. But it will not re-analyze the file unless you re-run the LIBNAME statement. So if you change the actual JSON file after running the LIBNAME statement then when you try to extract/read the SAS datasets from the libref you could get gibberish results.
You could test some of these assumptions. Try giving it a JSON file that would be too large for the amount of memory your system. See if you can change the JSON file after you run the LIBNAME statement and test if you get a different output dataset.
Thanks Tom,
Indeed I tried the file JSON file change after running libname, and steps happily return the results from the JSON file read when the libname executed. That's what convinced me that the log is telling the truth, the libname statement really is reading in all the data in the JSON file, and storing it somehow somewhere, likely in memory since it doesn't seem to appear anywhere else.
605 filename myjson temp; 606 data _null_; 607 file myjson ; 608 put '[' ; 609 put '{"firstName": "John"}'; 610 put ']'; 611 run; NOTE: The file MYJSON is: Filename=C:\Users\\AppData\Local\Temp\SAS Temporary Files\_TD17216_MD1QCFVC_\#LN00100, RECFM=V,LRECL=32767,File Size (bytes)=0, Last Modified=30Apr2020:11:17:48, Create Time=30Apr2020:11:17:48 NOTE: 3 records were written to the file MYJSON. The minimum record length was 1. The maximum record length was 21. 612 613 libname myj JSON fileref=myjson; NOTE: JSON data is only read once. To read the JSON again, reassign the JSON LIBNAME. NOTE: Libref MYJ was successfully assigned as follows: Engine: JSON Physical Name: C:\Users\\AppData\Local\Temp\SAS Temporary Files\_TD17216_MD1QCFVC_\#LN00100 614 615 *overwrite JSON file ; 616 data _null_; 617 file myjson ; 618 put '[' ; 619 put '{"firstName": "Yoko"}'; 620 put ']'; 621 run; NOTE: The file MYJSON is: Filename=C:\Users\\AppData\Local\Temp\SAS Temporary Files\_TD17216_MD1QCFVC_\#LN00100, RECFM=V,LRECL=32767,File Size (bytes)=0, Last Modified=30Apr2020:11:17:48, Create Time=30Apr2020:11:17:48 NOTE: 3 records were written to the file MYJSON. The minimum record length was 1. The maximum record length was 21. 622 623 data _null_ ; 624 set myj.root ; 625 put "before rerunning libname " firstname= ; 626 run ; before rerunning libname firstName=John NOTE: There were 1 observations read from the data set MYJ.ROOT. 627 628 *Rerun libname ; 629 libname myj JSON fileref=myjson; NOTE: JSON data is only read once. To read the JSON again, reassign the JSON LIBNAME. NOTE: Libref MYJ was successfully assigned as follows: Engine: JSON Physical Name: C:\Users\Z0032BEV\AppData\Local\Temp\SAS Temporary Files\_TD17216_MD1QCFVC_\#LN00100 630 631 data _null_ ; 632 set myj.root ; 633 put "after rerunning libname " firstname= ; 634 run ; after rerunning libname firstName=Yoko NOTE: There were 1 observations read from the data set MYJ.ROOT.
Hi @RB1Kenobi , @Tom and @Quentin
To my big surprise data is indeed read and stored somewhere when the libname is assigned.
I made a file c:\temp\test.json and assigned it to a libname:
libname myjson JSON fileref=myjson alldata="mytest";
The libname is assigned with the following content and properties:
next, I deleted the physical file c:\temp\test.json.
After deleting, the libname shows the same content and properties, which is not surprising, a BASE libname would work the same way.
The surprise comes when data is copied from the libname to work, because it still works:
15 proc copy inlib=myjson outlib=work;
NOTE: Writing HTML Body file: sashtml.htm
16 run;
NOTE: Copying MYJSON.MYTEST to WORK.MYTEST (memtype=DATA).
NOTE: BUFSIZE is not cloned when copying across different engines. System Option for BUFSIZE was used.
NOTE: There were 6 observations read from the data set MYJSON.MYTEST.
NOTE: The data set WORK.MYTEST has 6 observations and 4 variables.
NOTE: Copying MYJSON.ROOT to WORK.ROOT (memtype=DATA).
NOTE: BUFSIZE is not cloned when copying across different engines. System Option for BUFSIZE was used.
NOTE: There were 2 observations read from the data set MYJSON.ROOT.
NOTE: The data set WORK.ROOT has 2 observations and 4 variables.
NOTE: PROCEDURE COPY used (Total process time):
real time 1.06 seconds
cpu time 0.28 seconds
So data must be in a temp store somewhere. When working with very big files, this might be a problem with unexplained space usage, so it would be nice if SAS Institute would give a technical explaination of the Json libname.
I made a fairly big JSON file (~1M records) and watched my laptop's memory usage as I assigned the JSON libref and cleared it. I'm not a performance guy, but looked pretty obvious that the libname statement is storing something big in memory. Knowing that, probably makes sense to clear JSON librefs after you're doing reading data (if you're reading big files).
Test code:
filename mkjson temp ;
data bigclass ;
set sashelp.class ;
do i=1 to 5e4 ;
output ;
end ;
run ;
proc json out=mkjson;
export bigclass;
run;
*start windows resource with command: resmon ;
*watch how memory is used when assing libref, and freed when you clear it.;
libname myj JSON fileref=mkjson;
*release memory ;
libname myj clear ;
Hi, thanks for your contribution we're learning a lot here!
In terms of data transformation it is possible to edit the MAP to make a sting date into a SAS Date.
And as the MAP is a SAS construct it seems to accept the /* COMMENT OUT*/ syntax which is great for leaving notes for future reference too!
{ "NAME": "ConvictionDate", /* "TYPE": "CHARACTER", */ "TYPE": "NUMERIC", "INFORMAT": [ "IS8601DT", 19, 0 ], "FORMAT": ["DATETIME", 20], "PATH": "/root/transactions/Drivers/Convictions/ConvictionDate", /* "CURRENT_LENGTH": 24 */ "CURRENT_LENGTH": 8 },
So the MAP file that the JSON libname engine generates/uses is just another JSON object.
So you can make a libref that points to that and read it into SAS dataset(s).
filename myjson temp;
data _null_ ;
file myjson ;
put '[' ;
put '{"firstName": "John", "lastName": "Smith", "age": 25}';
put ',' ;
put '{"firstName": "Joho", "lastName": "Smiti", "age": 26}';
put ']';
run;
filename mymap temp;
libname myj JSON map=mymap automap=reuse fileref=myjson access=readonly;
libname myjmap json fileref=mymap access=readonly;
data _null_;
set myjmap.datasets;
put (_n_ _all_) (=);
run;
data _null_;
set myjmap.datasets_variables;
put (_n_ _all_) (=);
run;
Results:
NOTE: The infile MYMAP is: ... { "DATASETS": [ { "DSNAME": "root", "TABLEPATH": "/root", "VARIABLES": [ { "NAME": "ordinal_root", "TYPE": "ORDINAL", "PATH": "/root" }, { "NAME": "firstName", "TYPE": "CHARACTER", "PATH": "/root/firstName", "CURRENT_LENGTH": 4 }, { "NAME": "lastName", "TYPE": "CHARACTER", "PATH": "/root/lastName", "CURRENT_LENGTH": 5 }, { "NAME": "age", "TYPE": "NUMERIC", "PATH": "/root/age" } ] } ] } NOTE: 32 records were read from the infile MYMAP. The minimum record length was 1. The maximum record length was 36. ... 281 data _null_; 282 set myjmap.datasets_variables; 283 put (_n_ _all_) (=); 284 run; _N_=1 ordinal_DATASETS=1 ordinal_VARIABLES=1 NAME=ordinal_root TYPE=ORDINAL PATH=/root CURRENT_LENGTH=. _N_=2 ordinal_DATASETS=1 ordinal_VARIABLES=2 NAME=firstName TYPE=CHARACTER PATH=/root/firstName CURRENT_LENGTH=4 _N_=3 ordinal_DATASETS=1 ordinal_VARIABLES=3 NAME=lastName TYPE=CHARACTER PATH=/root/lastName CURRENT_LENGTH=5 _N_=4 ordinal_DATASETS=1 ordinal_VARIABLES=4 NAME=age TYPE=NUMERIC PATH=/root/age CURRENT_LENGTH=. NOTE: There were 4 observations read from the data set MYJMAP.DATASETS_VARIABLES. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.00 seconds
Does anyone have a program that can convert a dataset in the DATASETS_VARIABLES format back into the MAP file that the JSON engine needs? If so then you can use a SAS program to read the map file, analyze it, modify it, write it and then use it to re-read the JSON file.
In the example code how are you able to fetch both the names; JSmith and MAnderson using single path in the json;"PATH": "/root/info/name"
When I try to access multiple values with same path I get the output of only one value. In this case the output will be only MAnderson
Please help.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.