BookmarkSubscribeRSS Feed
RB1Kenobi
Quartz | Level 8

Hello

Is anyone able to provide some advice on data transformation possibilities and limitations within a customised JSON MAP?   I have found some blog posts on amending the output in terms of joining data sets but not a lot on data transformation.

 

Some examples are listed below:

  • Transform a string date into a SAS date variable
  • Trim / strip blanks
  • Join variables into a single variable (or split a variable into two e.g. DATETIME into Date + Time)

 

Thank you.

13 REPLIES 13
ErikLund_Jensen
Rhodochrosite | Level 12

Hi @RB1Kenobi 

 

A JSON Map works like a XML Map. A JSON structure can be split into several SAS Data sets, and variables can be read with informats, so e.g. string dates kan be read as SAS dates. Labels can also be added to the variables i a map. Other types of data manipulation, like contatenation and assigning default values, are not possible, but in my opinion it is preferable to do that in a seperate data step even if it was possible with the map.

 

As fas as I know SAS hasn't provided a tool like the XML mapper, so it is hard work in a text editor. The documentation for the JSON libname engine is not very detailed, but gives  the overall picture. 

https://documentation.sas.com/?docsetId=lestmtsglobal&docsetTarget=n1jfdetszx99ban1rl4zll6tej7j.htm&...

 

Here is a paper that contains sample code that I have used as a starting point to play with map creation.

https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2018/1734-2018.pdf

 

Good luck

Erik

 

RB1Kenobi
Quartz | Level 8
Thank you, I hadn't come across that paper. For an ongoing hourly load of JSON data I though it may be more efficient to do as much as possible within the MAP rather than subsequent programming.
ErikLund_Jensen
Rhodochrosite | Level 12

Hi @RB1Kenobi 

 

You don't get any data with the map, just a libname that contains views as defined in the map. So it takes a SAS step (data step or proc sql) to read data from the JSON file through the view and get it into a SAS data set. All processing is done in the step that reads data, including the processing defined in the map, so it will be the same work for the step. 

 

I am aware that this doesn't seem very clear, so please write again if you need a better explanation.

 

 

RB1Kenobi
Quartz | Level 8
I understand. This made sense after I searched for the "tables" (i.e. the views) in the VMEMBERS and dictionary tables but didn't find them.
Thanks for your input.
Quentin
Super User

Wow, this discussion comes on the PERFECT day for me, as I just started playing with JSON engine yesterday.... : )

 

@ErikLund_Jensen from the notes in the log (and confirmed by testing) , the LIBNAME statement is actually reading all the data from the JSON file when the libname statement executes.  But I can't tell at all where it is storing that data (in memory perhaps, I don't see anything added to WORK library) or what format (definitely it's not a SAS datasets).  Any idea as to where the data is stored and how?  I'm wondering if I'm likely to run into memory problems if I try to read a huge JSON file (guess I'll start simulating some big files next).

 

Below is my testing on Windows 9.4M4

 

If I make a JSON file and assign a library (stealing code from Tom in https://communities.sas.com/t5/SAS-Enterprise-Guide/Parsing-a-JSON-file-in-SAS/td-p/531076😞

 

filename myjson temp;
data _null_ ;
  file myjson ;
  put '[' ;
  put  '{"firstName": "John", "lastName": "Smith", "age": 25}';
  put  ',' ;
  put  '{"firstName": "Joho", "lastName": "Smiti", "age": 26}';
  put ']';
run;
libname myj JSON fileref=myjson;

 

 

I can see the tables listed in dictionary.members, with member type=DATA :

 

proc sql ;
  select libname,memname,memtype,engine from dictionary.members
  where libname='MYJ';
quit ;
Library Member Engine Name Member Name Type Name MYJ ALLDATA DATA JSON MYJ ROOT DATA JSON

 

I don't see anything listed in dictionary.views or dictionary.tables:

 

 

*Both these queries return 0 obs;

proc sql ;
  select * from dictionary.views
  where libname='MYJ';
quit ;

proc sql ;
  select * from dictionary.tables
  where libname='MYJ' ;
quit ;

But here's a surprise:  if I only select a few columns from dictionary.tables, it returns the rows:

 

 

proc sql ;
  select libname,memname,memtype 
  from dictionary.tables
  where libname='MYJ' ;
quit ;

Library                                     Member
Name      Member Name                       Type
MYJ       ALLDATA                           DATA
MYJ       ROOT                              DATA

Not sure what that could be, but feels like a bug.  I've never really understood the magic behind these dictionary tables.  But looks like they must be dynamically generated (?) and data in dictionary.tables is incomplete for these tables, maybe.

 

 

 

 

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
ErikLund_Jensen
Rhodochrosite | Level 12
Hi - thanks for joining the discussion. A leaking hot water pipe requires my attention right now, but I will return later.
Tom
Super User Tom
Super User

I doubt that it is converting the JSON into SAS datasets when the LIBNAME runs.  I would think that it just generates the logic it needs to convert the JSON into datasets.  It appears to store that it memory.  It does not appear to be writing anything to disk.  But it will not re-analyze the file unless you re-run the LIBNAME statement. So if you change the actual JSON file after running the LIBNAME statement then when you try to extract/read the SAS datasets from the libref you could get gibberish results.

 

You could test some of these assumptions.  Try giving it a JSON file that would be too large for the amount of memory your system. See if you can  change the JSON file after you run the LIBNAME statement and test if you get a different output dataset.

Quentin
Super User

Thanks Tom, 

Indeed I tried the file JSON file change after running libname, and steps happily return the results from the JSON file read when the libname executed.  That's what convinced me that the log is telling the truth, the libname statement really is reading in all the data in the JSON file, and storing it somehow somewhere, likely in memory since it doesn't seem to appear anywhere else.

 

605  filename myjson temp;
606  data _null_;
607    file myjson ;
608    put '[' ;
609    put  '{"firstName": "John"}';
610    put ']';
611  run;

NOTE: The file MYJSON is:

      Filename=C:\Users\\AppData\Local\Temp\SAS Temporary
      Files\_TD17216_MD1QCFVC_\#LN00100,
      RECFM=V,LRECL=32767,File Size (bytes)=0,
      Last Modified=30Apr2020:11:17:48,
      Create Time=30Apr2020:11:17:48

NOTE: 3 records were written to the file MYJSON.
      The minimum record length was 1.
      The maximum record length was 21.

612
613  libname myj JSON fileref=myjson;
NOTE: JSON data is only read once.  To read the JSON again, reassign the JSON LIBNAME.
NOTE: Libref MYJ was successfully assigned as follows:
      Engine:        JSON
      Physical Name: C:\Users\\AppData\Local\Temp\SAS Temporary
      Files\_TD17216_MD1QCFVC_\#LN00100
614
615  *overwrite JSON file ;
616  data _null_;
617    file myjson ;
618    put '[' ;
619    put  '{"firstName": "Yoko"}';
620    put ']';
621  run;

NOTE: The file MYJSON is:

      Filename=C:\Users\\AppData\Local\Temp\SAS Temporary
      Files\_TD17216_MD1QCFVC_\#LN00100,
      RECFM=V,LRECL=32767,File Size (bytes)=0,
      Last Modified=30Apr2020:11:17:48,
      Create Time=30Apr2020:11:17:48

NOTE: 3 records were written to the file MYJSON.
      The minimum record length was 1.
      The maximum record length was 21.

622
623  data _null_ ;
624   set myj.root ;
625   put "before rerunning libname " firstname= ;
626  run ;

before rerunning libname firstName=John
NOTE: There were 1 observations read from the data set MYJ.ROOT.

627
628  *Rerun libname ;
629  libname myj JSON fileref=myjson;
NOTE: JSON data is only read once.  To read the JSON again, reassign the JSON LIBNAME.
NOTE: Libref MYJ was successfully assigned as follows:
      Engine:        JSON
      Physical Name: C:\Users\Z0032BEV\AppData\Local\Temp\SAS Temporary
      Files\_TD17216_MD1QCFVC_\#LN00100
630
631  data _null_ ;
632   set myj.root ;
633   put "after rerunning libname " firstname= ;
634  run ;

after rerunning libname firstName=Yoko
NOTE: There were 1 observations read from the data set MYJ.ROOT.
BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
ErikLund_Jensen
Rhodochrosite | Level 12

Hi @RB1Kenobi , @Tom  and @Quentin 

 

To my big surprise data is indeed read and stored somewhere when the libname is assigned.

 

I made a file c:\temp\test.json and assigned it to a libname:

 

libname myjson JSON fileref=myjson alldata="mytest";

 

The libname is assigned with the following content and properties:

 

jsontest.gif

 

 

 

 

 

 

 

 

 

 

 

 

next, I deleted the physical file c:\temp\test.json. 

After deleting, the libname shows the same content and properties, which is not surprising, a BASE libname would work the same way.

The surprise comes when data is copied from the libname to work, because it still works:

 

15 proc copy inlib=myjson outlib=work;
NOTE: Writing HTML Body file: sashtml.htm
16 run;

NOTE: Copying MYJSON.MYTEST to WORK.MYTEST (memtype=DATA).
NOTE: BUFSIZE is not cloned when copying across different engines. System Option for BUFSIZE was used.
NOTE: There were 6 observations read from the data set MYJSON.MYTEST.
NOTE: The data set WORK.MYTEST has 6 observations and 4 variables.
NOTE: Copying MYJSON.ROOT to WORK.ROOT (memtype=DATA).
NOTE: BUFSIZE is not cloned when copying across different engines. System Option for BUFSIZE was used.
NOTE: There were 2 observations read from the data set MYJSON.ROOT.
NOTE: The data set WORK.ROOT has 2 observations and 4 variables.
NOTE: PROCEDURE COPY used (Total process time):
real time 1.06 seconds
cpu time 0.28 seconds

So data must be in a temp store somewhere. When working with very big files, this might be a problem with unexplained space usage, so it would be nice if  SAS Institute would give a technical explaination of the Json libname. 

 

 

 

 

Quentin
Super User

I made a fairly big JSON file (~1M records) and watched my laptop's memory usage as I assigned the JSON libref and cleared it.  I'm not a performance guy, but looked pretty obvious that the libname statement is storing something big in memory.  Knowing that, probably makes sense to clear JSON librefs after you're doing reading data (if you're reading big files).

 

Test code:

filename mkjson temp ;

data bigclass ;
  set sashelp.class ;
  do i=1 to 5e4 ;
    output ;
  end ;
run ;

proc json out=mkjson;
   export bigclass;
run;

*start windows resource with command: resmon ;

*watch how memory is used when assing libref, and freed when you clear it.;

libname myj JSON fileref=mkjson;

*release memory ;
libname myj clear ;

 

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
RB1Kenobi
Quartz | Level 8

Hi, thanks for your contribution we're learning a lot here!   

In terms of data transformation it is possible to edit the MAP to make a sting date into a SAS Date.

C Hemedinger - editing MAP 

And as the MAP is a SAS construct it seems to accept the /* COMMENT OUT*/ syntax which is great for leaving notes for future reference too!

 

{
          "NAME": "ConvictionDate",
        /*  "TYPE": "CHARACTER", */
		  "TYPE": "NUMERIC",
		  "INFORMAT": [ "IS8601DT", 19, 0 ],
			"FORMAT": ["DATETIME", 20],
          "PATH": "/root/transactions/Drivers/Convictions/ConvictionDate",
        /*  "CURRENT_LENGTH": 24   */
		"CURRENT_LENGTH": 8
        },
Tom
Super User Tom
Super User

So the MAP file that the JSON libname engine generates/uses is just another JSON object.

So you can make a libref that points to that and read it into SAS dataset(s).

filename myjson temp;
data _null_ ;
  file myjson ;
  put '[' ;
  put  '{"firstName": "John", "lastName": "Smith", "age": 25}';
  put  ',' ;
  put  '{"firstName": "Joho", "lastName": "Smiti", "age": 26}';
  put ']';
run;
filename mymap temp;
libname myj JSON map=mymap automap=reuse fileref=myjson access=readonly;
libname myjmap json fileref=mymap access=readonly;
data _null_;
  set myjmap.datasets;
  put (_n_ _all_) (=);
run;

data _null_;
  set myjmap.datasets_variables;
  put (_n_ _all_) (=);
run;

Results:

NOTE: The infile MYMAP is: ...

{
  "DATASETS": [
    {
      "DSNAME": "root",
      "TABLEPATH": "/root",
      "VARIABLES": [
        {
          "NAME": "ordinal_root",
          "TYPE": "ORDINAL",
          "PATH": "/root"
        },
        {
          "NAME": "firstName",
          "TYPE": "CHARACTER",
          "PATH": "/root/firstName",
          "CURRENT_LENGTH": 4
        },
        {
          "NAME": "lastName",
          "TYPE": "CHARACTER",
          "PATH": "/root/lastName",
          "CURRENT_LENGTH": 5
        },
        {
          "NAME": "age",
          "TYPE": "NUMERIC",
          "PATH": "/root/age"
        }
      ]
    }
  ]
}
NOTE: 32 records were read from the infile MYMAP.
      The minimum record length was 1.
      The maximum record length was 36.
...

281   data _null_;
282     set myjmap.datasets_variables;
283     put (_n_ _all_) (=);
284   run;

_N_=1 ordinal_DATASETS=1 ordinal_VARIABLES=1 NAME=ordinal_root TYPE=ORDINAL PATH=/root CURRENT_LENGTH=.
_N_=2 ordinal_DATASETS=1 ordinal_VARIABLES=2 NAME=firstName TYPE=CHARACTER PATH=/root/firstName
CURRENT_LENGTH=4
_N_=3 ordinal_DATASETS=1 ordinal_VARIABLES=3 NAME=lastName TYPE=CHARACTER PATH=/root/lastName
CURRENT_LENGTH=5
_N_=4 ordinal_DATASETS=1 ordinal_VARIABLES=4 NAME=age TYPE=NUMERIC PATH=/root/age CURRENT_LENGTH=.
NOTE: There were 4 observations read from the data set MYJMAP.DATASETS_VARIABLES.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

Does anyone have a program that can convert a dataset in the DATASETS_VARIABLES format back into the MAP file that the JSON engine needs?  If so then you can use a SAS program to read the map file, analyze it, modify it, write it and then use it to re-read the JSON file.

melligeri
Calcite | Level 5

In the example code how are you able to fetch both the names; JSmith and MAnderson using single path in the json;"PATH": "/root/info/name"
When I try to access multiple values with same path I get the output of only one value. In this case the output will be only MAnderson

 

Please help.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 3370 views
  • 3 likes
  • 5 in conversation