This topic is solved and locked.
vfarmak
Quartz | Level 8

Greetings everyone, 

I am facing the following issue: I have loaded a dataset in SAS DI in which some columns contain JSON arrays.

The problem with these JSON arrays is that they do not always have the same length. At first I used the technique provided in the thread below

https://communities.sas.com/t5/SAS-Programming/SAS-DI-4-9-JSON-Column-Parse-Data-Step/m-p/684905#M20...

by ChrisZ, and it works fine with single-element arrays.

When it comes to arrays with more than one element, the code reads only the first one.

The expected outcome, for the example below, is to generate one line per array element.

id      content
000001  [{"item":"value1"}, {"item":"value2"},{"item":"value3"}]

to

id      value
000001  value1
000001  value2
000001  value3
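For this simple single-key case, a plain DATA step can split the array without any JSON tooling. This is only a sketch for arrays of the exact form shown above; the dataset name HAVE and the variables id and content are assumptions based on the example, and it assumes the values themselves contain no embedded quotes.

```sas
/* Sketch: flatten [{"item":"value"}, ...] into one row per element.    */
/* Assumes dataset HAVE with variables id and content (hypothetical).   */
data want;
  set have;
  length value $200;
  prx = prxparse('/"item"\s*:\s*"([^"]*)"/');  /* capture each item value */
  start = 1;
  stop  = length(content);
  call prxnext(prx, start, stop, content, pos, len);
  do while (pos > 0);
    value = prxposn(prx, 1, content);  /* text of capture group 1 */
    output;                            /* one row per array element */
    call prxnext(prx, start, stop, content, pos, len);
  end;
  keep id value;
run;
```

This only works for a fixed, flat key like "item"; the nested structures later in the thread need a real JSON parser.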

 

 

I tried the following:

  1. The SCANOVER option: this comes in handy, but I think it can only be used on an INFILE statement
  2. PROC DS2, with no luck: I was able to execute the code, but once the parser is initialized it does not see any of the tokens in the JSON column
  3. The JSON libname engine is not supported by the version of our SAS system

Below you will find my attempt with PROC DS2 and our SAS version.

SAS_Enterprise_Guide_Version.PNG

 

Contents of HAVE Dataset

_id                       contactProgress
5fa1b999c2dc2170299dc0b4  [{"date": { "$date": "2020-11-04T15:20:00.000Z" }, "details": {"code": "040", "description": "Follow up"}, "feedback": {"code": "020", "description": "No answer"}, "notes": "this is the content of the text provided, "type": "INPROGRESS"}, {"date": { "$date": "2020-11-04T18:00:00.000Z" }, "details": {"code": "040", "description": "Follow up"}, "feedback": {"code": "010", "description": "Communication"}, "notes": "This is the new note test", "type": "INPROGRESS"}, {"date": { "$date": "2020-11-05T16:30:00.000Z" }, "details": {"code": "040", "description": "Follow up"}, "feedback": {"code": "010", "description": "Communication"}, "notes": "The test quotation has been sent to the client", "type": "INPROGRESS"}]

 

DATA HAVE;
SET WORK.QUERY_FOR_WRK_LEADS_KPIS;
RUN;

PROC DS2;
DATA WANT (OVERWRITE=YES);
DCL PACKAGE JSON J();
DCL INTEGER TOKENTYPE PARSEFLAGS;
DCL NVARCHAR(128) TOKEN;
DCL INTEGER RC;
DCL CHAR(2000) CHARACTER SET UTF8 CONTACT_PROGRESS;

METHOD INIT();
RC = J.CREATEPARSER( CONTACT_PROGRESS, 32767 );

DO WHILE( RC = 0 );
J.GETNEXTTOKEN( RC, TOKEN, TOKENTYPE, PARSEFLAGS );
IF (TOKEN = '') THEN do; 
%put 'Empty Token Found';

end;
END;
END;

METHOD RUN();
SET WORK.HAVE;
END;


method term();
rc = j.destroyParser();
end;

ENDDATA;
RUN;

 

 

Thank you in advance for your time.

 

ACCEPTED SOLUTION
Tom
Super User

Let's generalize a little more.  First let's take the two example strings posted in the original question and make a dataset out of them.  (Note the second string was not valid JSON, as one of the quoted strings was missing its closing quote, so I fixed that.)  So we have two variables: ID and a JSON string.

data have;
  infile cards truncover ;
  input id :$10. json $1000. ;
cards4;
001 [{"item":"value1"}, {"item":"value2"},{"item":"value3"}]
002 [{"date": { "$date": "2020-11-04T15:20:00.000Z" }, "details": {"code": "040", "description": "Follow up"}, "feedback": {"code": "020", "description": "No answer"}, "notes": "this is the content of the text provided", "type": "INPROGRESS"}, {"date": { "$date": "2020-11-04T18:00:00.000Z" }, "details": {"code": "040", "description": "Follow up"}, "feedback": {"code": "010", "description": "Communication"}, "notes": "This is the new note test", "type": "INPROGRESS"}, {"date": { "$date": "2020-11-05T16:30:00.000Z" }, "details": {"code": "040", "description": "Follow up"}, "feedback": {"code": "010", "description": "Communication"}, "notes": "The test quotation has been sent to the client", "type": "INPROGRESS"}]
;;;;

Now let's use code like Chris' to generate a JSON file.  Note the use of the TEMP engine to get a temporary file.  Also, I much prefer to place the continuation characters (commas, in JSON syntax) at the START of the line instead of the END of the line, so that humans can see them more easily.

filename json temp;
data _null_;
  file json;
  set have end=eof;
  if _n_=1 then put '{' @;
  else put ',' @;
  put id :$quote. ':' json ;
  if eof then put '}';
run;

Now we can make a libref using the JSON engine (it will default to look for a FILEREF with the same name as the LIBREF).

libname json json ;
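Before looking at ALLDATA, it can help to list the member datasets the engine exposed for this libref. A quick sketch (member names depend on the JSON structure):

```sas
/* Directory listing of the datasets the JSON engine created */
proc datasets lib=json nolist;
  contents data=_all_ nods;  /* NODS: directory only, no per-member detail */
run;
quit;
```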

Here is what the ALLDATA structure looks like for this JSON data.

     Obs    P    P1     P2          P3             V    Value

       1    1    001                               0
       2    2    001    item                       1    value1
       3    1    001                               0
       4    2    001    item                       1    value2
       5    1    001                               0
       6    2    001    item                       1    value3
       7    1    002                               0
       8    2    002    date                       0
       9    3    002    date        $date          1    2020-11-04T15:20:00.000Z
      10    2    002    details                    0
      11    3    002    details     code           1    040
      12    3    002    details     description    1    Follow up
      13    2    002    feedback                   0
      14    3    002    feedback    code           1    020
      15    3    002    feedback    description    1    No answer
      16    2    002    notes                      1    this is the content of the text provided
      17    2    002    type                       1    INPROGRESS
      18    1    002                               0
      19    2    002    date                       0
      20    3    002    date        $date          1    2020-11-04T18:00:00.000Z
...

P is the level in the JSON hierarchy at which this value lives (1,2,3,...), and P1,P2,... are the names at each level.  Note that in this case P1 is the value we wrote from ID, and P2 onward are the names from the JSON strings.  V tells whether VALUE is populated.

So we can use this to make a structure that PROC TRANSPOSE can use. 

First let's rename P and P1 so that we can then use the P: wildcard to find P2,P3,P4,... without knowing how deep the hierarchy is for this JSON text.  As long as those names are short enough, we can use them to make variable names.  If they are too long, we might need to add some more logic to generate unique names.

To handle the repeating names we can add a sequence number. We can increment the counter whenever P=1. We can use changes in P1 to signal restarting the counter.

data tall ;
  set json.alldata(rename=(p=first_id p1=id)) ;
  by id notsorted;
  length _name_ $32 _label_ $200 ;
  _label_=catx(' ',of p:);
  _name_ =catx('_',of p:);
  if first.id then seq=0;
  if first_id=1 then seq+1;
  if v=1;
run;

Now we can use PROC TRANSPOSE.

proc transpose data= tall out=want(drop=_name_);
  by id seq notsorted;
  var value;
run;

And get data like this:

                                                           details_     details_      feedback_
Obs    id     seq     item            date__date             code      description      code

 1     001     1     value1
 2     001     2     value2
 3     001     3     value3
 4     002     1               2020-11-04T15:20:00.000Z      040        Follow up        020
 5     002     2               2020-11-04T18:00:00.000Z      040        Follow up        010
 6     002     3               2020-11-05T16:30:00.000Z      040        Follow up        010

         feedback_
Obs     description     notes                                                type

 1
 2
 3
 4     No answer        this is the content of the text provided          INPROGRESS
 5     Communication    This is the new note test                         INPROGRESS
 6     Communication    The test quotation has been sent to the client    INPROGRESS

[See also the code in this reply that shows how to use DATA step and the JSON libname engine to read a series of JSON data entries.]

View solution in original post

17 REPLIES
Tom
Super User

Why do you have that macro code in the middle of your PROC DS2 step?  The %PUT will print its message while SAS is scanning the code and preparing the PROC step to execute; it does nothing while the step is running.  Did you mean to use a normal PUT statement there?
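A sketch of that change, applied to the INIT method from the question (only the %PUT line changes; the surrounding DS2 declarations are as posted):

```sas
/* Sketch of METHOD INIT() with runtime logging via PUT, not macro %PUT */
method init();
  rc = j.createParser(contact_progress, 32767);
  do while (rc = 0);
    j.getNextToken(rc, token, tokentype, parseflags);
    if token = '' then
      put 'Empty token found';   /* executes on each loop iteration */
    else
      put 'Token: ' token;       /* writes each parsed token to the log */
  end;
end;
```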

vfarmak
Quartz | Level 8

I used it for debugging purposes. 

I changed it as you suggested, but nothing is printed.

Ksharp
Super User

@ChrisHemedinger wrote a blog about reading JSON before.

Or you could write some raw SAS data step code to parse it.

 

vfarmak
Quartz | Level 8

Yes indeed, I am aware of it.

However, it focuses on JSON data retrieved from a web service, rather than on processing JSON arrays stored in a SAS column.

I used an excerpt of the PROC DS2 code that @ChrisHemedinger provided, with no luck.

The problem resides in how to target the column's value. He uses a REST API to capture the data, not a SAS dataset with JSON values stored in it.

ChrisHemedinger
Community Manager

I think your best option is to use the JSON libname engine. However, this operates only on text files, so you first need to extract the JSON data from the data set and create a text file for each record, or one large JSON text file with a wrapper object.  This assumes the data field you're reading is a valid complete JSON string.

 

Let's illustrate using an example. I don't have your data, so I'll simulate by creating a similar data set from SASHELP.CLASS.

 

%let datasource = sashelp.class;

/* Create a new subfolder in WORK to hold */
/* temp JSON files, avoiding conflicts    */
options dlcreatedir;
%let workpath = %sysfunc(getoption(WORK))/json;
libname json "&workpath.";
libname json clear;

/* Will create and run a separate PROC JSON */
/* for each record.  This might take a while   */
/* for very large data.                        */
/* Each iteration will create a new JSON file  */
data _null_;
 set &datasource.;
 call execute(catt('filename out "',"&workpath./out",_n_,'.json";'));
 call execute('proc json out=out nosastags ;');
 call execute("export &datasource.(obs="||_n_||" firstobs="||_n_||");");
 call execute('run;');
run;

data records;
 length ID $ 10 Details $ 32767;
 infile "&workpath./out*.json";
 input;
 ID = cats('ID',put(_n_,z3.));
 details = _infile_;
run;

This creates a data set that looks similar to yours:

2020-11-24_12-40-14.jpg

Now, starting with this, let's create a JSON file that has these records formatted as name/value pairs:

/* Now create one large JSON file that wraps these name/value pairs */
filename stage TEMP;
data _null_;
 length v $ 15;
 file stage;
 set records end=done;
 if _n_ = 1 then do;
  put '{';
 end;
 v = cats('"',ID,'":');
 put v;
 put Details;
 if done then do;
  put '}';
 end;
 else put ',';
run;

The resulting JSON looks like this:

{
  "ID001": [
    {
      "Name": "Alfred",
      "Sex": "M",
      "Age": 14,
      "Height": 69,
      "Weight": 112.5
    }
  ],
  "ID002": [
    {
      "Name": "John",
      "Sex": "M",
      "Age": 12,
      "Height": 59,
      "Weight": 99.5
    }
  ],
  "ID003": [
    {
      "Name": "Joyce",
      "Sex": "F",
      "Age": 11,
      "Height": 51.3,
      "Weight": 50.5
    }
  ],

Now we can use the JSON libname engine to read this like a set of data sets that we can assemble:

libname final json fileref=stage;
data final;
 length ID $ 15;
 set final.ID: indsname=src;
 drop ordinal:;
 ID=scan(src,2,'.');
run;

Result:

2020-11-24_12-44-53.jpg


vfarmak
Quartz | Level 8

Thank you @ChrisHemedinger & @Tom 

 

I tried to run the code, but the following error appears:

43         libname json json ;
NOTE: JSON data is only read once.  To read the JSON again, reassign the JSON LIBNAME.
ERROR: Invalid JSON in input near line 1 column 5: Encountered an illegal character.
ERROR: Error in the LIBNAME statement.

Does this have to do with the version of Enterprise Guide that I am running? Does it support the JSON libname engine?

I have read @ChrisHemedinger's blog about the JSON library and tried to use it, with no luck.

This is the main reason I tried the PROC DS2 method.

Tom
Super User

So that error is saying the JSON is not well formed. The error is early in the first line, so just look at the file. You can run this data step to dump the first line to the SAS log.

data _null_;
  infile json obs=1;
  input;
  list;
run;

If the JSON is not well formed then reading it with any JSON parser is not going to work.

vfarmak
Quartz | Level 8

Thank you @Tom for your input.

I used the debugger and observed the following:

  1. Is it necessary to add the brackets { .. [] ... } in the JSON file?
  2. The json fileref assigned in the FILENAME statement is empty. (I suspect this is the JSON file written to the temp folder that SAS Enterprise Guide uses to read and write data. Is that true? If so, why can't I see the value in the debugger?)
vfarmak
Quartz | Level 8

Also, @Tom, to give you a better perspective of the debugger, I have attached the following screenshot:

 

json_sas_datastep.PNG

Tom
Super User

I've never tried to use the data step debugger.

But the statement you have split over lines 10 and 11 needs to come AFTER the statement you have on line 12.

If the right side of that screenshot is supposed to show your data, then it looks like the variable with the JSON text is named CONTACTPROGRESS, not JSON.

vfarmak
Quartz | Level 8

Hello @ChrisHemedinger ,

 

First of all thank you for your time on this.

I have reviewed your code thoroughly and understood exactly what you were trying to achieve.

I did the same thing on my side: I located the file on disk that your code creates.

In the final data step, I can see that you are using only the ID column and not all the others (in your example, Sex, Weight, etc.).

Did you do this on purpose, with the code reading the remaining columns automatically, or is this something I should do explicitly in order to view the rest of the data?

ChrisHemedinger
Community Manager

@vfarmak In that final step, where I have a LENGTH statement only for ID, the remaining columns are being read from the JSON library via the SET statement.  This code does not presume knowledge of the column names/attributes that come from the JSON -- but as you can see in the final result, the data is there.

 

Nested JSON structures might require more code and join/merge operations for the related tables that the JSON engine creates based on structure.  The code from @Tom will read all of the values regardless of nesting structure, and is a good approach when you have no idea of the data structure/contents.  But my approach does take advantage of the JSON engine's ability to automatically assign the proper variable types and lengths based on the data values.
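For the join/merge case mentioned above, the engine links parent and child tables with generated ORDINAL_* key variables. A hypothetical sketch, assuming the engine produced members named ROOT and ITEMS in the FINAL libref (the actual member and key names depend on your JSON structure, so check the libref's directory first):

```sas
/* Hypothetical join of parent/child members created by the JSON engine. */
/* Member names (ROOT, ITEMS) and the ORDINAL_ROOT key are assumptions;  */
/* run PROC DATASETS on the libref to see what was actually generated.   */
proc sql;
  create table flat as
  select p.*, c.*
  from final.root  as p
       inner join
       final.items as c
       on p.ordinal_root = c.ordinal_root;  /* engine-generated key */
quit;
```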

vfarmak
Quartz | Level 8

Greetings @ChrisHemedinger and @Tom ,

I studied your answers in detail and flattened the objects, not only in EG but also in DI.

Thank you very much for your effort in this.

 

One last question: where can I find material about these library engines and the operations they offer? For example, for the JSON engine I found the ALLDATA member, which stores all the data in a single SAS dataset.

 


Discussion stats
  • 17 replies
  • 3303 views
  • 6 likes
  • 4 in conversation