Greetings everyone,
I am facing the following issue: I have loaded a dataset in SAS DI Studio in which some columns contain JSON arrays.
The problem is that the JSON arrays do not always have the same length. At first I used the technique provided by ChrisZ in the thread below, and it works fine with single-element arrays.
When it comes to arrays with a size greater than one, the code reads only the first element.
The expected outcome, illustrated in the example below, is to generate one line per array element:
id | content |
000001 | [{"item":"value1"}, {"item":"value2"},{"item":"value3"}] |
to
id | value |
000001 | value1 |
000001 | value2 |
000001 | value3 |
I searched for the following:
Below you will find my attempt with PROC DS2, along with the SAS version.
Contents of HAVE Dataset
_id | contactProgress |
5fa1b999c2dc2170299dc0b4 | [{"date": { "$date": "2020-11-04T15:20:00.000Z" }, "details": {"code": "040", "description": "Follow up"}, "feedback": {"code": "020", "description": "No answer"}, "notes": "this is the content of the text provided, "type": "INPROGRESS"}, {"date": { "$date": "2020-11-04T18:00:00.000Z" }, "details": {"code": "040", "description": "Follow up"}, "feedback": {"code": "010", "description": "Communication"}, "notes": "This is the new note test", "type": "INPROGRESS"}, {"date": { "$date": "2020-11-05T16:30:00.000Z" }, "details": {"code": "040", "description": "Follow up"}, "feedback": {"code": "010", "description": "Communication"}, "notes": "The test quotation has been sent to the client", "type": "INPROGRESS"}] |
DATA HAVE;
  SET WORK.QUERY_FOR_WRK_LEADS_KPIS;
RUN;

PROC DS2;
DATA WANT (OVERWRITE=YES);
  DCL PACKAGE JSON J();
  DCL INTEGER TOKENTYPE PARSEFLAGS;
  DCL NVARCHAR(128) TOKEN;
  DCL INTEGER RC;
  DCL CHAR(2000) CHARACTER SET UTF8 CONTACT_PROGRESS;

  METHOD INIT();
    RC = J.CREATEPARSER( CONTACT_PROGRESS, 32767 );
    DO WHILE( RC = 0 );
      J.GETNEXTTOKEN( RC, TOKEN, TOKENTYPE, PARSEFLAGS );
      IF (TOKEN = '') THEN DO;
        %put 'Empty Token Found';
      END;
    END;
  END;

  METHOD RUN();
    SET WORK.HAVE;
  END;

  METHOD TERM();
    rc = j.destroyParser();
  END;
ENDDATA;
RUN;
Thank you in advance for your time.
Let's generalize a little more. First let's take the two example strings posted in the original question and make a dataset out of them. (Note: the second string was not valid JSON, as one of the quoted strings was missing its closing quote, so I fixed that.) So we have two variables: ID and a JSON string.
data have;
infile cards truncover ;
input id :$10. json $1000. ;
cards4;
001 [{"item":"value1"}, {"item":"value2"},{"item":"value3"}]
002 [{"date": { "$date": "2020-11-04T15:20:00.000Z" }, "details": {"code": "040", "description": "Follow up"}, "feedback": {"code": "020", "description": "No answer"}, "notes": "this is the content of the text provided", "type": "INPROGRESS"}, {"date": { "$date": "2020-11-04T18:00:00.000Z" }, "details": {"code": "040", "description": "Follow up"}, "feedback": {"code": "010", "description": "Communication"}, "notes": "This is the new note test", "type": "INPROGRESS"}, {"date": { "$date": "2020-11-05T16:30:00.000Z" }, "details": {"code": "040", "description": "Follow up"}, "feedback": {"code": "010", "description": "Communication"}, "notes": "The test quotation has been sent to the client", "type": "INPROGRESS"}]
;;;;
Now let's use code like Chris' to generate a JSON file. Note the use of the TEMP engine to get a temporary file. Also, I much prefer to place the continuation characters (commas in JSON syntax) at the START of the line instead of the END so that humans can see them more easily.
filename json temp;
data _null_;
file json;
set have end=eof;
if _n_=1 then put '{' @;
else put ',' @;
put id :$quote. ':' json ;
if eof then put '}';
run;
Now we can make a libref using the JSON engine (it will default to look for a FILEREF with the same name as the LIBREF).
libname json json ;
Here is what the ALLDATA structure looks like for this JSON data.
Obs  P  P1   P2        P3           V  Value

  1  1  001                         0
  2  2  001  item                   1  value1
  3  1  001                         0
  4  2  001  item                   1  value2
  5  1  001                         0
  6  2  001  item                   1  value3
  7  1  002                         0
  8  2  002  date                   0
  9  3  002  date      $date        1  2020-11-04T15:20:00.000Z
 10  2  002  details                0
 11  3  002  details   code         1  040
 12  3  002  details   description  1  Follow up
 13  2  002  feedback               0
 14  3  002  feedback  code         1  020
 15  3  002  feedback  description  1  No answer
 16  2  002  notes                  1  this is the content of the text provided
 17  2  002  type                   1  INPROGRESS
 18  1  002                         0
 19  2  002  date                   0
 20  3  002  date      $date        1  2020-11-04T18:00:00.000Z
...
P is the level in the JSON hierarchy this value lives at (1,2,3,...) and P1, P2, ... are the names of the levels. Note that in this case P1 is the value we wrote from ID, and P2 onward are the names from the JSON strings. V tells whether VALUE is populated.
So we can use this to make a structure that PROC TRANSPOSE can use.
First let's rename P and P1 so that we can then use the P: wildcard to find P2, P3, P4, ... without knowing how deep the hierarchy is for this JSON text. As long as those names are short enough, we can use them to make variable names. If they are too long, we might need to add some more logic to generate unique names.
To handle the repeating names we can add a sequence number: increment the counter whenever P=1, and use changes in P1 to signal restarting the counter.
data tall ;
set json.alldata(rename=(p=first_id p1=id)) ;
by id notsorted;
length _name_ $32 _label_ $200 ;
_label_=catx(' ',of p:);
_name_ =catx('_',of p:);
if first.id then seq=0;
if first_id=1 then seq+1;
if v=1;
run;
Now we can use PROC TRANSPOSE.
proc transpose data= tall out=want(drop=_name_);
by id seq notsorted;
var value;
run;
And get data like this:
                                                details_  details_     feedback_
Obs   id   seq  item     date__date                code   description    code

 1   001    1   value1
 2   001    2   value2
 3   001    3   value3
 4   002    1            2020-11-04T15:20:00.000Z  040    Follow up      020
 5   002    2            2020-11-04T18:00:00.000Z  040    Follow up      010
 6   002    3            2020-11-05T16:30:00.000Z  040    Follow up      010

      feedback_
Obs   description    notes                                            type

 1
 2
 3
 4    No answer      this is the content of the text provided         INPROGRESS
 5    Communication  This is the new note test                        INPROGRESS
 6    Communication  The test quotation has been sent to the client   INPROGRESS
[See also the code in this reply that shows how to use DATA step and the JSON libname engine to read a series of JSON data entries.]
Why do you have that macro code in the middle of your PROC DS2 step? The %PUT is always going to print its message while SAS is scanning the code and preparing the PROC step to execute. The %PUT will not do anything while the step is running. Did you mean to use a normal PUT statement there?
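For instance, the message could be produced at execution time with a normal PUT statement inside the method (a sketch reusing the declarations from your step):

```sas
METHOD INIT();
  RC = J.CREATEPARSER( CONTACT_PROGRESS, 32767 );
  DO WHILE( RC = 0 );
    J.GETNEXTTOKEN( RC, TOKEN, TOKENTYPE, PARSEFLAGS );
    IF TOKEN = '' THEN
      PUT 'Empty Token Found';  /* executes while the step runs, unlike %PUT */
  END;
END;
```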
I used it for debugging purposes.
I changed it as you suggested, but nothing is printed.
@ChrisHemedinger wrote a blog about reading JSON before.
Or you could write some raw SAS data step code to parse it.
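For the simple single-key case from the original question, a raw data step sketch could look like this (an assumption-laden sketch: it relies on the exact bracketed form [{"item":"..."}, ...] with one quoted value per object, and on HAVE having variables ID and CONTENT):

```sas
data want;
  set have;                      /* variables: id, content */
  length value $200;
  keep id value;
  /* with { and } as delimiters, each object body is every second token */
  do i = 1 to countc(content, '{');
    value = dequote(scan(scan(content, 2*i, '{}'), 2, ':'));
    output;
  end;
run;
```

A real JSON parser (the JSON libname engine or the DS2 JSON package) is still preferable for anything beyond this rigid shape.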
Yes, indeed I am aware of it.
However, it focuses on JSON data retrieved from a web service rather than on processing JSON arrays stored in a SAS column.
I used an excerpt from the PROC DS2 code that @ChrisHemedinger posted, with no luck.
The problem resides in how to target the column's value. He uses a REST API to capture the data, not a SAS dataset with JSON values stored.
I think your best option is to use the JSON libname engine. However, this operates only on text files, so you first need to extract the JSON data from the data set and create a text file for each record, or one large JSON text file with a wrapper object. This assumes the data field you're reading is a valid complete JSON string.
Let's illustrate using an example. I don't have your data, so I'll simulate by creating a similar data set from SASHELP.CLASS.
%let datasource = sashelp.class;
/* Create a new subfolder in WORK to hold */
/* temp JSON files, avoiding conflicts */
options dlcreatedir;
%let workpath = %sysfunc(getoption(WORK))/json;
libname json "&workpath.";
libname json clear;
/* Will create and run a separate PROC JSON step */
/* for each record. This might take a while */
/* for very large data. */
/* Each iteration will create a new JSON file */
data _null_;
set &datasource.;
call execute(catt('filename out "',"&workpath./out",_n_,'.json";'));
call execute('proc json out=out nosastags ;');
call execute("export &datasource.(obs="||_n_||" firstobs="||_n_||");");
call execute('run;');
run;
data records;
length ID $ 10 Details $ 32767;
infile "&workpath./out*.json";
input;
ID = cats('ID',put(_n_,z3.));
details = _infile_;
run;
This creates a data set that looks similar to yours:
Now, starting with this, let's create a JSON file that has these records formatted as name/value pairs:
/* Now create one large JSON file that wraps these name/value pairs */
filename stage TEMP;
data _null_;
length v $ 15;
file stage;
set records end=done;
if _n_ = 1 then do;
put '{';
end;
v = cats('"',ID,'":');
put v;
put Details;
if done then do;
put '}';
end;
else put ',';
run;
The resulting JSON looks like this:
{
  "ID001": [ { "Name": "Alfred", "Sex": "M", "Age": 14, "Height": 69, "Weight": 112.5 } ],
  "ID002": [ { "Name": "John", "Sex": "M", "Age": 12, "Height": 59, "Weight": 99.5 } ],
  "ID003": [ { "Name": "Joyce", "Sex": "F", "Age": 11, "Height": 51.3, "Weight": 50.5 } ],
Now we can use the JSON libname engine to read this like a set of data sets that we can assemble:
libname final json fileref=stage;
data final;
length ID $ 15;
set final.ID: indsname=src;
drop ordinal:;
ID=scan(src,2,'.');
run;
Result:
Thank you @ChrisHemedinger & @Tom
I tried to run the code but the following error appears:
43   libname json json ;
NOTE: JSON data is only read once. To read the JSON again, reassign the JSON LIBNAME.
ERROR: Invalid JSON in input near line 1 column 5: Encountered an illegal character.
ERROR: Error in the LIBNAME statement.
Does this have to do with the version of Enterprise Guide that I am running? Does it support the JSON libname?
I have read @ChrisHemedinger's blog regarding the JSON library and tried to use it, with no luck.
This is the main reason that I tried to use the PROC DS2 method.
So that error is saying the JSON is not well formed. The error is early in the first line, so just look at the file. You can run this data step to dump the first line into the SAS log.
data _null_;
infile json obs=1;
input;
list;
run;
If the JSON is not well formed then reading it with any JSON parser is not going to work.
Thank you @Tom for your input.
I did use the debugger and observed the following:
Never tried to use the data step debugger.
But the statement you have split over lines 10 and 11 needs to be AFTER the statement you have on line 12.
If the right side of that photograph is supposed to show your data then it looks like the variable with the JSON text is named CONTACTPROGRESS and not JSON.
Hello @ChrisHemedinger ,
First of all thank you for your time on this.
I have reviewed your code thoroughly and I understand exactly what you were trying to achieve.
I did the same on my part: I located the file on disk that your code creates.
In the final data step, I can see that you are using only the ID column and not all the others (in your example Sex, Weight, etc.).
Is this something you did on purpose, with the code automatically reading the columns, or is it something I should do explicitly in order to view the rest of the data?
@vfarmak In that final step where I have a length statement only for ID, the remaining columns are being read from the JSON library via the set statement. This code does not presume knowledge of the column names/attributes that come from that JSON -- but as you can see in the final result, the data is there.
Nested JSON structures might require more code and join/merge operations for related tables that the JSON engine will create based on structure. The code from @Tom will read all of the values regardless of nesting structure, and is a good approach when you have no idea of the data structure/contents. But my approach does take advantage of the JSON engine ability to automatically assign the proper variable types and lengths based on the data values.
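For reference, a hypothetical sketch of such a join: the JSON engine adds ORDINAL_* key variables that link the tables it creates, so nested data can be reassembled along those keys (the table and key names below are invented and depend on the actual JSON structure):

```sas
proc sql;
  create table combined as
  select p.*, d.code, d.description
  from final.root p
  inner join final.root_details d
    on p.ordinal_root = d.ordinal_root;
quit;
```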
Greetings @ChrisHemedinger and @Tom ,
I studied your answers in detail and managed to flatten the objects not only in EG but also in DI.
Thank you very much for your effort in this.
One last question: where can I find material about these libraries and the operations they offer? For example, in the JSON library I found the ALLDATA member, which stores all the data in a SAS dataset.