Hi All,
I've been playing with the JSON engine and the ALLDATA dataset it creates. It seems... almost useful?
It looks like it's just a collection of name-value pairs, without any indication of a key such as a row number / object number / ordinal root / something that would allow you to identify the group of name-value pairs that came from a single object.
For example, given this JSON data with three objects:
filename myjson temp;
data _null_;
file myjson ;
put '[' ;
put '{"x":1}';
put ',' ;
put '{"x":10, "y":100}';
put ',' ;
put '{"z":1000}';
put ']';
run;
libname myj JSON fileref=myjson;
data want ;
set myj.alldata ;
put (_all_)(=) ;
run ;
Alldata has:
345 data alldata ; 346 set myj.alldata ; 347 put (_all_)(=) ; 348 run ; P=1 P1=x V=1 Value=1 P=1 P1=x V=1 Value=10 P=1 P1=y V=1 Value=100 P=1 P1=z V=1 Value=1000
So looking at work.AllData, there is no object ID which would allow you to see with values came from which object. You might assume that the data has two objects, when in fact it has three.
Is there some way to add an Object ID to alldata?
In a simple case if I could assume that the first name-value pair in an object is always a unique ID, I wouldn't need the JSON engine to make an ObjectID for me. But since a JSON object has unordered name-value pairs, even if I know my data has an ID, it doesn't seem safe to assume that the first name-value pair will always be that ID. Without an ObjectID I'm struggling to see how this ALLDATA table could be useful. Am I missing some option?
Thanks,
-Q.
It's up to whoever designs the JSON to add the extra object name/layer if it's important for the data. I've used the JSON engine with dozens of different services/sources and if there has ever been ambiguity, it's due to the construction of the original data and not the engine.
A challenging source can come from GraphQL sources, which are usually designed to fill a UI component on a web page (think React) and not fill a 2-D data model, so you really need to know more about what to expect from the data in that case. I answered a similar question on StackOverflow.
Thanks Chris,
I'm willing to expect the provider to have a recordID field for each object. But I don't think I should require them to enforce that the recordID is always the first name-value pair of an object.
If they give me:
filename myjson temp;
data _null_;
file myjson ;
put '[' ;
put '{"x":1,"ID":"001"}';
put ',' ;
put '{"ID":"002","x":10,"y":100}';
put ',' ;
put '{"z":1000,"ID":"003"}';
put ']';
run;
libname myjson JSON fileref=myjson;
data want ;
set myjson.alldata ;
put (_all_)(=) ;
run ;
That seems fair (since JSON objects are unordered), but the ALLDATA table loses the information about which attributes are associated with each ID. It returens:
16 data want ; 17 set myjson.alldata ; 18 put (_all_)(=) ; 19 run ; P=1 P1=x V=1 Value=1 P=1 P1=ID V=1 Value=001 P=1 P1=ID V=1 Value=002 P=1 P1=x V=1 Value=10 P=1 P1=y V=1 Value=100 P=1 P1=z V=1 Value=1000 P=1 P1=ID V=1 Value=003
I guess for ALLDATA to be useful, I have to assume that attribute order is meaningful, which seems odd, since a json object is an unordered set of name-value pairs. So a JSON provider should be free to change the order of attributes anytime they want.
--Q.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.