Hello,
I'm fairly new to SAS, and it's the only tool my company allows for what I need to do. I need to convert several JSON files into SAS datasets, each one using different criteria. I've managed to use the JSON engine to open the files and have created a map from one of them that I'm using for learning/testing. What really confuses me is how SAS treats those maps, or the file structure, to return a dataset.
In this example, I have a JSON structure with different sections in its root, and every section contains the same elements, just with different values. When I use the JSON engine to convert that to datasets, it creates the ALLDATA dataset plus one dataset for every section in the root. Each of those datasets contains just one observation with all 50 or so variables.
My end goal is to get just one SAS dataset with one observation per section (root element). Again, every section has the same variables with different contents.
Here's a short sample of what the AUTOMAP gives me:
{ "DATASETS": [ { "DSNAME": "First_Category", "TABLEPATH": "/root/First Category", "VARIABLES": [ { "NAME": "ordinal_root", "TYPE": "ORDINAL", "PATH": "/root" }, { "NAME": "ordinal_First_Category", "TYPE": "ORDINAL", "PATH": "/root/First Category" }, { "NAME": elementA", "TYPE": "CHARACTER", "PATH": "/root/First Category/elementA", "CURRENT_LENGTH": 4 }, { "NAME": elementB", "TYPE": "CHARACTER", "PATH": "/root/First Category/elementB", "CURRENT_LENGTH": 4 }, { "NAME": elementC", "TYPE": "CHARACTER", "PATH": "/root/First Category/elementC", "CURRENT_LENGTH": 4 } ] }, { "DSNAME": "Second_Category", "TABLEPATH": "/root/Second Category", "VARIABLES": [ { "NAME": "ordinal_root", "TYPE": "ORDINAL", "PATH": "/root" }, { "NAME": "ordinal_Second_Category", "TYPE": "ORDINAL", "PATH": "/root/Second Category" }, { "NAME": elementA", "TYPE": "CHARACTER", "PATH": "/root/Second Category/elementA", "CURRENT_LENGTH": 4 }, { "NAME": elementB", "TYPE": "CHARACTER", "PATH": "/root/Second Category/elementB", "CURRENT_LENGTH": 4 }, { "NAME": elementC", "TYPE": "CHARACTER", "PATH": "/root/Second Category/elementC", "CURRENT_LENGTH": 4 } ] } ] }
How could I create a map by hand to end up with a dataset like the following:
category        | elementA | elementB | elementC
First Category  | a        | b        | c
Second Category | d        | e        | f
I know this might sound very simple, but I'm stuck on it and having a lot of difficulty understanding how SAS converts something like this.
Any help will be much appreciated.
Regards,
Cesar
Can you post the sample JSON data for that example? Along with the code you used to read it into SAS? I think sharing that would help more than sharing the map that you generated.
Thanks Quentin,
Unfortunately I cannot disclose real information, but I've created a very short mock JSON. You can see how each root item has exactly the same items below it. Of course those sub-items will vary from one JSON file to another, but there are only 100 or so JSON files and I can easily spend some time making the map for each one. In this JSON sample you can see how three tables will be generated, with just one row each. My goal is to have only one resulting dataset created from each JSON, with one row per root item (or per original dataset, however you want to call them):
{
  "Category 1": {
    "location": "miami",
    "speed": "500 MB",
    "idCategory": "1312",
    "speedMBps": 62,
    "timeFixedPrice": "12 M",
    "includedDiscount": 30
  },
  "Category 2": {
    "location": "new york",
    "speed": "500 MB",
    "idCategory": "1314",
    "speedMBps": 62,
    "timeFixedPrice": "6 M",
    "includedDiscount": 35
  },
  "Category 3": {
    "location": "houston",
    "speed": "1000 MB",
    "idCategory": "1412",
    "speedMBps": 124,
    "timeFixedPrice": "18 M",
    "includedDiscount": 25
  }
}
Of course in the original JSON there are way more items per node and tens of nodes, but all of them follow the exact same structure. Here's a quick glimpse of what I want to achieve based on this sample data:
category   | location | speed   | idCategory | speedMBps | timeFixedPrice | includedDiscount
Category 1 | miami    | 500 MB  | 1312       | 62        | 12 M           | 30
Category 2 | new york | 500 MB  | 1314       | 62        | 6 M            | 35
Category 3 | houston  | 1000 MB | 1412       | 124       | 18 M           | 25
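As for the code, there's nothing special about it: I just assign a JSON libname and let the automap do its thing (the file name here is a stand-in, of course):
/* file name changed; "have.json" stands in for the real file */
libname json json "have.json";

/* list what the engine created: ALLDATA plus one table per category */
proc datasets lib=json;
quit;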
Thank you all so much again for your time 😉
I'm not a JSON reading specialist, so I won't be able to help with the mapping approach.
From the sample data, you get three one-record datasets, and each dataset has the same list of variables, right? I think I would probably just use the JSON engine to make those three datasets and then use PROC APPEND or a SET statement to concatenate them. Since you've got a bunch of JSON files, you could wrap this in a macro to do it dynamically: use the JSON engine to read a JSON file and create the library with N tables, then use dictionary tables or PROC CONTENTS to get the list of datasets that were created into a macro variable, and finally use that list on a SET statement to concatenate them. A rough sketch of that is below.
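Untested, but the skeleton of that macro might look something like this (the file name, libref, and macro name are all made up):
%macro json2one(jsonfile, out);
  /* each root node becomes its own table in the JSON library */
  libname _js json "&jsonfile";

  /* gather every generated table except ALLDATA into a macro variable */
  proc sql noprint;
    select cats('_js.', memname)
      into :dslist separated by ' '
      from dictionary.tables
      where libname = '_JS' and memname ne 'ALLDATA';
  quit;

  /* concatenate them, deriving the category from the source table name */
  data &out;
    length category $32;
    set &dslist indsname=_src;
    category = scan(_src, -1, '.');
    drop ordinal_: ;
  run;

  libname _js clear;
%mend json2one;

%json2one(have.json, want);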
My other hack idea would be to pre-process the JSON file to make all the node names the same. You could use a DATA _NULL_ step to read the JSON file and write out a new one where "Category 1", "Category 2", and "Category 3" have all been changed to just "Category". If you then pointed the JSON engine at that file, it should spit out one table with three records instead of three one-record tables. But that would be an ugly hack.
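Something like this for the rewrite (untested, file names invented):
/* collapse "Category 1", "Category 2", ... into one repeated key name */
data _null_;
  infile "have.json" lrecl=32767 truncover;
  file "fixed.json" lrecl=32767;
  input;
  length line $32767;
  line = prxchange('s/"Category \d+"\s*:/"Category":/', -1, _infile_);
  put line;
run;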
If the structure is that simple, then the ALLDATA table should have exactly what you need.
/* ALLDATA holds one row per JSON value: P1 is the section name, P2 the
   element name, V flags rows that carry a value, and VALUE is the value
   itself (always character) */
proc transpose data=json.alldata out=want(drop=_name_ rename=(p1=category));
  where v > 0;      /* keep only the rows that hold actual values */
  by p1 notsorted;  /* one output observation per section, in file order */
  id p2;            /* element names become the new variable names */
  var value;
run;
Or you should be able to just SET the resulting tables together. Make sure to actually DEFINE the variables before the SET, to avoid the truncation caused by the JSON engine having to GUESS each variable's definition from the few values in each table.
data want;
  /* define the variables up front so the first table read does not
     dictate the lengths and truncate values from later tables */
  length
    category         $32
    location         $100
    speed            $20
    idCategory       $10
    speedMBps        8
    timeFixedPrice   $10
    includedDiscount 8
  ;
  set json.category: indsname=indsname;  /* read every CATEGORY* table */
  category = scan(indsname, -1, '.');    /* member part of libref.member */
  drop ordinal_: ;
run;
Tom, you rock! I had a very busy week of traveling and I'm just seeing this. It makes all the sense in the world and works perfectly! Thank you so much!!!
I wrote an article with some examples.
The trick is to use the ordinal_* variables as the key to join records from the different sections; a quick sketch of that join is below.
Some JSON data might use JSON arrays for keys that hold a variable number of values, and for those you may need a DATA step to transform them. There's an example of that in the article too.
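For instance, with the First_Category/Second_Category tables from the automap earlier in this thread (libref assumed to be JSON), the join looks something like this:
/* line up the per-section rows on their shared parent key */
proc sql;
  create table combined as
  select a.ordinal_root,
         a.elementA as first_elementA,
         b.elementA as second_elementA
    from json.First_Category as a
    inner join json.Second_Category as b
      on a.ordinal_root = b.ordinal_root;
quit;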
Hi Chris,
I'd come across your article before and read it very carefully, but there you explain how to use a custom map to "filter" data from a single table. As you can see from my reply to Quentin above, I don't really have data to filter within each table (unless I drop some variables), since each one only gives me a single observation. I suppose that once the tables are created with the automap I could merge them all together, but I was hoping to write a single map that produces just one dataset in a single step instead of several.
Also, I don't know why, but on the server where I'm developing this I cannot open the datasets that the automap produces. The library gets created just fine and is filled with ALLDATA and the other tables, but every time I try to open one I get a strange message telling me that "the source array is not long enough and that I should check srcIndex, the length, and the lower bounds of the array" (it may be poorly translated from Spanish). I want to avoid errors like this from extra steps once this is in production.
Thanks again for your time 😉
In practice I rarely use the JSON map feature, as I find it easier to use DATA step or SQL to combine the data after reading it in. SAS has to read/parse the entire JSON anyway, so I don't think there is much of a performance benefit either way. (But no...I have not tested.)