Hi, I'm working on a conference paper on copying data between SAS and JSON. For reading JSON files into SAS, I started with the examples in the LIBNAME Statement: JSON Engine documentation, and I'm using SAS 9.4 TS1M4.
What I've seen about reading JSON files into SAS generally involves looking at the data and then writing code specific to the data layout. I wonder if there's a way to automate the process.
For example, the section on Merging Data Sets shows how to read JSON code into multiple SAS data sets and then use ordered values (variables beginning with ORDINAL_) to merge the data sets into a single data set. Using that JSON file to save space here, could I automate that? Maybe after the PROC DATASETS in the documentation, I could do something like the following:
1. Read the dictionary.columns table for the library containing the component data sets and subset to obtain all variables that start with ORDINAL_.
proc sql noprint;
   create table ordinals as
   select memname, name
   from dictionary.columns
   where libname="IN"
     and upcase(name) like 'ORDINAL^_%' escape '^'; /* escape the _ since it is a single-character wildcard in LIKE */
quit;
Resulting data set:
Obs memname name
1 INFO ordinal_root
2 INFO ordinal_info
3 INFO_ADD ordinal_info
4 INFO_ADD ordinal_add
5 INFO_NAME ordinal_info
6 INFO_NAME ordinal_name
7 ROOT ordinal_root
2. Remove observations for ORDINAL_ values that only occur once since those are at the bottom of the hierarchy.
proc sort data=ordinals nouniquekey;
by name;
run;
Resulting data set:
Obs memname name
1 INFO ordinal_info
2 INFO_ADD ordinal_info
3 INFO_NAME ordinal_info
4 INFO ordinal_root
5 ROOT ordinal_root
3. Construct the merge by working down the levels from ORDINAL_ROOT.
a. Merge 1 is for data sets with ORDINAL_ROOT (INFO and ROOT).
b. Since INFO from Merge 1 also has ORDINAL_INFO, Merge 2 is the data set created in Merge 1 with INFO_ADD and INFO_NAME (they both have ORDINAL_INFO).
c. Additional levels could work the same way.
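The two merges described above might be sketched like this (a hypothetical sketch, assuming the component data sets from the documentation example are in library IN; adjust names to your own setup):

```
/* Merge 1: combine the data sets that share ORDINAL_ROOT */
data merge1;
   merge in.root in.info;
   by ordinal_root;
run;

/* Merge 2: fold in the data sets that share ORDINAL_INFO */
data merge2;
   merge merge1 in.info_add in.info_name;
   by ordinal_info;
run;
```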
That worked with a few JSON files but then I got to a few fairly simple cases that didn't. For example, consider the following JSON file (simplified to remove additional blocks of rows like these).
[
  {
    "country": "usa",
    "city": "chicago",
    "income": {
      "salary": {
        "salary_h1": 35,
        "salary_h2": 45
      },
      "stocks": {
        "stocks_h1": 8,
        "stocks_h2": 12
      }
    },
    "tax": {
      "incometax": {
        "incometax_estimated": 3,
        "incometax_withheld": 7
      },
      "salestax": 5,
      "cartax": {
        "cartax_h1": 2,
        "cartax_h2": 2
      }
    }
  }
]
Using PROC SQL and PROC SORT as above, this generated the following result, where there's no connection between ORDINAL_INCOME and the rest of the tree.
Obs memname name
1 INCOME_SALARY ordinal_income
2 INCOME_STOCKS ordinal_income
3 ROOT ordinal_root
4 TAX ordinal_root
5 TAX ordinal_tax
6 TAX_CARTAX ordinal_tax
7 TAX_INCOMETAX ordinal_tax
So I hit a dead end with this effort, unless there's a way to identify some subset of JSON files that'll work this way.
Has anyone done any research in this area and come up with anything interesting?
Thanks,
Bruce
INCOME has no data, so no data set is made. TAX has this data:
"salestax": 5
You might want to just use the table ALLDATA.
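For instance (a hypothetical sketch; as I recall the engine's layout, ALLDATA holds every value in one table, with the path components in variables P1, P2, ... and the data itself in VALUE):

```
/* Peek at the engine's ALLDATA member, which flattens the whole
   JSON file into one path/value table. Library name IN is from
   the example above; adjust to your own LIBNAME. */
proc print data=in.alldata;
run;
```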
Tom:
Thanks. Revealing my ignorance about the concepts of JSON, I'm confused. If I do the 2 merges as described above, I get this.
Obs  country  city     salestax  incometax_estimated  incometax_withheld  cartax_h1  cartax_h2
  1  usa      chicago         5                    3                   7          2          2
I see in the JSON file that INCOME has no data directly underneath it, as you noted. But SALARY and STOCKS, which do have values, are nested below INCOME. Conceptually, INCOMETAX and CARTAX seem analogous to SALARY and STOCKS, yet I can access the values nested below TAX because there's a key/value pair "salestax": 5 directly below TAX. That there is no data directly below INCOME is what I'm having trouble wrapping my head around.
Thanks,
Bruce
It looks to me like INCOME is not generated, but if it were, it would have just the variables ORDINAL_ROOT and ORDINAL_INCOME. For your simple case it looks like you could generate it from the ROOT table by just setting ORDINAL_INCOME to match ORDINAL_ROOT. But that might not work with more complex data.
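That suggestion might look something like this (a hypothetical sketch, assuming one income object per root row so ORDINAL_INCOME can simply mirror ORDINAL_ROOT; library and member names follow the example and would need adjusting):

```
/* Fabricate the missing INCOME table from ROOT so the
   level-by-level merge has a link through ORDINAL_INCOME. */
data income;
   set in.root(keep=ordinal_root);
   ordinal_income = ordinal_root;  /* assumes a one-to-one pairing */
run;
```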
Add ORDINALCOUNT=ALL to the LIBNAME statement.
ORDINALCOUNT= ALL | NONE | n
specifies the maximum number of ordinal variables to generate for each data set. The maximum number of possible ordinal variables for a data set depends on the position of the data in the JSON, and can be less than the number that you specify. In that case, only the maximum number of possible ordinal variables is generated. ORDINALCOUNT can have one of these values:
ALL
creates all possible ordinal variables for the data set.
NONE
suppresses creation of ordinal variables for the data set.
n
is an integer that is greater than or equal to 0 (zero).
Default 2
Example results:
Obs  ordinal_root  ordinal_income  ordinal_salary  salary_h1  salary_h2
  1             1               1               1         35         45
  2             3               2               2         35         45
Tom:
Hi, I added ORDINALCOUNT=ALL to the LIBNAME statement as suggested. Thanks!!!
Now, the initial ORDINAL_ variables are as follows (*** = not present without ORDINALCOUNT=ALL).
memname name
INCOME_SALARY ordinal_root ***
INCOME_SALARY ordinal_income
INCOME_SALARY ordinal_salary
INCOME_STOCKS ordinal_root ***
INCOME_STOCKS ordinal_income
INCOME_STOCKS ordinal_stocks
ROOT ordinal_root
TAX ordinal_root
TAX ordinal_tax
TAX_CARTAX ordinal_root ***
TAX_CARTAX ordinal_tax
TAX_CARTAX ordinal_cartax
TAX_INCOMETAX ordinal_root ***
TAX_INCOMETAX ordinal_tax
TAX_INCOMETAX ordinal_incometax
And after getting rid of unique ordinals, they are as follows (*** = not present without ORDINALCOUNT=ALL).
memname        name
INCOME_SALARY  ordinal_income
INCOME_STOCKS  ordinal_income
INCOME_SALARY  ordinal_root ***
INCOME_STOCKS  ordinal_root ***
ROOT           ordinal_root
TAX            ordinal_root
TAX_CARTAX     ordinal_root ***
TAX_INCOMETAX  ordinal_root ***
TAX            ordinal_tax
TAX_CARTAX     ordinal_tax
TAX_INCOMETAX  ordinal_tax
Now, the merge can work from this info. I haven't coded it, but I think for the first merge I want to merge by ORDINAL_ROOT all data sets where MEMNAME either has no underscore, or the part of MEMNAME before its first underscore is not itself a MEMNAME. That's probably clearest by example. The first merge is of ROOT, INCOME_SALARY, INCOME_STOCKS, and TAX, but not TAX_CARTAX or TAX_INCOMETAX, as follows.
memname        name          Include in first merge?
INCOME_SALARY  ordinal_root  Yes, because there's no MEMNAME of INCOME
INCOME_STOCKS  ordinal_root  Yes, because there's no MEMNAME of INCOME
ROOT           ordinal_root  Yes, because it's root
TAX            ordinal_root  Yes, because there's no underscore
TAX_CARTAX     ordinal_root  No, because there's a MEMNAME of TAX
TAX_INCOMETAX  ordinal_root  No, because there's a MEMNAME of TAX
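The selection rule above might be coded like this (a hypothetical sketch, assuming the ORDINALS table from the earlier PROC SQL step still exists and lists every member name):

```
/* Keep a member for the first merge only if the part of its name
   before the first underscore is not itself a member name. */
proc sql noprint;
   create table first_merge as
   select distinct a.memname
   from ordinals as a
   where upcase(a.name) = 'ORDINAL_ROOT'
     and scan(a.memname, 1, '_') not in
         (select memname from ordinals
          where memname ne a.memname);
quit;
```

For the sample data this should select ROOT, TAX, INCOME_SALARY, and INCOME_STOCKS, and exclude TAX_CARTAX and TAX_INCOMETAX, since their prefix TAX is itself a member.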
So the merges are as follows.
data finaldata6;
   merge xxx.root xxx.tax xxx.income_salary xxx.income_stocks;
   by ordinal_root;
run;

data finaldata6;
   merge finaldata6 xxx.tax_incometax xxx.tax_cartax;
   by ordinal_tax;
   drop ordinal_:;
run;
And now I get all the values in the final data set! Apologies for the layout.
Obs  country  city     salestax  salary_h1  salary_h2  stocks_h1  stocks_h2  incometax_estimated  incometax_withheld  cartax_h1  cartax_h2
  1  usa      chicago         5         35         45          8         12                    3                   7          2          2
I have no idea how much this can be generalized, but will think about this next week. Any thoughts you have are welcome. Thanks so much for moving me this far forward today!
Bruce
Tom:
Hi, I'm back at this after the holidays and some other work. Diving back in fresh, I realized something significant.
Returning to the JSON file I included in the initial post two weeks ago, I ran the following code (with ORDINALCOUNT=ALL as you suggested):
libname in5 json '/my/home/m1xxx00/json/fivefive.json'
        map='user5.map' automap=create ordinalcount=all;
libname xxx '/my/home/m1xxx00/json/example6';

proc copy in=in5 out=xxx;
run;
The library referenced by XXX had the following data sets (ignoring ALLDATA):
income_salary, income_stocks, root, tax_cartax, tax_incometax, tax
All 6 data sets had the variable ORDINAL_ROOT and now a simple merge of all these data sets generated the correct data. None of the steps I described previously were needed.
data simpledata6;
   merge xxx.root xxx.income_salary xxx.income_stocks
         xxx.tax xxx.tax_incometax xxx.tax_cartax;
   by ordinal_root;
   drop ordinal_:;
run;
Data set SIMPLEDATA6 had all the values we wanted:
Obs  country  city     salary_h1  salary_h2  stocks_h1  stocks_h2  salestax  incometax_estimated  incometax_withheld  cartax_h1  cartax_h2
  1  usa      chicago         35         45          8         12         5                    3                   7          2          2
I then tried a slightly different data set with blocks of records like the following and the same thing was true - they all had ORDINAL_ROOT, with appropriate values for each block, and the simple merge worked.
[
  {
    "country": "usa",
    "city": "chicago",
    "income": {
      "salary": { "salary_h1": 35, "salary_h2": 45 },
      "bonus": 10,
      "stocks": { "stocks_h1": 8, "stocks_h2": 12 }
    },
    "tax": {
      "incometax": { "incometax_estimated": 3, "incometax_withheld": 7 },
      "salestax": 5,
      "cartax": { "cartax_h1": 2, "cartax_h2": 2 }
    }
  },
  ...... (additional blocks like these) ......
]
So, I'm wondering about a few things.
1. Why did the documentation use the indirect method that it did?
2. How generalizable is this solution?
I'm not sure how much time you have for this but it seems to me that a simplified way to read JSON files would be useful to many people.
Thanks,
Bruce
If just merging all of the tables at once on a single KEY variable worked for your data then you are very lucky.
Someone else just posted today a question on JSON where that would not work.
Each JSON file is different.
That is the beauty and frustration of the format.
Hi Tom, I checked out that other thread, at https://communities.sas.com/t5/SAS-Programming/SAS-EG-how-to-read-in-JSON-file/m-p/710217, and after using your code (with a minor change to the IF EOF code) to convert his JSONL code to JSON, I was able to use the same code as above to do a simple merge of all 3 data sets and get the desired result.
I know you are correct that each file is different, but I aspire to understand enough about the nuances of JSON files to have a feel for what types of files will resolve directly.
filename jsonl "/my/home/m1xxx00/json/example10/ten.jsonl";
filename json "/my/home//m1xxx00/json/example10/ten.json";

data _null_;
   infile jsonl end=eof;
   file json;
   if _n_=1 then put '[' @;
   else put ',' @;
   input;
   put _infile_;
   if eof then do;
      put ']';
      stop; /* otherwise I got an extra , at the end */
   end;
run;

libname in10 json '/my/home/m1xxx00/json/example10/ten.json'
        map='user10.map' automap=create ordinalcount=all;
libname xxx '/my/home/m1xxx00/json/example10';

proc copy in=in10 out=xxx;
run;

/* Data sets ALLDATA, QUANTITY, ROOT, and VALUE in the library. */
proc datasets lib=xxx;
run; quit;

/* Simple merge of all data sets */
data simpledata10;
   merge xxx.root xxx.quantity xxx.value;
   by ordinal_root;
   /* drop ordinal_:; */ /* Comment out to test */
run;

proc print data=simpledata10;
run;