I posted the steps for doing that in my first answer to this question.
But do it manually first to make sure you know what works.
@BayzidurRahman wrote:
Thanks. Got it.
Is there any way to change all the TYPE to Character and length to a fixed number by doing some code? This is because I have thousands of files and need to pass a macro through them.
For a worked example post a small example JSON file.
I have 23 types of JSON files with the same structure but different variables. I have attached 3 types of files.
So first have SAS make a MAP file for one of the file types.
%let filename=c:\downloads\json\AmtProduct_xx.json;
filename rawmap "c:\downloads\json\Amt_Product_rawmap.json";
libname json json "&filename" map=rawmap automap=create;
Then have SAS read the map itself as a JSON file. Merge the two tables back together to make it more useful.
libname rawmap json ;
data variables;
  merge rawmap.datasets rawmap.datasets_variables;
  by ordinal_datasets;
run;
Result so far:
Now if you want those other three variables to be CHARACTER you need to change the TYPE value from NUMERIC to CHARACTER. And if you want a map file that will also make the variables with the SAME length you need to have a LENGTH variable instead of the CURRENT_LENGTH variable.
This is where you will need to make actual decisions. Which variables should be NUMERIC and which should be CHARACTER? And for the CHARACTER variables, how long does each one need to be?
So let's do a quick and dirty approach and just set every variable to CHARACTER and set the length to 200 for every one. While we are at it, let's rename the ordinal variable to ROW.
data variables;
  set variables;
  if type = 'ORDINAL' then name = 'ROW';
  else do;
    type = 'CHARACTER';
    length = 200;
  end;
  drop current_length;
run;
Result
Now that we have the metadata we want the JSON engine to use when converting the JSON text into SAS dataset(s) we need to write that back out as a MAP file.
filename map "c:\downloads\json\Amt_Product_map.json";
data _null_;
file map ;
if _n_=1 then put '{"DATASETS":' / '[{' @ ;
if eof then put '}]'/ '}' ;
set variables end=eof ;
by ordinal_datasets;
if first.ordinal_datasets then do;
if _n_ > 1 then put ',' @;
put '"DSNAME":' dsname :$quote. ',"TABLEPATH":' tablepath :$quote.
/ ' ,"VARIABLES":'
/ ' [' @
;
end;
else put ' ,' @;
put '{"NAME":' name :$quote. ',"TYPE":' type :$quote. ',"PATH":' path :$quote. @ ;
if length then put ',"LENGTH":' length @;
put '}' ;
if last.ordinal_datasets then put ' ]' ;
run;
Here is what the new map file looks like:
{"DATASETS":
[{"DSNAME":"root" ,"TABLEPATH":"/root"
 ,"VARIABLES":
 [{"NAME":"ROW" ,"TYPE":"ORDINAL" ,"PATH":"/root" }
 ,{"NAME":"AmtProductId" ,"TYPE":"CHARACTER" ,"PATH":"/root/AmtProductId" ,"LENGTH":200 }
 ,{"NAME":"CTPP_Id" ,"TYPE":"CHARACTER" ,"PATH":"/root/CTPP_Id" ,"LENGTH":200 }
 ,{"NAME":"TPP_Id" ,"TYPE":"CHARACTER" ,"PATH":"/root/TPP_Id" ,"LENGTH":200 }
 ,{"NAME":"MPP_Id" ,"TYPE":"CHARACTER" ,"PATH":"/root/MPP_Id" ,"LENGTH":200 }
 ]
}]
}
Now let's use it to read the original JSON file and check how the variables are defined and how a few of the observations look.
libname json json "&filename" map=map;
proc contents data=json.root varnum;
run;
proc print data=json.root(obs=10);
run;
So you just need to repeat that process 23 times for the 23 different types of files you have. Make sure to set the lengths long enough for any values that might exist in the other files you have not checked yet.
That first file (the ARCHIVE one) is an example where the number of variables included in the JSON file varies from observation to observation.
libname json json "C:\downloads\json\ResidentDose_0a895d78-7287-49ee-922a-460f8f5c9dd2_201709_Archive.json";
proc sql ;
create table check_variables as
select p1 as NAME length=32
, count(*) as N
, max(length(value)) as LENGTH
from json.alldata
group by 1
;
quit;
proc print;
run;
The other two files are more consistent in that every observation has every variable.
Would that create a problem in the process?
The problem it mainly creates is that you will probably need to examine every file to make sure you know the full set of possible variables that can occur in that particular type of file.
For example you might want to loop over all of the files and generate a complete list of the variables.
You could use something like this:
1) Get a list of all of the JSON files into a dataset; call it FILES with a variable named FILENAME.
2) Use that dataset to generate code that assigns a libref, finds the variable names and max lengths, and aggregates them into a single dataset.
3) Then use that dataset to generate the best estimate for the LENGTH of each character variable so you can use it to make the MAP file.
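Step 1 could be sketched with the SAS directory functions. This is only a sketch; the folder path is an assumption, and the full path is stored so that later SCAN calls with '/\' delimiters still work:

```sas
/* Sketch only: build a FILES dataset listing every *.json file in a folder.
   The folder path 'c:\downloads\json' is an assumption -- point it at your
   own directory. */
data files;
  length filename $256 member $200;
  rc  = filename('dir','c:\downloads\json');   /* fileref for the folder */
  did = dopen('dir');                          /* open the directory     */
  do i = 1 to dnum(did);
    member = dread(did,i);                     /* i-th member name       */
    if lowcase(scan(member,-1,'.')) = 'json' then do;
      filename = catx('\','c:\downloads\json',member);  /* full path    */
      output;
    end;
  end;
  rc = dclose(did);
  keep filename;
run;
```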
If we assume that the text before the first _ in the filename identifies which of the 23 groups a file belongs to, then you might make a macro like this:
%macro check_names(filename);
libname json json "&filename";
proc sql ;
create table check_one as
select scan("&filename",-1,'/\') as filename length=200
, scan(calculated filename,1,'_') as group length=20
, p1 as NAME length=32
, count(*) as N
, max(length(value)) as LENGTH
from json.alldata
group by 1,2,3
;
quit;
libname json clear;
proc append base=check_names data=check_one force;
run;
%mend;
Then call it once for each file you have.
data _null_;
  set files;
  call execute(cats('%nrstr(%check_names)(',filename,')'));
run;
Then you could get the list of variables per group from those aggregated summaries.
proc sql;
create table group_variables as
select group
, name
, count(*) as n_files
, sum(n) as n
, max(length) as length
from check_names
group by 1,2
;
quit;
Thanks for your time on this.
I have now received a data dictionary from the data provider with the length of every variable (attached). I have created the lent variable as the argument in the length statement and put that variable into the a macro "lentq" by each group (named as dataset in the length1 file). Can i run a loop to pass through all the 23 files types and assign correct length to all the variables, given that I will convert all the variables to character?
proc sql noprint;
  select distinct lent
    into :lentq separated by ' ' notrim  /* macro variable LENTQ holds the length
                                            arguments for the variables in the
                                            dataset for "&group" */
    from length1 (where=(dataset=:"&group"));
quit;
So assuming that the lengths mentioned in that dataset are what you want to use, then use it to make a dataset like the one I used to make the MAP file. One of your variables has a name that is too long to use as a name in a SAS dataset. We can truncate that name to 32 bytes and store the original name in the variable LABEL instead.
data variables;
  length ordinal_datasets 8 dsname $32 tablepath $100 name $32 type $9 length 8 label $100 path $100;
  set 'c:\downloads\json\length1.sas7bdat';
  dsname = scan(dataset,1,'_');
  if dsname ne lag(dsname) then ordinal_datasets+1;
  tablepath = '/root';
  name = target_column;
  if name ne target_column then label = target_column;
  path = catx('/',tablepath,target_column);
  length = max_length;
  type = 'CHARACTER';
  keep ordinal_datasets -- path;
run;
Now we can use that data to write out the different MAP files.
data _null_;
set variables end=eof ;
by ordinal_datasets;
length filename $256 ;
filename = cats('c:\downloads\json\',dsname,'_map.json');
file map filevar=filename ;
if first.ordinal_datasets then do;
put '{"DATASETS":' / '[{' @ ;
put '"DSNAME":' dsname :$quote. ',"TABLEPATH":' tablepath :$quote.
/ ' ,"VARIABLES":'
/ ' [' @
;
end;
else put ' ,' @;
put '{"NAME":' name :$quote. ',"TYPE":' type :$quote. ',"PATH":' path :$quote. @ ;
if length then put ',"LENGTH":' length @;
if not missing(label) then put ',"LABEL":' label :$quote. @;
put '}' ;
if last.ordinal_datasets then put ' ]' / '}]'/ '}' ;
run;
Results:
Which you can use with the different JSON files.
filename json "C:\downloads\json\AmtProduct_bc4882eb-510a-44ec-b237-4ef116e56fa8.json";
filename map "C:\downloads\json\AmtProduct_map.json";
libname json json map=map;
proc contents data=json.AmtProduct varnum; run;
proc print data=json.AmtProduct(obs=10);
run;
The code worked fine for an individual file.
I am running multiple macros for importing all the files in a group using the following syntax. Can you please suggest where the map information should go?
libname test JSON "C:\Users\mq10004085\Macquarie University\MRFF Aged care Data - Documents\JSON_1st_02032023\&&file&i...json" ORDINALCOUNT= NONE ;*NOALLDATA;
data sas.&&ds&i ; /*Creating tables in the Work library instead of the same lib*/
set test.Root;
dataname="&&file&i"; /*Variable containing the json file names*/
run;
You need two filenames. The name of the JSON file you are reading and the name of the MAP file you want the JSON engine to use when reading the file.
Example:
libname json json
"C:\downloads\json\AmtProduct_bc4882eb-510a-44ec-b237-4ef116e56fa8.json"
map="C:\downloads\json\AmtProduct_map.json"
;
If you use the map files I made before then you will also need to change the name of the dataset you are reading, since we named it after the name in your source metadata instead of the default name of ROOT that the automap-generated MAP used.
Also if you want to add that extra variable then define the length of it before you assign a value to it so that it will have the same length every time through your macro loop.
data sas.&&ds&i ;
set test.AmtProduct;
length dataname $100 ;
dataname="&&file&i";
run;
If you want to use the same macro for all 23 different file types then you probably need to add a macro variable for the data type and the map filename. Say you added a macro variable named DSNAME that had a value like AmtProduct and you stored the map files into the same directory where you have the JSON files then the code might look like this:
libname test JSON
"C:\Users\mq10004085\Macquarie University\MRFF Aged care Data - Documents\JSON_1st_02032023\&&file&i...json"
noalldata
map = "C:\Users\mq10004085\Macquarie University\MRFF Aged care Data - Documents\&dsname._map.json"
;
data sas.&&ds&i ;
set test.&dsname;
length dataname $100 ;
dataname="&&file&i";
run;
Thanks Tom. Much appreciated.
@bayzid wrote:
Sorry for not understanding your point. This is because I have very limited knowledge about map files.
The documentation has a good explanation, as well as an example of creating a map from a json file, and then editing it.
The map has the metadata to define the SAS variables' type, length, label, etc., in the same way as you would use the INFORMAT statement, LENGTH statement, LABEL statement, etc. if you were reading the data by hand.
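For instance, a single entry in the map's VARIABLES array carries the same information that a LENGTH plus LABEL statement would in a data step (the variable name and label below are just placeholders, not from your files):

```json
{"NAME":"SomeVariable" ,"TYPE":"CHARACTER" ,"PATH":"/root/SomeVariable" ,"LENGTH":200 ,"LABEL":"Example label"}
```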
An alternative approach would be to read the data and let some fields come in as numeric variables. Then in a separate step you could convert all the numeric fields to character (and enforce a standard length). But the map approach is cleaner. If someone was kind enough to give you a data dictionary, you could use that to build the map file.
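That alternative conversion step might look like this sketch; the dataset and variable names are assumptions used only for illustration:

```sas
/* Sketch of the alternative: after reading with the automap, convert a
   numeric column to character with a fixed standard length.
   Dataset and variable names here are assumptions. */
data want;
  length CTPP_Id_c $200;                      /* enforce a standard length */
  set json.root;
  CTPP_Id_c = left(put(CTPP_Id, best32.));    /* numeric -> character      */
  drop CTPP_Id;
  rename CTPP_Id_c = CTPP_Id;
run;
```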