Tom
Super User

I posted the steps for doing that in my first answer to this question. 

 

But do it manually first to make sure you know what works.

Tom
Super User

@BayzidurRahman wrote:
Thanks. Got it.
Is there any way to change all the TYPE values to character and the lengths to a fixed number with some code? I have thousands of files and need to pass a macro through them.

For a worked example, post a small example JSON file.

bayzid
Obsidian | Level 7

I have 23 types of JSON file with the same structure but different variables. Attached are 3 types of file.

Tom
Super User

So first have SAS make a MAP file for one of the file types.

%let filename=c:\downloads\json\AmtProduct_xx.json;
filename rawmap "c:\downloads\json\Amt_Product_rawmap.json";
/* AUTOMAP=CREATE tells the JSON engine to write a map file to the MAP= fileref */
libname json json "&filename" map=rawmap automap=create;

Then have SAS read the map itself as a JSON file, and merge the two tables back together to make the metadata easier to work with.

libname rawmap json ;
data variables;
  merge rawmap.datasets rawmap.datasets_variables;
  by ordinal_datasets;
run;

Result so far:

(screenshot: merged DATASETS and DATASETS_VARIABLES metadata)

Now if you want those other three variables to be CHARACTER, you need to change the TYPE value from NUMERIC to CHARACTER.   And if you want a map file that also gives the variables the same length, you need a LENGTH variable instead of the CURRENT_LENGTH variable.

 

This is where you will need to make actual decisions.  Which variables should be NUMERIC and which should be CHARACTER?  And for the CHARACTER variables, how long does each one need to be?

 

So let's take a quick and dirty approach and just set every variable to CHARACTER with a length of 200.  While we are at it, let's rename the ordinal variable to ROW.

data variables;
  set variables;
  if type = 'ORDINAL' then name='ROW';
  else do;
    type='CHARACTER';
    LENGTH=200;
  end;
  drop current_length;
run;

Result

(screenshot: VARIABLES dataset after the edits)

Now that we have the metadata we want the JSON engine to use when converting the JSON text into SAS dataset(s), we need to write that back out as a MAP file.

filename map "c:\downloads\json\Amt_Product_map.json";
data _null_;
  file map ;
  if _n_=1 then put '{"DATASETS":' / '[{' @ ;
  /* EOF is set when SET reads the last observation, so this PUT
     executes at the top of the following (final) iteration */
  if eof then put '}]' / '}' ;
  set variables end=eof ;
  by ordinal_datasets;
  if first.ordinal_datasets then do;
    if _n_ > 1 then put ',' @;
    put '"DSNAME":' dsname :$quote. ',"TABLEPATH":' tablepath :$quote.
     /  ' ,"VARIABLES":'
     /  '  [' @
    ;
  end;
  else put '  ,' @;
  put '{"NAME":' name :$quote. ',"TYPE":' type :$quote. ',"PATH":' path :$quote. @ ;
  if length then put ',"LENGTH":' length @;  /* LENGTH is missing on the ORDINAL row */
  put '}' ;
  if last.ordinal_datasets then put '  ]' ;
run;

Here is what the new map file looks like:

{"DATASETS":
[{"DSNAME":"root" ,"TABLEPATH":"/root"
 ,"VARIABLES":
  [{"NAME":"ROW" ,"TYPE":"ORDINAL" ,"PATH":"/root" }
  ,{"NAME":"AmtProductId" ,"TYPE":"CHARACTER" ,"PATH":"/root/AmtProductId" ,"LENGTH":200 }
  ,{"NAME":"CTPP_Id" ,"TYPE":"CHARACTER" ,"PATH":"/root/CTPP_Id" ,"LENGTH":200 }
  ,{"NAME":"TPP_Id" ,"TYPE":"CHARACTER" ,"PATH":"/root/TPP_Id" ,"LENGTH":200 }
  ,{"NAME":"MPP_Id" ,"TYPE":"CHARACTER" ,"PATH":"/root/MPP_Id" ,"LENGTH":200 }
  ]
}]
}

Now let's use it to read the original JSON file and check how the variables are defined and how a few of the observations look.

libname json json "&filename" map=map;
proc contents data=json.root varnum;
run;
proc print data=json.root(obs=10);
run;

(screenshot: PROC CONTENTS and PROC PRINT output for JSON.ROOT)

So you just need to repeat that process 23 times for the 23 different types of files you have.  Make sure to set the lengths long enough for any values that might exist in the other files you have not checked yet.
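To avoid stepping through that by hand 23 times, the whole sequence can be wrapped in a macro.  This is only a sketch under the same quick-and-dirty assumptions (every non-ordinal variable becomes CHARACTER with length 200); the macro name MAKE_MAP and the TEMP scratch file are my own choices, not something from the code above:

%macro make_map(jsonfile, mapfile);
  /* 1) Let AUTOMAP write a raw map for one representative file */
  filename rawmap temp;
  libname injson json "&jsonfile" map=rawmap automap=create;

  /* 2) Read the raw map back and apply the quick-and-dirty edits */
  libname rawmap json;
  data variables;
    merge rawmap.datasets rawmap.datasets_variables;
    by ordinal_datasets;
    if type='ORDINAL' then name='ROW';
    else do;
      type='CHARACTER';
      length=200;
    end;
    drop current_length;
  run;

  /* 3) Write the edited metadata out as the final map file */
  filename map "&mapfile";
  data _null_;
    file map;
    if _n_=1 then put '{"DATASETS":' / '[{' @;
    if eof then put '}]' / '}';
    set variables end=eof;
    by ordinal_datasets;
    if first.ordinal_datasets then do;
      if _n_ > 1 then put ',' @;
      put '"DSNAME":' dsname :$quote. ',"TABLEPATH":' tablepath :$quote.
       /  ' ,"VARIABLES":'
       /  '  [' @
      ;
    end;
    else put '  ,' @;
    put '{"NAME":' name :$quote. ',"TYPE":' type :$quote. ',"PATH":' path :$quote. @;
    if length then put ',"LENGTH":' length @;
    put '}';
    if last.ordinal_datasets then put '  ]';
  run;

  libname injson clear;
  libname rawmap clear;
  filename rawmap clear;
  filename map clear;
%mend make_map;

Then one call per file type, for example %make_map(c:\downloads\json\AmtProduct_xx.json, c:\downloads\json\AmtProduct_map.json).  You would still want to review each generated map for variables that should stay numeric or need longer lengths.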

 

Tom
Super User

That first file (the ARCHIVE one) is an example where the number of variables included in the JSON file varies from observation to observation.

libname json json "C:\downloads\json\ResidentDose_0a895d78-7287-49ee-922a-460f8f5c9dd2_201709_Archive.json";
proc sql ;
  create table check_variables as 
    select p1 as NAME length=32
         , count(*) as N
         , max(length(value)) as LENGTH
    from json.alldata
    group by 1
  ;
quit;

proc print;
run;

(screenshot: variable names, counts, and max lengths from the ARCHIVE file)

 

The other two files are more consistent in that every observation has every variable.

(screenshots: variable summaries for the other two files)

 

 

bayzid
Obsidian | Level 7

Would that create a problem in the process?

Tom
Super User

The problem it mainly creates is that you will probably need to examine every file to make sure you know the full set of possible variables that can occur in that particular type of file.

 

For example you might want to loop over all of the files and generate a complete list of the variables.

You could use something like this:

1) Get a list of all of the JSON files into a dataset; call it FILES, with a variable named FILENAME (see the sketch after this list).

2) Use that dataset to generate code that makes a libref, finds the variable names and maximum lengths, and aggregates them into a single dataset.

3) Then use that dataset to generate the best estimate for the LENGTH of each character variable so you can use it to make the MAP file.
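For step 1, here is one possible sketch using the DOPEN/DREAD directory functions; the folder c:\downloads\json is just an assumed location:

data files;
  length filename $256;
  keep filename;
  rc = filename('dir','c:\downloads\json');   /* assumed folder of JSON files */
  did = dopen('dir');
  if did then do;
    do i=1 to dnum(did);
      filename = catx('\','c:\downloads\json',dread(did,i));
      if upcase(scan(filename,-1,'.')) = 'JSON' then output;
    end;
    rc = dclose(did);
  end;
run;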

 

If we assume that the text before the first _ in the filename defines which of the 23 groups a file belongs to, then you might make a macro like this:

%macro check_names(filename);
libname json json "&filename";
proc sql ;
create table check_one as
  select scan("&filename",-1,'/\') as filename length=200
       , scan(calculated filename,1,'_') as group length=20 
       , p1 as NAME length=32
       , count(*) as N
       , max(length(value)) as LENGTH
  from json.alldata
  group by 1,2,3
;
quit;
libname json clear;
proc append base=check_names data=check_one force;
run;
%mend;

Then call it once for each file you have.

data _null_;
  set files;
  call execute(cats('%nrstr(%check_names)(',filename,')'));
run;

Then you could get the list of variables per group from that aggregated summary.

proc sql;
create table group_variables as
select group
     , name
     , count(*) as n_files
     , sum(n) as n
     , max(length) as length
 from check_names
group by 1,2
;
quit;
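That summary can then feed step 3: reshape it into the same VARIABLES layout used earlier to drive the map-writing step.  A sketch of my own, assuming every variable becomes CHARACTER and padding the observed maximum length a little for safety:

proc sort data=group_variables;
  by group name;
run;

data variables;
  length ordinal_datasets 8 dsname $32 tablepath $100 name $32
         type $9 length 8 path $100;
  set group_variables;
  by group;
  if first.group then ordinal_datasets+1;   /* one dataset per group */
  dsname=group;
  tablepath='/root';
  type='CHARACTER';
  length=max(length,1)+20;                  /* pad the observed maximum */
  path=catx('/',tablepath,name);
  keep ordinal_datasets dsname tablepath name type length path;
run;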
BayzidurRahman
Obsidian | Level 7

Thanks for your time on this.
I have now received a data dictionary from the data provider with the length of every variable (attached). I have created the LENT variable as the argument for the LENGTH statement and put that variable into a macro variable "lentq" for each group (named as DATASET in the LENGTH1 file). Can I run a loop to pass through all 23 file types and assign the correct length to all the variables, given that I will convert all the variables to character?

proc sql noprint;
  /* Create the macro variable LENTQ containing the length arguments
     for the variables in the dataset for "&group" */
  select distinct lent
    into :lentq separated by ' ' notrim
    from length1 (where=(dataset=:"&group"));
quit;
Tom
Super User

So assuming the lengths mentioned in that dataset are what you want to use, then use it to make a dataset like the one I used to make the MAP file.  One of your variables has a name that is too long to use as a variable name in a SAS dataset.  We can truncate that name to 32 bytes and store the original name in the variable LABEL instead.

data variables;
  length ordinal_datasets 8 dsname $32 tablepath $100 name $32 type $9 length 8 label $100 path $100;
  set 'c:\downloads\json\length1.sas7bdat';
  dsname=scan(dataset,1,'_');
  if dsname ne lag(dsname) then ordinal_datasets+1;   /* number the dataset groups */
  tablepath='/root';
  name=target_column;                                  /* truncated to 32 bytes by the LENGTH statement */
  if name ne target_column then label=target_column;   /* keep the full name as the label */
  path=catx('/',tablepath,target_column);
  length=max_length;
  type='CHARACTER';
  keep ordinal_datasets -- path ;
run;

(screenshots: the VARIABLES dataset built from the data dictionary)

Now we can use that data to write out the different MAP files.

data _null_;
  set variables end=eof ;
  by ordinal_datasets;
  length filename $256 ;
  filename = cats('c:\downloads\json\',dsname,'_map.json');
  file map filevar=filename ;
  if first.ordinal_datasets then do;
    put '{"DATASETS":' / '[{' @ ;    
    put '"DSNAME":' dsname :$quote. ',"TABLEPATH":' tablepath :$quote.
     /  ' ,"VARIABLES":'
     /  '  [' @
    ;
  end;
  else put '  ,' @;
  put '{"NAME":' name :$quote. ',"TYPE":' type :$quote. ',"PATH":' path :$quote. @ ;
  if length then put ',"LENGTH":' length @;
  if not missing(label) then put ',"LABEL":' label :$quote. @;
  put '}' ;
  if last.ordinal_datasets then put '  ]' / '}]'/ '}' ;
run;

Results:

(screenshots: the generated map files)

 

You can then use these map files with the different JSON files.

filename json "C:\downloads\json\AmtProduct_bc4882eb-510a-44ec-b237-4ef116e56fa8.json";
filename map "C:\downloads\json\AmtProduct_map.json";
libname json json map=map;
proc contents data=json.AmtProduct varnum; run;
proc print data=json.AmtProduct(obs=10);
run;

(screenshot: PROC CONTENTS and PROC PRINT output for JSON.AMTPRODUCT)

 

 

 

bayzid
Obsidian | Level 7

The code worked fine for an individual file.
I am running multiple macros to import all the files in a group using the following syntax. Can you please suggest where the MAP information should go?

libname test json
  "C:\Users\mq10004085\Macquarie University\MRFF Aged care Data - Documents\JSON_1st_02032023\&&file&i...json"
  ordinalcount=none; *noalldata;
data sas.&&ds&i;           /* Create the table in the SAS library */
  set test.root;
  dataname="&&file&i";     /* Variable containing the JSON file name */
run;
Tom
Super User

You need two filenames: the name of the JSON file you are reading, and the name of the MAP file you want the JSON engine to use when reading it.

Example:

libname json json
  "C:\downloads\json\AmtProduct_bc4882eb-510a-44ec-b237-4ef116e56fa8.json"
  map="C:\downloads\json\AmtProduct_map.json"
;

If you use the map files I made before, then you will also need to change the name of the dataset you are reading, since we named it after the name in your source metadata instead of the default name of ROOT that the automap-generated MAP used.

Also, if you want to add that extra variable, then define its length before you assign a value to it, so that it will have the same length every time through your macro loop.

data sas.&&ds&i ;
  set test.AmtProduct;
  length dataname $100 ;
  dataname="&&file&i"; 
run;

If you want to use the same macro for all 23 different file types, then you probably need to add macro variables for the data type and the map filename.  Say you added a macro variable named DSNAME with a value like AmtProduct, and you stored the map files in the same directory tree as the JSON files; then the code might look like this:

libname test JSON 
 "C:\Users\mq10004085\Macquarie University\MRFF Aged care Data - Documents\JSON_1st_02032023\&&file&i...json" 
  noalldata
  map =  "C:\Users\mq10004085\Macquarie University\MRFF Aged care Data - Documents\&dsname._map.json"
;
data sas.&&ds&i ;
  set test.&dsname;
  length dataname $100 ;
  dataname="&&file&i"; 
run;
bayzid
Obsidian | Level 7

Thanks Tom. Much appreciated.

Quentin
Super User

@bayzid wrote:

Sorry for not understanding your point. This is because I have very limited knowledge about map files.


The documentation has a good explanation, as well as an example of creating a map from a JSON file and then editing it.

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lestmtsglobal/n1jfdetszx99ban1rl4zll6tej7j.ht...

 

The map has the metadata to define the SAS variables' type, length, label, etc., in the same way you would use INFORMAT, LENGTH, and LABEL statements if you were writing the input step by hand.
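As a rough illustration (my own, not from the thread), the AmtProductId entry in the map shown earlier plays the same role as these statements would if you read the values by hand from the ALLDATA dataset:

/* The map entry
     {"NAME":"AmtProductId","TYPE":"CHARACTER","PATH":"/root/AmtProductId","LENGTH":200}
   defines the variable the way these statements would in a hand-written step: */
data example;
  length AmtProductId $200;   /* TYPE=CHARACTER, LENGTH=200 */
  set json.alldata(where=(p1='AmtProductId'));
  AmtProductId=value;
  keep AmtProductId;
run;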

An alternative approach would be to read the data and let some fields come in as numeric variables.  Then in a separate step you could convert all the numeric fields to character (and enforce a standard length).  But the map approach is cleaner.  If someone was kind enough to give you a data dictionary, you could use that to build the map file.
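A minimal sketch of that alternative, assuming a hypothetical input dataset WORK.HAVE: use DICTIONARY.COLUMNS to generate a character copy of every numeric variable with a standard $200 length, then drop the originals.

proc sql noprint;
  /* one "c_var $200" pair per numeric variable, for a LENGTH statement */
  select catx(' ',cats('c_',name),'$200')
    into :lenlist separated by ' '
    from dictionary.columns
    where libname='WORK' and memname='HAVE' and type='num';
  /* one "c_var=cats(var)" assignment per numeric variable */
  select catx('=',cats('c_',name),cats('cats(',name,')'))
    into :assigns separated by ';'
    from dictionary.columns
    where libname='WORK' and memname='HAVE' and type='num';
quit;

data want;
  length &lenlist;
  set have;
  &assigns;
  drop _numeric_;   /* drop the original numeric variables */
run;

Note the new variables get a c_ prefix here to avoid name clashes; that prefix is my own convention, and the step assumes HAVE has at least one numeric variable.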

 
