I need to read the files in 23 different batches as they are of different types. For example, among the 84,000 JSON files, 1,500 are of TypeA (named TypeA_*) and I need to append them into a single file named TypeA_All. I have 200 JSON files of TypeB (very different from the other types) and need to append them into TypeB_All.
All the files are in a single folder and can be differentiated by name. Each file name starts with their group name followed by some index separated by underscore.
You can figure out how to add that group to the filelist table, untested example below.
filename dirtree url
'https://raw.githubusercontent.com/sasutils/macros/master/dirtree.sas';
%include dirtree /source2;
%dirtree(C:\Users\bayzid\Macquarie University\MRFF Aged care Data - Documents\json_test /* Pipe delimited directory list (default=.) */
, out=filelist /* Output dataset name */
, maxdepth=1 /* Maximum tree depth */);
%macro read_json(myfile=, group = );
filename mydata &myfile.;
libname myjson JSON fileref=mydata nrm;
proc datasets lib=myjson;
quit;
*how to parse the file has not been specified - looks like you need to specify the lengths;
data file2save;
length source $200. p1 $256. Value $256.;
set myjson.alldata;
source=&myfile.;
run;
proc append base=master_&group data=file2save force;
run;
proc sql;
drop table file2save;
quit;
filename mydata;
libname myjson;
%mend read_json;
proc sql;
drop table master;
quit;
data run_list;
set filelist;
where type='F';
*add other logic to add group variable, this is just an example;
group=scan(filename, 1, "_");
*test this is created correctly;
str=catt('%read_json(myfile="C:\Users\bayzid\Macquarie University\MRFF Aged care Data - Documents\json_test\', filename, '", group=', group, ');');
*once tests are complete uncomment the execution to test it;
*call execute(str);
run;
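Before uncommenting the CALL EXECUTE line, it helps to eyeball a few of the generated macro calls to verify the quoting and paths. A minimal check, assuming the run_list dataset above has been created:

```sas
* Inspect the first few generated calls before executing them;
proc print data=run_list(obs=5) noobs;
  var group str;
run;
```

If the str values look right, uncomment the call execute(str); line and rerun the step.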
Thanks very much Reeza. This worked!
I get the following warning.
WARNING: File WORK.MASTER.DATA does not exist.
WARNING: Table WORK.MASTER has not been dropped
Which is due to the
proc sql;
drop table master;
quit;
Is that PROC SQL necessary? I tried replacing master with master_&group but it didn't work.
If I need to create some new variables in each of the master_&group datasets then where should I put the required statements?
Yes - I understand that the PROC SQL removes old tables, but it doesn't seem to work after adding the &group component to your code, because the "Master" dataset in the old command has now become "Master_&group". The code may need to be modified.
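One way to clean up all of the per-group tables at once, rather than naming each one, is a name-prefix delete in PROC DATASETS. A sketch, assuming all the tables live in WORK and share the master_ prefix:

```sas
* Delete every WORK dataset whose name starts with master_ ;
* The trailing colon is a name-prefix wildcard;
proc datasets lib=work nolist;
  delete master_: ;
quit;
```

If no such datasets exist yet you will get a harmless warning rather than an error, so this can run safely before the first append.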
I need to create some new variables from the "Source" variable and replace some components of the "var" variable in the "Master_&group" datasets.
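The natural place for derived variables is the DATA step inside the macro, before the PROC APPEND, so every group gets them. A hedged sketch of that step; the specific assignments (filename_only, and the edit to var) are placeholders for your own logic, not part of the original code:

```sas
data file2save;
  length source $200. p1 $256. Value $256.;
  set myjson.alldata;
  source=&myfile.;
  * Example derived variables - adjust to your needs;
  filename_only = scan(source, -1, '\');        * last path component of source;
  var = tranwrd(var, 'old_text', 'new_text');   * assumes a character variable named var;
run;
```

Anything created here is carried into master_&group by the PROC APPEND that follows.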
If we cannot import the row numbers from each JSON file, then it won't be possible to reshape the data into wide format.
@bayzid wrote:
If we cannot import the row numbers from each JSON file, then it won't be possible to reshape the data into wide format.
What row numbers? Why can't you import them?
Actually, the first picture shows what is probably the problem.
The JSON file is using \ as an ESCAPE character to "escape" the embedded double quote character.
"Diagnosis":"\"Severe,.....\""
SAS expects you to instead use the more natural method of doubling the embedded quote characters.
"Diagnosis":"""Severe,....."""
You might be able to get around that in the READING of the line by modifying the line before parsing it.
input @;
_infile_=tranwrd(_infile_,'\"','""');
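In context, that fix sits inside a DATA step that loads each raw line, rewrites it, and writes a cleaned copy for later parsing. A minimal sketch, assuming rawjson and cleanjson are filerefs you have already assigned (both names are hypothetical):

```sas
* Sketch: write a cleaned copy of the file, converting \" to "" ;
data _null_;
  infile rawjson;
  file cleanjson;
  input @;                                 * hold the line without parsing it;
  _infile_=tranwrd(_infile_,'\"','""');   * escaped quotes -> SAS-style doubling;
  put _infile_;
run;
```

The input @; with no variables is what makes the raw line available in _INFILE_ for modification before anything is parsed from it.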
Hi Tom,
Thanks for the code. It worked for everything except the following one.
"Diagnosis":"TIA's. L.V.F (left ventricular failure) Rapid A.F. Hypertension (high BP), Asthma, SOB requiring 02, Urinary tract infection, fractured hip (right) approx 12 yrs ago, Pain in lower spine, gastro oesophageal reflux disorder, depression, anxiety. \\",
It is pushing the bold words under the name column.
That is because the extra \ is escaping the \ before the ".
So the trick I suggested for modifying the _INFILE_ variable needs to be made smarter.
You could probably do it with a regular expression, but that is generally too confusing for me. I would probably instead add some extra logic to "hide" the escaped \ characters first, then convert the escaped quotes, and then restore the escaped \ as just a regular \ character.
_infile_=translate(tranwrd(tranwrd(_infile_,'\\','00'x),'\"','""'),'\','00'x);
Example:
options parmcards=json ;
filename json temp;
parmcards4;
"Diagnosis":"TIA's. L.V.F (left ventricular failure) Rapid A.F. Hypertension (high BP), Asthma, SOB requiring 02, Urinary tract infection, fractured hip (right) approx 12 yrs ago, Pain in lower spine, gastro oesophageal reflux disorder, depression, anxiety. \\",
;;;;
29   data test;
30     infile json;
31     input @;
32     put 'BEFORE: ' _infile_;
33     _infile_=translate(tranwrd(tranwrd(_infile_,'\\','00'x),'\"','""'),'\','00'x);
34     put 'AFTER : ' _infile_;
35   run;

NOTE: The infile JSON is:
      (system-specific pathname),
      (system-specific file attributes)

BEFORE: "Diagnosis":"TIA's. L.V.F (left ventricular failure) Rapid A.F. Hypertension (high BP), Asthma, SOB requiring 02, Urinary tract infection, fractured hip (right) approx 12 yrs ago, Pain in lower spine, gastro oesophageal reflux disorder, depression, anxiety. \\",
AFTER : "Diagnosis":"TIA's. L.V.F (left ventricular failure) Rapid A.F. Hypertension (high BP), Asthma, SOB requiring 02, Urinary tract infection, fractured hip (right) approx 12 yrs ago, Pain in lower spine, gastro oesophageal reflux disorder, depression, anxiety. \",

NOTE: 1 record was read from the infile (system-specific pathname).
      The minimum record length was 262.
      The maximum record length was 262.
NOTE: The data set WORK.TEST has 1 observations and 0 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds
It runs painfully slowly. It has been over an hour (although it normally takes a couple of minutes) and it is stuck somewhere.
That method is not going to work for just any JSON file. It would only work for very simple ones like your drug information files where everything is just a simple NAME/VALUE pair.
But it shouldn't cause any errors in the step that reads the file. I suspect that you might be having permission issues accessing one of the files. That is definitely more likely if you are trying to read thousands of files at once.
All of my JSON files have the same structure. The Resident data looks like this:
One of the variables is very long. Is that causing the issue? Is it possible to leave the length open so that it can be as long as needed?
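If the JSON libname engine is truncating a long value, one option is to have the engine generate a map file, edit the length it assigned, and then re-read with the edited map. A sketch, assuming mydata is the fileref to one of the JSON files (mymap is a hypothetical fileref for the map):

```sas
filename mymap temp;
* Ask the engine to write a JSON map describing each variable, including lengths;
libname myjson JSON fileref=mydata map=mymap automap=create;
* Edit the LENGTH entries in the map (e.g. raise a 256 to 2000),
  then re-assign the libname pointing at the edited map (no automap= this time);
libname myjson JSON fileref=mydata map=mymap;
```

There is no fully open-ended length: SAS character variables top out at 32,767 bytes, so anything longer than that would be truncated regardless of the map.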