This topic is solved.
bayzid
Obsidian | Level 7

I need to read the files in 23 different batches, as they are different. For example, among the 84,000 JSON files, 1,500 are of TypeA (they are named TypeA_*) and I need to append them into a single file named TypeA_All. I have 200 JSON files of TypeB (very different from the other types) and need to append them into TypeB_All.

Reeza
Super User
Are the files differentiated by name, or by being stored in different folders? If you can tell which files belong to which group by name, you can add that information/logic into the FILELIST data set, and then it can process them in groups.
bayzid
Obsidian | Level 7

All the files are in a single folder and can be differentiated by name. Each file name starts with its group name, followed by an index, separated by an underscore.

Reeza
Super User

You can figure out how to add that group to the filelist table, untested example below. 

 

filename dirtree url 
    'https://raw.githubusercontent.com/sasutils/macros/master/dirtree.sas';
%include dirtree /source2;
%dirtree(C:\Users\bayzid\Macquarie University\MRFF Aged care Data - Documents\json_test    /* Pipe delimited directory list (default=.) */
        , out=filelist  /* Output dataset name */
        , maxdepth=1 /* Maximum tree depth */);

%macro read_json(myfile=, group = );
    filename mydata &myfile.;

    libname myjson JSON fileref=mydata nrm;

    proc datasets lib=myjson;
    quit;

    *how to parse the file has not been specified - looks like you need to specify the lengths;

    data file2save;
        length source $200 p1 $256 Value $256;
        set myjson.alldata;
        source=&myfile.;
    run;

    proc append base=master_&group data=file2save force;
    run;

    proc sql;
        drop table file2save;
    quit;

    filename mydata clear;
    libname myjson clear;
%mend read_json;

proc sql;
    drop table master;
quit;

data run_list;
    set filelist;
    where type='F';

   *add other logic to add group variable, this is just an example;
    group=scan(filename, 1, "_");
   
*test this is created correctly;
    str=catt('%read_json(myfile="C:\Users\bayzid\Macquarie University\MRFF Aged care Data - Documents\json_test\', filename, '", group=', group,  ');');
  *once tests are complete, uncomment the CALL EXECUTE line below to run it;  
 *call execute(str);
run;
bayzid
Obsidian | Level 7

Thanks very much Reeza. This worked!
I get the following warnings.

WARNING: File WORK.MASTER.DATA does not exist.
WARNING: Table WORK.MASTER has not been dropped

Which is due to the

proc sql;
    drop table master;
quit;

Is that PROC SQL step necessary? I tried replacing master with master_&group, but it didn't work.

If I need to create some new variables in each of the master_&group datasets then where should I put the required statements?

Reeza
Super User
It removes the table so that each run starts clean and you're not appending to old tables. Otherwise you need to delete the tables being appended to before each run, so that data isn't carried forward from incorrect runs.

It depends on what the variables are, where they come from, and how you want them saved.
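If it helps, one untested option for the cleanup is to delete every master_ table in a single step before re-running; PROC DATASETS accepts a colon (name-prefix) wildcard in its DELETE statement:

```sas
/* Untested sketch: drop all WORK.MASTER_* tables before a fresh run, */
/* so PROC APPEND does not pick up rows from a previous run.          */
proc datasets lib=work nolist nowarn;
    delete master_: ;   /* colon matches master_typea, master_typeb, ... */
quit;
```

The NOWARN option suppresses the "does not exist" warnings on the very first run, when there is nothing to delete yet.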
bayzid
Obsidian | Level 7

Yes, I understand that the PROC SQL removes old tables, but it doesn't seem to work after adding the &group component to your code, because the "master" dataset in the old command has now become "master_&group". The code may need to be modified.
I need to create some new variables from the "Source" variable and replace some components of the "var" variable in the "Master_&group" datasets.

bayzid
Obsidian | Level 7

If we cannot import the row numbers from each JSON file, then it wouldn't be possible to reshape the data into wide format.

Reeza
Super User

@bayzid wrote:

If we cannot import the row numbers from each JSON file, then it wouldn't be possible to reshape the data into wide format.


What row numbers? Why can't you import them? 

Tom
Super User

Actually, the first picture shows what is probably the problem.

 

The JSON file is using \ as an ESCAPE character to "escape" the embedded double quote character. 

"Diagnosis":"\"Severe,.....\""

SAS expects you to instead use the more natural method of doubling the embedded quote characters.

"Diagnosis":"""Severe,....."""

You might be able to get around that in the READING of the line by modifying the line before parsing it.

input @;
_infile_=tranwrd(_infile_,'\"','""');
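In context, those two lines would sit inside a DATA step that reads each raw line, fixes it, and writes a corrected copy for the JSON engine to read afterwards. A minimal untested sketch, where the filerefs rawjson and fixjson are assumptions for the original and corrected files:

```sas
/* Untested sketch: copy the JSON file line by line, converting    */
/* backslash-escaped quotes to the doubled quotes SAS expects.     */
/* rawjson and fixjson are assumed filerefs for input and output.  */
data _null_;
    infile rawjson;
    file fixjson;
    input @;                                  /* hold the raw line      */
    _infile_=tranwrd(_infile_,'\"','""');     /* \"  ->  ""             */
    put _infile_;                             /* write the fixed line   */
run;
```

The JSON LIBNAME engine would then be pointed at fixjson instead of the original file.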
BayzidurRahman
Obsidian | Level 7

Hi Tom,
Thanks for the code. It worked for everything else but for the following one.
"Diagnosis":"TIA's. L.V.F (left ventricular failure) Rapid A.F. Hypertension (high BP), Asthma, SOB requiring 02, Urinary tract infection, fractured hip (right) approx 12 yrs ago, Pain in lower spine, gastro oesophageal reflux disorder, depression, anxiety. \\",

It is pushing the bolded trailing text (the \\ before the closing quote) under the name column.

Tom
Super User

That is because the extra \ is escaping the \ before the ".

 

So the trick I suggested for modifying the _INFILE_ variable needs to be made smarter.

 

You could probably do it with a regular expression, but that is generally too confusing for me.  I would probably instead add some extra logic to "hide" the escaped \ characters first, then convert the escaped quotes, and then restore the escaped \ as just a regular \ character.

_infile_=translate(tranwrd(tranwrd(_infile_,'\\','00'x),'\"','""'),'\','00'x);

 Example:

options parmcards=json ;
filename json temp;
parmcards4;
"Diagnosis":"TIA's. L.V.F (left ventricular failure) Rapid A.F. Hypertension (high BP), Asthma, SOB requiring 02, Urinary tract infection, fractured hip (right) approx 12 yrs ago, Pain in lower spine, gastro oesophageal reflux disorder, depression, anxiety. \\", 
;;;;
29   data test;
30     infile json;
31     input @;
32     put 'BEFORE: ' _infile_;
33     _infile_=translate(tranwrd(tranwrd(_infile_,'\\','00'x),'\"','""'),'\','00'x);
34     put 'AFTER : ' _infile_;
35   run;

NOTE: The infile JSON is:
      (system-specific pathname),
      (system-specific file attributes)

BEFORE:
"Diagnosis":"TIA's. L.V.F (left ventricular failure) Rapid A.F. Hypertension (high BP), Asthma, SOB requiring 02, Urinary tract infe
ction, fractured hip (right) approx 12 yrs ago, Pain in lower spine, gastro oesophageal reflux disorder, depression, anxiety. \\",
AFTER :
"Diagnosis":"TIA's. L.V.F (left ventricular failure) Rapid A.F. Hypertension (high BP), Asthma, SOB requiring 02, Urinary tract infe
ction, fractured hip (right) approx 12 yrs ago, Pain in lower spine, gastro oesophageal reflux disorder, depression, anxiety. \",
NOTE: 1 record was read from the infile (system-specific pathname).
      The minimum record length was 262.
      The maximum record length was 262.
NOTE: The data set WORK.TEST has 1 observations and 0 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds
bayzid
Obsidian | Level 7

It runs painfully slowly. It has been over an hour (although it normally takes a couple of minutes) and it is stuck somewhere.

(screenshot attached: bayzid_0-1682376662153.png)

 

Tom
Super User

That method is not going to work for just any JSON file.  It would only work for very simple ones like your drug information files where everything is just a simple NAME/VALUE pair.

 

But it shouldn't cause any errors in the step that reads the file. I suspect that you might be having permissions issues accessing one of the files. That is definitely more likely if you are trying to read thousands of files at once.

bayzid
Obsidian | Level 7

All of my JSON files have the same structure. The Resident data looks like

(screenshot attached: bayzid_1-1682026835888.png)

One of the variables is very long. Is that causing the issue? Is it possible to leave the length open so that it can be as long as needed?

 


Discussion stats
  • 56 replies
  • 2106 views
  • 7 likes
  • 6 in conversation