BookmarkSubscribeRSS Feed
A_SAS_Man
Pyrite | Level 9

I am trying to use the json libname engine to parse some large ish files (example of files located here: https://transparency-in-coverage.uhc.com/undefined). The specific file I'm looking at is named this: 2022-08-01_UNITED-HEALTHCARE-SERVICES_Third-Party-Administrator_PS1-50_C2_in-network-rates.json.gz

 

I try to parse this file as below.

 

filename gen3 ZIP '/data/2022-08-01_UNITED-HEALTHCARE-SERVICES_Third-Party-Administrator_PS1-50_C2_in-network-rates.json.gz' GZIP;

libname gen3 json NOALLDATA ;

These are large files so I don't expect anyone to download them and try this out necessarily but wanted to provide context. After this I try to save off the different files parsed from the json to disk using something like the following code.

 

data ps1.IN_NETWORK;
set gen3.IN_NETWORK;
run;

data ps1.NEGOTIATED_PRICES_SERVICE_CODE;
set gen3.NEGOTIATED_PRICES_SERVICE_CODE;
run;

data ps1.PROVIDER_GROUPS_NPI;
set gen3.PROVIDER_GROUPS_NPI;
run;

The data steps run fine for all of the files parsed from the json, except for one (Provider_Groups_NPI). This file is not the largest file of any of them, although it is about 10 GB. The data step completes in about 10-15 minutes for one of the non problem files that is about 25 GB. But for the provider groups npi it gets 99% of the file written to the desired location and then freezes for about 10-12 hours, then it saves the file off. It does this repeatedly, so it's not some sort of one off processing error or slowdown.

 

Does anyone have thoughts as to what could be going on here or ways I could attempt to troubleshoot?

 

4 REPLIES 4
Tom
Super User Tom
Super User

I suspect it is just too large.  I think that the JSON object gets build into memory and it might cause issues when you hit some upper limit of size for your machine.

 

Also if the JSON file results in a lot of long character variables make sure you are using COMPRESS=YES option.  That could save a LOT of disk space when it is writing out the actual SAS dataset.  For example if you have a variable that could have 5,000 characters in it, but it is usually much shorter or empty, then the JSON text will be much smaller than the space needed in the SAS dataset.  By telling SAS to compress the records before writing them you can make the stored SAS dataset much smaller.  It is can be as small as less than 10% of the size of the uncompressed datasets.

A_SAS_Man
Pyrite | Level 9
Thanks, that was kind of my worry as well. Is there any way to do more of a streaming read/save of a json file in SAS rather than attempting the whole thing at once?
Essen560500
Obsidian | Level 7

I am also looking at Transparency in Coverage files and having some difficulty in parsing the files. Example is noted below. Do you have any leanings that I can apply how to parse the files and the delimiters ?  I am using the code below but it opens up as one column for few thousand rows. Would appreciate if have any advise ? 

 

%let jsonfile=/mnt/sasusers/&user./data/TiC/UHC_OHIO_PPO.json;

/* Read the JSON file using the JSON engine */
filename myfile "&jsonfile";


data mydata;
infile myfile lrecl=32767;
input;
json = _infile_;
run;

 

 

https://uhc-tic-mrf.azureedge.net/public-mrf/2023-12-01/2023-12-01_UnitedHealthcare-Freedom-Insuranc...

 

Tom
Super User Tom
Super User

If you have a new question start a new topic.

But you cannot read the full file using on 32k lines.  There is at least one line that is longer then that.

51   data _null_;
52     infile xx lrecl=10000000;
53     input;
54   run;

NOTE: The infile XX is:
      (system-specific pathname),
      (system-specific file attributes)

NOTE: 57368 records were read from the infile (system-specific pathname).
      The minimum record length was 0.
      The maximum record length was 44181.
NOTE: DATA statement used (Total process time):
      real time           1.48 seconds
      cpu time            1.48 seconds

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 625 views
  • 0 likes
  • 3 in conversation