BookmarkSubscribeRSS Feed
A_SAS_Man
Pyrite | Level 9

I am trying to use the json libname engine to parse some large ish files (example of files located here: https://transparency-in-coverage.uhc.com/undefined). The specific file I'm looking at is named this: 2022-08-01_UNITED-HEALTHCARE-SERVICES_Third-Party-Administrator_PS1-50_C2_in-network-rates.json.gz

 

I try to parse this file as below.

 

filename gen3 ZIP '/data/2022-08-01_UNITED-HEALTHCARE-SERVICES_Third-Party-Administrator_PS1-50_C2_in-network-rates.json.gz' GZIP;

libname gen3 json NOALLDATA ;

These are large files so I don't expect anyone to download them and try this out necessarily but wanted to provide context. After this I try to save off the different files parsed from the json to disk using something like the following code.

 

data ps1.IN_NETWORK;
set gen3.IN_NETWORK;
run;

data ps1.NEGOTIATED_PRICES_SERVICE_CODE;
set gen3.NEGOTIATED_PRICES_SERVICE_CODE;
run;

data ps1.PROVIDER_GROUPS_NPI;
set gen3.PROVIDER_GROUPS_NPI;
run;

The data steps run fine for all of the files parsed from the json, except for one (Provider_Groups_NPI). This file is not the largest file of any of them, although it is about 10 GB. The data step completes in about 10-15 minutes for one of the non problem files that is about 25 GB. But for the provider groups npi it gets 99% of the file written to the desired location and then freezes for about 10-12 hours, then it saves the file off. It does this repeatedly, so it's not some sort of one off processing error or slowdown.

 

Does anyone have thoughts as to what could be going on here or ways I could attempt to troubleshoot?

 

4 REPLIES 4
Tom
Super User Tom
Super User

I suspect it is just too large.  I think that the JSON object gets build into memory and it might cause issues when you hit some upper limit of size for your machine.

 

Also if the JSON file results in a lot of long character variables make sure you are using COMPRESS=YES option.  That could save a LOT of disk space when it is writing out the actual SAS dataset.  For example if you have a variable that could have 5,000 characters in it, but it is usually much shorter or empty, then the JSON text will be much smaller than the space needed in the SAS dataset.  By telling SAS to compress the records before writing them you can make the stored SAS dataset much smaller.  It is can be as small as less than 10% of the size of the uncompressed datasets.

A_SAS_Man
Pyrite | Level 9
Thanks, that was kind of my worry as well. Is there any way to do more of a streaming read/save of a json file in SAS rather than attempting the whole thing at once?
Essen560500
Obsidian | Level 7

I am also looking at Transparency in Coverage files and having some difficulty in parsing the files. Example is noted below. Do you have any leanings that I can apply how to parse the files and the delimiters ?  I am using the code below but it opens up as one column for few thousand rows. Would appreciate if have any advise ? 

 

%let jsonfile=/mnt/sasusers/&user./data/TiC/UHC_OHIO_PPO.json;

/* Read the JSON file using the JSON engine */
filename myfile "&jsonfile";


data mydata;
infile myfile lrecl=32767;
input;
json = _infile_;
run;

 

 

https://uhc-tic-mrf.azureedge.net/public-mrf/2023-12-01/2023-12-01_UnitedHealthcare-Freedom-Insuranc...

 

Tom
Super User Tom
Super User

If you have a new question start a new topic.

But you cannot read the full file using on 32k lines.  There is at least one line that is longer then that.

51   data _null_;
52     infile xx lrecl=10000000;
53     input;
54   run;

NOTE: The infile XX is:
      (system-specific pathname),
      (system-specific file attributes)

NOTE: 57368 records were read from the infile (system-specific pathname).
      The minimum record length was 0.
      The maximum record length was 44181.
NOTE: DATA statement used (Total process time):
      real time           1.48 seconds
      cpu time            1.48 seconds

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 549 views
  • 0 likes
  • 3 in conversation