BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Essen560500
Obsidian | Level 7

I am currently working with Transparency in Coverage Payer datasets and facing issues while parsing the file into variables within the provided JSON format. The output I am getting is a single large file with only one column. I am seeking guidance on how to properly align the output with the correct columns.

 

File is noted below - 

https://uhc-tic-mrf.azureedge.net/public-mrf/2023-12-01/2023-12-01_UnitedHealthcare-of-Ohio--Inc-_In...

 

Code using is noted below - 


libname data "/mnt/sasusers/&user./data/TiC";

%let jsonfile=/mnt/sasusers/&user./data/TiC/UHC_OHIO_PPO.json;

/* Read the JSON file using the JSONLIB engine */

filename myfile "&jsonfile";

data mydata;
infile myfile lrecl=32767;
input;
json = _infile_;
run;

/* Parse the JSON data using the JSONLIB engine */
data parsed_data;
length key $2000 value $2000;
retain key value;
infile myfile jsonlib;
input key $ value $;
run;

/* Create a new dataset with structured output */
data formatted_data;
set parsed_data;
length element $2000;
if substr(key,1,1) ne ' ' then element = key;
else if substr(key,1,1) = ' ' and not missing(value) then output;
drop key value;
run;

1 ACCEPTED SOLUTION

Accepted Solutions
Essen560500
Obsidian | Level 7

On the SAS server - so accessing it using 
filename myfile "/mnt/sasusers/&user./data/TiC/UHC_OHIO_PPO.json";

View solution in original post

29 REPLIES 29
SASJedi
SAS Super FREQ

You should use the JSON LIBNAME engine to access the data:

/* Making the fileref the same as the intended JSON libref simplifies the code */
filename jslib "/mnt/sasusers/&user./data/TiC/UHC_OHIO_PPO.json";
libname jslib json;

/* Now you can access the data in the JSON file as SAS data sets */
proc contents data=jslib._all_;
run;

Check out my Jedi SAS Tricks for SAS Users
Essen560500
Obsidian | Level 7
I tried running this code and it ran for over 2 hours till I had to terminate. Any other recommendation would appreciate it ?
Tom
Super User Tom
Super User

Use some external JSON parsing tool to convert that JSON text into something that you can use.

https://www.digitalocean.com/community/tutorials/how-to-transform-json-data-with-jq

 

Do you have documentation on what information is supposed to be in that file?  It appears to have a lot of very long lists.  

What information do you want out of it?

Essen560500
Obsidian | Level 7

These are files that are new to the healthcare industry and I am looking to see if any info contained within it can get us any competitive info pertaining to drugs/NDC and pricing  especially. Any variables that have $$ associated to it and any indicators would be needed.  I will also look at the tools you are referring to in your reply. Thanks ! 

Tom
Super User Tom
Super User

What an ugly format.

 

Anyway the issue with your example file is that it is over 1 Gigabyte of text.  And since JSON engine stores everything into memory reading the whole file is taking too long.  I was able to easily read the example file from that site.

 

Looking at your particular file they did try to format the JSON file consistently.  So you might be able to break it into pieces and read it.   I can this little program to find where the actual data records start and end.

24  data _null_;
125    infile json lrecl=50000 length=ll truncover;
126    input @;
127    line+1;
128    len=ll;
129    input @1 ch1 :$1. @ll ch2 $1. ;
130    if (ch1 ne '09'x) or (ch2 ne ',') then put line= ch1= ch2= len= ;
131  run;

NOTE: The infile JSON is:

      Filename=C:\downloads\2023-12-01_UnitedHealthcare-Freedom-Insurance-Company_Insurer_PPO---NDC_PPO-NDC_in-netw
      ork-rates.json,
      RECFM=V,LRECL=50000,
      File Size (bytes)=1216563805,
      Last Modified=17Nov2023:08:32:41,
      Create Time=06Dec2023:11:49:58

line=1 ch1=  ch2=  len=0
line=2 ch1={ ch2={ len=1
line=3 ch1=" ch2=, len=74
line=4 ch1=" ch2=, len=39
line=5 ch1=" ch2=, len=36
line=6 ch1=" ch2=, len=23
line=7 ch1=" ch2=, len=44181
line=8 ch1=" ch2=[ len=15
line=57364 ch1=	 ch2=} len=22002
line=57365 ch1=  ch2=  len=0
line=57366 ch1=] ch2=] len=5
line=57367 ch1=} ch2=} len=1
line=57368 ch1=  ch2=  len=8
NOTE: 57368 records were read from the infile JSON.
      The minimum record length was 0.
      The maximum record length was 44181.
NOTE: DATA statement used (Total process time):
      real time           1.41 seconds
      cpu time            1.21 seconds

So the first 8 lines are the basic data for that entity. Then all of the detailed records start on line 9 and end on line 57,364.

So if you make a copy of the file that includes lines 1 to 8 and lines 57364 to the end then the JSON engine should be able to read it.  

I can read about 100 lines in 2 seconds.

%let st=%sysfunc(datetime());
filename subset temp;
data _null_;
  infile json lrecl=50000;
  file subset lrecl=50000;
  input;
  if _n_ in (1:8,9:108,57364:57368) then put _infile_;
run;
libname subset json  ;
proc copy inlib=subset out=work;
run;
%put Elapsed time = %sysfunc(putn(%sysfunc(datetime())-&st,time12.4));

But to read about 1000 records takes over 42 seconds.

So reading 10 times more records takes 20 times as much time.  That is why trying to read the whole file at once takes so long (or even does not finish).

 

So you might need to break it into a LOT of smaller files to be able to read it.

SASJedi
SAS Super FREQ

I unzipped your file to c:/temp on my laptop. I then ran this code in PC SAS 9.4M8:

filename js "c:/temp/2023-12-01_UnitedHealthcare-of-Ohio--Inc-_Insurer_PPO---NDC_PPO-NDC_in-network-rates.json";
libname js json;
libname perm "c:/temp/mydata";
proc copy in=js out=perm;
run;

It copied all of the data out to SAS data sets in under 4 minutes:

7    proc copy in=js out=perm;
NOTE: Writing HTML Body file: sashtml1.htm
8    run;

NOTE: Copying JS.ALLDATA to PERM.ALLDATA (memtype=DATA).
NOTE: BUFSIZE is not cloned when copying across different engines. System Option for BUFSIZE was used.
NOTE: There were 58669647 observations read from the data set JS.ALLDATA.
NOTE: The data set PERM.ALLDATA has 58669647 observations and 8 variables.
NOTE: Copying JS.IN_NETWORK to PERM.IN_NETWORK (memtype=DATA).
NOTE: BUFSIZE is not cloned when copying across different engines. System Option for BUFSIZE was used.
NOTE: There were 57356 observations read from the data set JS.IN_NETWORK.
NOTE: The data set PERM.IN_NETWORK has 57356 observations and 8 variables.
NOTE: Copying JS.NEGOTIATED_PRICES_BILLING_CODE to PERM.NEGOTIATED_PRICES_BILLING_CODE (memtype=DATA).
NOTE: BUFSIZE is not cloned when copying across different engines. System Option for BUFSIZE was used.
NOTE: There were 4855430 observations read from the data set JS.NEGOTIATED_PRICES_BILLING_CODE.
NOTE: The data set PERM.NEGOTIATED_PRICES_BILLING_CODE has 4855430 observations and 2 variables.
NOTE: Copying JS.NEGOTIATED_PRICES_SERVICE_CODE to PERM.NEGOTIATED_PRICES_SERVICE_CODE (memtype=DATA).
NOTE: BUFSIZE is not cloned when copying across different engines. System Option for BUFSIZE was used.
NOTE: There were 4855430 observations read from the data set JS.NEGOTIATED_PRICES_SERVICE_CODE.
NOTE: The data set PERM.NEGOTIATED_PRICES_SERVICE_CODE has 4855430 observations and 3 variables.
NOTE: Copying JS.NEGOTIATED_RATES_NEGOTIATED_PR to PERM.NEGOTIATED_RATES_NEGOTIATED_PR (memtype=DATA).
NOTE: BUFSIZE is not cloned when copying across different engines. System Option for BUFSIZE was used.
NOTE: There were 4855430 observations read from the data set JS.NEGOTIATED_RATES_NEGOTIATED_PR.
NOTE: The data set PERM.NEGOTIATED_RATES_NEGOTIATED_PR has 4855430 observations and 7 variables.
NOTE: Copying JS.NEGOTIATED_RATES_PROVIDER_REFE to PERM.NEGOTIATED_RATES_PROVIDER_REFE (memtype=DATA).
NOTE: BUFSIZE is not cloned when copying across different engines. System Option for BUFSIZE was used.
NOTE: There were 4855272 observations read from the data set JS.NEGOTIATED_RATES_PROVIDER_REFE.
NOTE: The data set PERM.NEGOTIATED_RATES_PROVIDER_REFE has 4855272 observations and 3 variables.
NOTE: Copying JS.PROVIDER_GROUPS_NPI to PERM.PROVIDER_GROUPS_NPI (memtype=DATA).
NOTE: BUFSIZE is not cloned when copying across different engines. System Option for BUFSIZE was used.
NOTE: There were 363 observations read from the data set JS.PROVIDER_GROUPS_NPI.
NOTE: The data set PERM.PROVIDER_GROUPS_NPI has 363 observations and 483 variables.
NOTE: Copying JS.PROVIDER_GROUPS_TIN to PERM.PROVIDER_GROUPS_TIN (memtype=DATA).
NOTE: BUFSIZE is not cloned when copying across different engines. System Option for BUFSIZE was used.
NOTE: There were 363 observations read from the data set JS.PROVIDER_GROUPS_TIN.
NOTE: The data set PERM.PROVIDER_GROUPS_TIN has 363 observations and 4 variables.
NOTE: Copying JS.PROVIDER_REFERENCES to PERM.PROVIDER_REFERENCES (memtype=DATA).
NOTE: BUFSIZE is not cloned when copying across different engines. System Option for BUFSIZE was used.
NOTE: There were 150 observations read from the data set JS.PROVIDER_REFERENCES.
NOTE: The data set PERM.PROVIDER_REFERENCES has 150 observations and 3 variables.
NOTE: Copying JS.ROOT to PERM.ROOT (memtype=DATA).
NOTE: BUFSIZE is not cloned when copying across different engines. System Option for BUFSIZE was used.
NOTE: There were 1 observations read from the data set JS.ROOT.
NOTE: The data set PERM.ROOT has 1 observations and 5 variables.
NOTE: PROCEDURE COPY used (Total process time):
      real time           3:43.21
      cpu time            2:00.01

My laptop is beefy, but not cutting edge: 4 cores/8 threads with 32GB of RAM. What are the system specs where you are running your code?

 

Check out my Jedi SAS Tricks for SAS Users
Essen560500
Obsidian | Level 7
Device name ULCLS-JS4C6D3
Full device name ULCLS-JS4C6D3.abbvienet.com
Processor 11th Gen Intel(R) Core(TM) i5-1145G7 @ 2.60GHz 1.50 GHz
Installed RAM 16.0 GB (15.7 GB usable)
Device ID 544C6368-185F-46AA-826F-AF20259885E5
Product ID 00331-10000-00001-AA331
System type 64-bit operating system, x64-based processor
Pen and touch Touch support with 10 touch points
Essen560500
Obsidian | Level 7
I didn't get to this earlier in the week. I kicked off the code 30 mins ago. Still running 😞
SASJedi
SAS Super FREQ

Which version of the code are you using?

Check out my Jedi SAS Tricks for SAS Users
Essen560500
Obsidian | Level 7
The one that you provided above.

filename js "c:/temp/2023-12-01_UnitedHealthcare-of-Ohio--Inc-_Insurer_PPO---NDC_PPO-NDC_in-network-rates.json";
libname js json;
libname perm "c:/temp/mydata";
proc copy in=js out=perm;
run;

7 proc copy in=js out=perm;
NOTE: Writing HTML Body file: sashtml1.htm
8 run;
Essen560500
Obsidian | Level 7
Also I am doing it on SAS Studio. Not sure if that is adding to the misery
SASJedi
SAS Super FREQ

Where is your JSON file stored in relation to your SAS Compute server? Are you accessing it using a file system FILENAME or via FILENAME  URL?

Check out my Jedi SAS Tricks for SAS Users

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 29 replies
  • 1906 views
  • 13 likes
  • 4 in conversation