I am currently working with Transparency in Coverage payer datasets and am having trouble parsing the file, which is in the provided JSON format, into variables. The output I am getting is a single large file with only one column. I am looking for guidance on how to align the output with the correct columns.
File is noted below -
Code I am using is noted below -
libname data "/mnt/sasusers/&user./data/TiC";
%let jsonfile=/mnt/sasusers/&user./data/TiC/UHC_OHIO_PPO.json;
/* Read the JSON file using the JSONLIB engine */
filename myfile "&jsonfile";
data mydata;
  infile myfile lrecl=32767;
  input;
  json = _infile_;
run;
/* Parse the JSON data using the JSONLIB engine */
data parsed_data;
  length key $2000 value $2000;
  retain key value;
  infile myfile jsonlib;
  input key $ value $;
run;
/* Create a new dataset with structured output */
data formatted_data;
  set parsed_data;
  length element $2000;
  if substr(key,1,1) ne ' ' then element = key;
  else if substr(key,1,1) = ' ' and not missing(value) then output;
  drop key value;
run;
This is on the SAS server, so I am accessing it using:
filename myfile "/mnt/sasusers/&user./data/TiC/UHC_OHIO_PPO.json";
You should use the JSON LIBNAME engine to access the data:
/* Making the fileref the same as the intended JSON libref simplifies the code */
filename jslib "/mnt/sasusers/&user./data/TiC/UHC_OHIO_PPO.json";
libname jslib json;
/* Now you can access the data in the JSON file as SAS data sets */
proc contents data=jslib._all_;
run;
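Once PROC CONTENTS shows which member data sets the engine created, a minimal follow-up (just a sketch, not specific to your file) is to copy them out of the JSON library so they persist as ordinary SAS data sets:
/* Copy every member the JSON engine exposed out of the JSON library,
   here into WORK; point OUT= at a permanent library to keep them. */
proc copy in=jslib out=work;
run;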
Use some external JSON parsing tool to convert that JSON text into something that you can use.
https://www.digitalocean.com/community/tutorials/how-to-transform-json-data-with-jq
Do you have documentation on what information is supposed to be in that file? It appears to have a lot of very long lists.
What information do you want out of it?
These files are new to the healthcare industry, and I am looking to see whether any info contained in them can give us competitive intelligence, especially pertaining to drugs/NDCs and pricing. Any variables that have dollar amounts associated with them, and any related indicators, would be needed. I will also look at the tools you are referring to in your reply. Thanks!
Are you trying to read files produced in the format described here:
https://github.com/CMSgov/price-transparency-guide
https://github.com/CMSgov/price-transparency-guide/tree/master/schemas/in-network-rates
What an ugly format.
Anyway, the issue with your example file is that it is over 1 gigabyte of text, and since the JSON engine stores everything in memory, reading the whole file takes too long. I was able to read the example file from that site easily.
Looking at your particular file, they did format the JSON consistently, so you might be able to break it into pieces and read it. I ran this little program to find where the actual data records start and end.
data _null_;
  infile json lrecl=50000 length=ll truncover;
  input @;
  line+1;
  len=ll;
  /* read the first and last character of each line */
  input @1 ch1 :$1. @ll ch2 $1. ;
  /* print any line that is not a tab-indented detail record ending in a comma */
  if (ch1 ne '09'x) or (ch2 ne ',') then put line= ch1= ch2= len= ;
run;
NOTE: The infile JSON is:
Filename=C:\downloads\2023-12-01_UnitedHealthcare-Freedom-Insurance-Company_Insurer_PPO---NDC_PPO-NDC_in-netw
ork-rates.json,
RECFM=V,LRECL=50000,
File Size (bytes)=1216563805,
Last Modified=17Nov2023:08:32:41,
Create Time=06Dec2023:11:49:58
line=1 ch1= ch2= len=0
line=2 ch1={ ch2={ len=1
line=3 ch1=" ch2=, len=74
line=4 ch1=" ch2=, len=39
line=5 ch1=" ch2=, len=36
line=6 ch1=" ch2=, len=23
line=7 ch1=" ch2=, len=44181
line=8 ch1=" ch2=[ len=15
line=57364 ch1= ch2=} len=22002
line=57365 ch1= ch2= len=0
line=57366 ch1=] ch2=] len=5
line=57367 ch1=} ch2=} len=1
line=57368 ch1= ch2= len=8
NOTE: 57368 records were read from the infile JSON.
The minimum record length was 0.
The maximum record length was 44181.
NOTE: DATA statement used (Total process time):
real time 1.41 seconds
cpu time 1.21 seconds
So the first 8 lines are the basic data for that entity. Then all of the detailed records start on line 9 and end on line 57,364.
So if you make a copy of the file that includes lines 1 to 8, a slice of the detail lines, and lines 57364 to the end, then the JSON engine should be able to read it.
I can read about 100 lines in 2 seconds.
%let st=%sysfunc(datetime());
/* Write a small, self-contained JSON file: header lines, 100 detail lines,
   and the closing lines */
filename subset temp;
data _null_;
  infile json lrecl=50000;
  file subset lrecl=50000;
  input;
  if _n_ in (1:8,9:108,57364:57368) then put _infile_;
run;
/* Read the small file with the JSON engine and copy the members to WORK */
libname subset json;
proc copy inlib=subset out=work;
run;
%put Elapsed time = %sysfunc(putn(%sysfunc(datetime())-&st,time12.4));
But reading about 1,000 records takes over 42 seconds.
So reading 10 times as many records takes about 20 times as long. That is why trying to read the whole file at once takes so long (or never finishes).
So you might need to break it into a LOT of smaller files to be able to read it.
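If you do go that route, a sketch like the one below could automate it. It assumes the fileref JSON already points at the big file (as in the code above), that the header is lines 1 to 8, the detail records run through line 57363, and the closing lines are 57364 to 57368; the macro names (READ_CHUNK, READ_ALL), the chunk size, and the IN_NETWORK member name are placeholders to adjust for your own file.
%macro read_chunk(first=, last=);
  /* Build a small, self-contained JSON file: the header (lines 1-8), one
     slice of the detail lines, and the closing lines (57364-57368). The
     line numbers come from the diagnostic step above - adjust for your file. */
  filename subset temp;
  data _null_;
    infile json lrecl=50000;
    file subset lrecl=50000;
    input;
    if _n_ in (1:8, &first:&last, 57364:57368) then put _infile_;
  run;

  /* Read the small file and append the member you need. IN_NETWORK is just
     an example member name - use whatever PROC CONTENTS shows for your file. */
  libname subset json;
  proc append base=work.in_network_all data=subset.in_network force;
  run;
  libname subset clear;
  filename subset clear;
%mend read_chunk;

/* Walk through the detail lines (9 to 57363) in slices of 1000. */
%macro read_all(chunk=1000);
  %local start;
  %do start=9 %to 57363 %by &chunk;
    %read_chunk(first=&start, last=%sysfunc(min(%eval(&start+&chunk-1),57363)))
  %end;
%mend read_all;

%read_all(chunk=1000)
Note that line 57364 appears to be the last detail record (it is kept in the closing range so the array does not end in a trailing comma), so every chunk will pick it up; de-duplicate the appended data at the end, and expect PROC APPEND to warn if character lengths differ between chunks.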
I unzipped your file to c:/temp on my laptop. I then ran this code in PC SAS 9.4M8:
filename js "c:/temp/2023-12-01_UnitedHealthcare-of-Ohio--Inc-_Insurer_PPO---NDC_PPO-NDC_in-network-rates.json";
libname js json;
libname perm "c:/temp/mydata";
proc copy in=js out=perm;
run;
It copied all of the data out to SAS data sets in under 4 minutes:
7    proc copy in=js out=perm;
NOTE: Writing HTML Body file: sashtml1.htm
8    run;

NOTE: Copying JS.ALLDATA to PERM.ALLDATA (memtype=DATA).
NOTE: BUFSIZE is not cloned when copying across different engines. System Option for BUFSIZE was used.
NOTE: There were 58669647 observations read from the data set JS.ALLDATA.
NOTE: The data set PERM.ALLDATA has 58669647 observations and 8 variables.
NOTE: Copying JS.IN_NETWORK to PERM.IN_NETWORK (memtype=DATA).
NOTE: BUFSIZE is not cloned when copying across different engines. System Option for BUFSIZE was used.
NOTE: There were 57356 observations read from the data set JS.IN_NETWORK.
NOTE: The data set PERM.IN_NETWORK has 57356 observations and 8 variables.
NOTE: Copying JS.NEGOTIATED_PRICES_BILLING_CODE to PERM.NEGOTIATED_PRICES_BILLING_CODE (memtype=DATA).
NOTE: BUFSIZE is not cloned when copying across different engines. System Option for BUFSIZE was used.
NOTE: There were 4855430 observations read from the data set JS.NEGOTIATED_PRICES_BILLING_CODE.
NOTE: The data set PERM.NEGOTIATED_PRICES_BILLING_CODE has 4855430 observations and 2 variables.
NOTE: Copying JS.NEGOTIATED_PRICES_SERVICE_CODE to PERM.NEGOTIATED_PRICES_SERVICE_CODE (memtype=DATA).
NOTE: BUFSIZE is not cloned when copying across different engines. System Option for BUFSIZE was used.
NOTE: There were 4855430 observations read from the data set JS.NEGOTIATED_PRICES_SERVICE_CODE.
NOTE: The data set PERM.NEGOTIATED_PRICES_SERVICE_CODE has 4855430 observations and 3 variables.
NOTE: Copying JS.NEGOTIATED_RATES_NEGOTIATED_PR to PERM.NEGOTIATED_RATES_NEGOTIATED_PR (memtype=DATA).
NOTE: BUFSIZE is not cloned when copying across different engines. System Option for BUFSIZE was used.
NOTE: There were 4855430 observations read from the data set JS.NEGOTIATED_RATES_NEGOTIATED_PR.
NOTE: The data set PERM.NEGOTIATED_RATES_NEGOTIATED_PR has 4855430 observations and 7 variables.
NOTE: Copying JS.NEGOTIATED_RATES_PROVIDER_REFE to PERM.NEGOTIATED_RATES_PROVIDER_REFE (memtype=DATA).
NOTE: BUFSIZE is not cloned when copying across different engines. System Option for BUFSIZE was used.
NOTE: There were 4855272 observations read from the data set JS.NEGOTIATED_RATES_PROVIDER_REFE.
NOTE: The data set PERM.NEGOTIATED_RATES_PROVIDER_REFE has 4855272 observations and 3 variables.
NOTE: Copying JS.PROVIDER_GROUPS_NPI to PERM.PROVIDER_GROUPS_NPI (memtype=DATA).
NOTE: BUFSIZE is not cloned when copying across different engines. System Option for BUFSIZE was used.
NOTE: There were 363 observations read from the data set JS.PROVIDER_GROUPS_NPI.
NOTE: The data set PERM.PROVIDER_GROUPS_NPI has 363 observations and 483 variables.
NOTE: Copying JS.PROVIDER_GROUPS_TIN to PERM.PROVIDER_GROUPS_TIN (memtype=DATA).
NOTE: BUFSIZE is not cloned when copying across different engines. System Option for BUFSIZE was used.
NOTE: There were 363 observations read from the data set JS.PROVIDER_GROUPS_TIN.
NOTE: The data set PERM.PROVIDER_GROUPS_TIN has 363 observations and 4 variables.
NOTE: Copying JS.PROVIDER_REFERENCES to PERM.PROVIDER_REFERENCES (memtype=DATA).
NOTE: BUFSIZE is not cloned when copying across different engines. System Option for BUFSIZE was used.
NOTE: There were 150 observations read from the data set JS.PROVIDER_REFERENCES.
NOTE: The data set PERM.PROVIDER_REFERENCES has 150 observations and 3 variables.
NOTE: Copying JS.ROOT to PERM.ROOT (memtype=DATA).
NOTE: BUFSIZE is not cloned when copying across different engines. System Option for BUFSIZE was used.
NOTE: There were 1 observations read from the data set JS.ROOT.
NOTE: The data set PERM.ROOT has 1 observations and 5 variables.
NOTE: PROCEDURE COPY used (Total process time):
      real time           3:43.21
      cpu time            2:00.01
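Once the tables are in PERM, a quick way to see how they link back together is to list the key variables. This is just a sketch that assumes the JSON engine's usual ordinal_* key variables are present; the exact names in these tables depend on the map the engine generated, so check the output before writing any joins.
/* List the ordinal_* key variables the JSON engine added to each table;
   these are the columns you join on to line up, for example, billing
   codes with their negotiated prices. */
proc sql;
  select memname, name
    from dictionary.columns
    where libname = 'PERM'
      and upcase(name) like 'ORDINAL%'
    order by memname, name;
quit;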
My laptop is beefy, but not cutting edge: 4 cores/8 threads with 32GB of RAM. What are the system specs where you are running your code?
Which version of SAS are you using?
Where is your JSON file stored in relation to your SAS Compute server? Are you accessing it using a file system FILENAME or via FILENAME URL?