Re: Parsing JSON stored under a Data set Column

sarav1981 · Posted 02-20-2017 06:27 PM

Hi All,

I need to work with JSON data that is stored as a JSONB data type from the source POSTGRES table.

Below snapshot of the dataset details the JSON data under the 'data' column.

company_id	data
6066	        {'gw': {'date': '2015-12-31', 'currency': 'USD', 'value': 32021}, 'do': {'date': '2015-12-31', 'currency': 'USD', 'value': 71}}

Desired output is to transform the dataset to the below form.

company_id	metric	date	        currency	value
6066	        gw	2015-12-31	USD	        32021
6066	        do	2015-12-31	USD	        71

The contents under the JSON string are pretty long extending to 1000's of metrics, and the dataset is large that it makes converting back to JSON and parsing it is cumbersome.

a. With the JSON data that needs parsing in a dataset, is there an option of still using INFILE and/or LIBNAME JSON feasible?

b. Tried parsing the string with CALL SCAN position parameters, but, unsure how to deal with multiple metric data rollover to separate rows of data.

Any pointers on the approach to follow would be very helpful. Thank you!

mkeintz · Posted 02-20-2017 08:41 PM

In your example, how would the JSON parser know the second variable is named METRIC?

Also, are you saying that your parser needs to look at quoted content to see whether it is a date value?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sarav1981 · Posted 02-20-2017 08:49 PM

Thank you for checking mkeintz.

Naming the parsed value columns is not critical at the first pass. Named the METRIC column, so as to show it as a field to be parsed (1st position within the JSON).
Same goes for the DATE field stored as a string. Identifying that the 2nd position within JSON is a date stored as string, I can then apply the relevant format.

mkeintz · Posted 02-20-2017 08:53 PM

Apparently there is now a JSON engine for the libname statement. See: Reading JSON

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

ChrisNZ · Posted 02-20-2017 08:49 PM

Like this?


data HAVE;
  COMPANY_ID= '6066 ';                                    
  DATA= "{'gw': {'date': '2015-12-31', 'currency': 'USD',"; output;
  DATA= "'value': 32021}, 'do': {'date': '2015-12-31',   "; output;
  DATA= " 'currency': 'USD', 'value': 71}}               "; output;
  COMPANY_ID= '6067 ';                                    
  DATA= "{'gw': {'date': '2015-12-31', 'currency': 'NZD',"; output;
  DATA= "'value': 32022}, 'do': {'date': '2015-12-31',   "; output;
  DATA= " 'currency': 'NZD', 'value': 72}}               "; output;
run;

data WANT;
  retain COMPANY_ID METRIC DATE CURRENCY VALUE PRX;
  length METRIC DATE CURRENCY VALUE WORD1 WORD2 $16;
  keep COMPANY_ID METRIC DATE CURRENCY VALUE;
  if _N_=1 then PRX=prxparse("/'.*?':( '.*?')?/");
  STARTPOS=1;
  set HAVE;
  do until (POS=0);
    call prxnext(PRX, STARTPOS, -1, DATA, POS, LEN); 
    if POS then do;
      WORD1= scan(substr(DATA,POS),1,"{}': ") ;
      WORD2= scan(substr(DATA,POS),2,"{}': ") ;
      select(WORD1);
        when( 'date'     ) DATE    =WORD2;
        when( 'currency' ) CURRENCY=WORD2;
        when( 'value'    ) VALUE   =WORD2;
        otherwise          METRIC  =WORD1;
      end;
    end;
    if VALUE ne '' then do;
      output;
      call missing( METRIC, DATE, CURRENCY, VALUE );
    end;
  end;
run;
proc print data=WANT noobs;
run;

COMPANY_ID	METRIC	DATE	CURRENCY	VALUE
6066	gw	2015-12-31	USD	32021
6066	do	2015-12-31	USD	71
6067	gw	2015-12-31	NZD	32022
6067	do	2015-12-31	NZD	72

High-Performance SAS Coding - Third Edition

Ksharp · Posted 02-20-2017 10:24 PM


data x;
company_id=6066;
data="{'gw': {'date': '2015-12-31', 'currency': 'USD', 'value': 32021}, 'do': {'date': '2015-12-31', 'currency': 'USD', 'value': 71}}";
output;
run;
data temp;
 set x;
 do i=1 to countw(data,'{}[],:');
  temp=dequote(strip(scan(data,i,'{}[],:')));output;
 end;
drop data;
run;

data temp1;
 merge temp temp(firstobs=2 keep=temp company_id 
 rename=(temp=_temp company_id=_id));
 if (missing(_temp) and company_id=_id) or 
 temp in ('date' 'currency' 'value');
 drop _id;
run;
data temp2;
 set temp1;
 length metric $ 40;
 retain metric;
 if missing(_temp) then do;metric=temp;delete;end;
 drop i;
run;
proc transpose data=temp2 out=want;
by company_id metric notsorted;
var _temp;
id temp;
run;

ChrisNZ · Posted 02-20-2017 10:54 PM

@Ksharp If there is too much data for exporting to a flat file, I suspect creating 3 intermediate tables is not an option either.

Here is a way -similar to yours, where we create a suite of values in one column- to do it in one step (I don't see why the view should create a spill file, but I don't know. Is there a way to know?)

data HAVE;
  COMPANY_ID= '6066 ';                                    
  DATA= "{'gw': {'date': '2015-12-31', 'currency': 'USD',"; output;
  DATA= "'value': 32021}, 'do': {'date': '2015-12-31',   "; output;
  DATA= " 'currency': 'USD', 'value': 71}}               "; output;
  COMPANY_ID= '6067 ';                                    
  DATA= "{'gw': {'date': '2015-12-31', 'currency': 'NZD',"; output;
  DATA= "'value': 32022}, 'do': {'date': '2015-12-31',   "; output;
  DATA= " 'currency': 'NZD', 'value': 72}}               "; output;
run;

data _V/view=_V;
  set HAVE;
  length STR $16;
  keep COMPANY_ID STR;
  do I=1 to 1e9 until (STR=' ');
    STR=scan(DATA, I, "{}':, ") ;
    if STR ne ' ' then output;
  end;
run;

data WANT; 
  retain COMPANY_ID METRIC DATE CURRENCY ;
  keep   COMPANY_ID METRIC DATE CURRENCY VALUE;
  set _V;
  if ^SKIP_METRIC then METRIC  =STR;
  select(lag(STR));
    when( 'date'     ) DATE    =STR;
    when( 'currency' ) CURRENCY=STR;
    when( 'value'    ) VALUE   =STR;
    otherwise         ;
  end;                   
  if VALUE ne ' ' then do;
    output;
    SKIP_METRIC=0;
  end;
  else SKIP_METRIC+1;
run;

COMPANY_ID	METRIC	DATE	CURRENCY	VALUE
6066	gw	2015-12-31	USD	32021
6066	do	2015-12-31	USD	71
6067	gw	2015-12-31	NZD	32022
6067	do	2015-12-31	NZD	72

High-Performance SAS Coding - Third Edition

Ksharp · Posted 02-20-2017 11:29 PM

Chris,

I don't know either.

OP said the whole json string is in a variable named DATA, not like your data step did .

ChrisNZ · Posted 02-20-2017 11:48 PM

@Ksharp I understood differently: " unsure how to deal with multiple metric data rollover to separate rows of data."

The code works either way. 🙂

High-Performance SAS Coding - Third Edition

vinaybatra · Posted 07-11-2019 01:05 AM

Hi Sarav,

I have a similar requirement where I need to extract json data (hundreds of columns; big string) from a column in a table. Json is stored in a column in a table. Can you please share how you achieved it?

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away