DATA Step, Macro, Functions and more

Parsing JSON stored under a Data set Column

Reply
New Contributor
Posts: 2

Parsing JSON stored under a Data set Column

Hi All,

 

I need to work with JSON data that is stored as a JSONB data type from the source POSTGRES table.

 

Below snapshot of the dataset details the JSON data under the 'data' column. 

company_id	data
6066	        {'gw': {'date': '2015-12-31', 'currency': 'USD', 'value': 32021}, 'do': {'date': '2015-12-31', 'currency': 'USD', 'value': 71}}

Desired output is to transform the dataset to the below form.

company_id	metric	date	        currency	value
6066	        gw	2015-12-31	USD	        32021
6066	        do	2015-12-31	USD	        71

 

The contents under the JSON string are pretty long extending to 1000's of metrics, and the dataset is large that it makes converting back to JSON and parsing it is cumbersome.

 

a. With the JSON data that needs parsing in a dataset, is there an option of still using INFILE and/or LIBNAME JSON feasible?

b. Tried parsing the string with CALL SCAN position parameters, but, unsure how to deal with multiple metric data rollover to separate rows of data.

 

Any pointers on the approach to follow would be very helpful. Thank you!

 

 

Trusted Advisor
Posts: 1,018

Re: Parsing JSON stored under a Data set Column

Posted in reply to sarav1981

In your example, how would the JSON parser know the second variable is named METRIC?  

 

Also, are you saying that your parser needs to look at quoted content to see whether it is a date value?

New Contributor
Posts: 2

Re: Parsing JSON stored under a Data set Column

Thank you for checking mkeintz.

Naming the parsed value columns is not critical at the first pass. Named the METRIC column, so as to show it as a field to be parsed (1st position within the JSON).
Same goes for the DATE field stored as a string. Identifying that the 2nd position within JSON is a date stored as string, I can then apply the relevant format.
Trusted Advisor
Posts: 1,018

Re: Parsing JSON stored under a Data set Column

Posted in reply to sarav1981

Apparently there is now a JSON engine for the libname statement.  See: Reading JSON

PROC Star
Posts: 1,759

Re: Parsing JSON stored under a Data set Column

[ Edited ]
Posted in reply to sarav1981

Like this?


data HAVE;
  COMPANY_ID= '6066 ';                                    
  DATA= "{'gw': {'date': '2015-12-31', 'currency': 'USD',"; output;
  DATA= "'value': 32021}, 'do': {'date': '2015-12-31',   "; output;
  DATA= " 'currency': 'USD', 'value': 71}}               "; output;
  COMPANY_ID= '6067 ';                                    
  DATA= "{'gw': {'date': '2015-12-31', 'currency': 'NZD',"; output;
  DATA= "'value': 32022}, 'do': {'date': '2015-12-31',   "; output;
  DATA= " 'currency': 'NZD', 'value': 72}}               "; output;
run;

data WANT;
  retain COMPANY_ID METRIC DATE CURRENCY VALUE PRX;
  length METRIC DATE CURRENCY VALUE WORD1 WORD2 $16;
  keep COMPANY_ID METRIC DATE CURRENCY VALUE;
  if _N_=1 then PRX=prxparse("/'.*?':( '.*?')?/");
  STARTPOS=1;
  set HAVE;
  do until (POS=0);
    call prxnext(PRX, STARTPOS, -1, DATA, POS, LEN); 
    if POS then do;
      WORD1= scan(substr(DATA,POS),1,"{}': ") ;
      WORD2= scan(substr(DATA,POS),2,"{}': ") ;
      select(WORD1);
        when( 'date'     ) DATE    =WORD2;
        when( 'currency' ) CURRENCY=WORD2;
        when( 'value'    ) VALUE   =WORD2;
        otherwise          METRIC  =WORD1;
      end;
    end;
    if VALUE ne '' then do;
      output;
      call missing( METRIC, DATE, CURRENCY, VALUE );
    end;
  end;
run;
proc print data=WANT noobs;
run; 
COMPANY_ID METRIC DATE CURRENCY VALUE
6066 gw 2015-12-31 USD 32021
6066 do 2015-12-31 USD 71
6067 gw 2015-12-31 NZD 32022
6067 do 2015-12-31 NZD 72

 

Super User
Posts: 10,023

Re: Parsing JSON stored under a Data set Column

Posted in reply to sarav1981

data x;
company_id=6066;
data="{'gw': {'date': '2015-12-31', 'currency': 'USD', 'value': 32021}, 'do': {'date': '2015-12-31', 'currency': 'USD', 'value': 71}}";
output;
run;
data temp;
 set x;
 do i=1 to countw(data,'{}[],:');
  temp=dequote(strip(scan(data,i,'{}[],:')));output;
 end;
drop data;
run;

data temp1;
 merge temp temp(firstobs=2 keep=temp company_id 
 rename=(temp=_temp company_id=_id));
 if (missing(_temp) and company_id=_id) or 
 temp in ('date' 'currency' 'value');
 drop _id;
run;
data temp2;
 set temp1;
 length metric $ 40;
 retain metric;
 if missing(_temp) then do;metric=temp;delete;end;
 drop i;
run;
proc transpose data=temp2 out=want;
by company_id metric notsorted;
var _temp;
id temp;
run;


PROC Star
Posts: 1,759

Re: Parsing JSON stored under a Data set Column

[ Edited ]

@Ksharp If there is too much data for exporting to a flat file, I suspect creating 3 intermediate tables is not an option either.

Here is a way -similar to yours, where we create a suite of values in one column- to do it in one step (I don't see why the view should create a spill file, but I don't know. Is there a way to know?)

data HAVE;
  COMPANY_ID= '6066 ';                                    
  DATA= "{'gw': {'date': '2015-12-31', 'currency': 'USD',"; output;
  DATA= "'value': 32021}, 'do': {'date': '2015-12-31',   "; output;
  DATA= " 'currency': 'USD', 'value': 71}}               "; output;
  COMPANY_ID= '6067 ';                                    
  DATA= "{'gw': {'date': '2015-12-31', 'currency': 'NZD',"; output;
  DATA= "'value': 32022}, 'do': {'date': '2015-12-31',   "; output;
  DATA= " 'currency': 'NZD', 'value': 72}}               "; output;
run;

data _V/view=_V;
  set HAVE;
  length STR $16;
  keep COMPANY_ID STR;
  do I=1 to 1e9 until (STR=' ');
    STR=scan(DATA, I, "{}':, ") ;
    if STR ne ' ' then output;
  end;
run;

data WANT; 
  retain COMPANY_ID METRIC DATE CURRENCY ;
  keep   COMPANY_ID METRIC DATE CURRENCY VALUE;
  set _V;
  if ^SKIP_METRIC then METRIC  =STR;
  select(lag(STR));
    when( 'date'     ) DATE    =STR;
    when( 'currency' ) CURRENCY=STR;
    when( 'value'    ) VALUE   =STR;
    otherwise         ;
  end;                   
  if VALUE ne ' ' then do;
    output;
    SKIP_METRIC=0;
  end;
  else SKIP_METRIC+1;
run;

 

COMPANY_ID METRIC DATE CURRENCY VALUE
6066 gw 2015-12-31 USD 32021
6066 do 2015-12-31 USD 71
6067 gw 2015-12-31 NZD 32022
6067 do 2015-12-31 NZD 72

 

 

 

 

Super User
Posts: 10,023

Re: Parsing JSON stored under a Data set Column

Chris,

I don't know either.

OP said the whole json string is in a variable named DATA, not like your data step did .

PROC Star
Posts: 1,759

Re: Parsing JSON stored under a Data set Column

@Ksharp I understood differently: " unsure how to deal with multiple metric data rollover to separate rows of data."

The code works either way. Smiley Happy

Ask a Question
Discussion stats
  • 8 replies
  • 159 views
  • 0 likes
  • 4 in conversation