BookmarkSubscribeRSS Feed
sarav1981
Calcite | Level 5

Hi All,

 

I need to work with JSON data that is stored as a JSONB data type from the source POSTGRES table.

 

Below snapshot of the dataset details the JSON data under the 'data' column. 

company_id	data
6066	        {'gw': {'date': '2015-12-31', 'currency': 'USD', 'value': 32021}, 'do': {'date': '2015-12-31', 'currency': 'USD', 'value': 71}}

Desired output is to transform the dataset to the below form.

company_id	metric	date	        currency	value
6066	        gw	2015-12-31	USD	        32021
6066	        do	2015-12-31	USD	        71

 

The contents under the JSON string are pretty long extending to 1000's of metrics, and the dataset is large that it makes converting back to JSON and parsing it is cumbersome.

 

a. With the JSON data that needs parsing in a dataset, is there an option of still using INFILE and/or LIBNAME JSON feasible?

b. Tried parsing the string with CALL SCAN position parameters, but, unsure how to deal with multiple metric data rollover to separate rows of data.

 

Any pointers on the approach to follow would be very helpful. Thank you!

 

 

9 REPLIES 9
mkeintz
PROC Star

In your example, how would the JSON parser know the second variable is named METRIC?  

 

Also, are you saying that your parser needs to look at quoted content to see whether it is a date value?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
sarav1981
Calcite | Level 5
Thank you for checking mkeintz.

Naming the parsed value columns is not critical at the first pass. Named the METRIC column, so as to show it as a field to be parsed (1st position within the JSON).
Same goes for the DATE field stored as a string. Identifying that the 2nd position within JSON is a date stored as string, I can then apply the relevant format.
mkeintz
PROC Star

Apparently there is now a JSON engine for the libname statement.  See: Reading JSON

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ChrisNZ
Tourmaline | Level 20

Like this?


data HAVE;
  COMPANY_ID= '6066 ';                                    
  DATA= "{'gw': {'date': '2015-12-31', 'currency': 'USD',"; output;
  DATA= "'value': 32021}, 'do': {'date': '2015-12-31',   "; output;
  DATA= " 'currency': 'USD', 'value': 71}}               "; output;
  COMPANY_ID= '6067 ';                                    
  DATA= "{'gw': {'date': '2015-12-31', 'currency': 'NZD',"; output;
  DATA= "'value': 32022}, 'do': {'date': '2015-12-31',   "; output;
  DATA= " 'currency': 'NZD', 'value': 72}}               "; output;
run;

data WANT;
  retain COMPANY_ID METRIC DATE CURRENCY VALUE PRX;
  length METRIC DATE CURRENCY VALUE WORD1 WORD2 $16;
  keep COMPANY_ID METRIC DATE CURRENCY VALUE;
  if _N_=1 then PRX=prxparse("/'.*?':( '.*?')?/");
  STARTPOS=1;
  set HAVE;
  do until (POS=0);
    call prxnext(PRX, STARTPOS, -1, DATA, POS, LEN); 
    if POS then do;
      WORD1= scan(substr(DATA,POS),1,"{}': ") ;
      WORD2= scan(substr(DATA,POS),2,"{}': ") ;
      select(WORD1);
        when( 'date'     ) DATE    =WORD2;
        when( 'currency' ) CURRENCY=WORD2;
        when( 'value'    ) VALUE   =WORD2;
        otherwise          METRIC  =WORD1;
      end;
    end;
    if VALUE ne '' then do;
      output;
      call missing( METRIC, DATE, CURRENCY, VALUE );
    end;
  end;
run;
proc print data=WANT noobs;
run; 
COMPANY_ID METRIC DATE CURRENCY VALUE
6066 gw 2015-12-31 USD 32021
6066 do 2015-12-31 USD 71
6067 gw 2015-12-31 NZD 32022
6067 do 2015-12-31 NZD 72

 

Ksharp
Super User

data x;
company_id=6066;
data="{'gw': {'date': '2015-12-31', 'currency': 'USD', 'value': 32021}, 'do': {'date': '2015-12-31', 'currency': 'USD', 'value': 71}}";
output;
run;
data temp;
 set x;
 do i=1 to countw(data,'{}[],:');
  temp=dequote(strip(scan(data,i,'{}[],:')));output;
 end;
drop data;
run;

data temp1;
 merge temp temp(firstobs=2 keep=temp company_id 
 rename=(temp=_temp company_id=_id));
 if (missing(_temp) and company_id=_id) or 
 temp in ('date' 'currency' 'value');
 drop _id;
run;
data temp2;
 set temp1;
 length metric $ 40;
 retain metric;
 if missing(_temp) then do;metric=temp;delete;end;
 drop i;
run;
proc transpose data=temp2 out=want;
by company_id metric notsorted;
var _temp;
id temp;
run;


ChrisNZ
Tourmaline | Level 20

@Ksharp If there is too much data for exporting to a flat file, I suspect creating 3 intermediate tables is not an option either.

Here is a way -similar to yours, where we create a suite of values in one column- to do it in one step (I don't see why the view should create a spill file, but I don't know. Is there a way to know?)

data HAVE;
  COMPANY_ID= '6066 ';                                    
  DATA= "{'gw': {'date': '2015-12-31', 'currency': 'USD',"; output;
  DATA= "'value': 32021}, 'do': {'date': '2015-12-31',   "; output;
  DATA= " 'currency': 'USD', 'value': 71}}               "; output;
  COMPANY_ID= '6067 ';                                    
  DATA= "{'gw': {'date': '2015-12-31', 'currency': 'NZD',"; output;
  DATA= "'value': 32022}, 'do': {'date': '2015-12-31',   "; output;
  DATA= " 'currency': 'NZD', 'value': 72}}               "; output;
run;

data _V/view=_V;
  set HAVE;
  length STR $16;
  keep COMPANY_ID STR;
  do I=1 to 1e9 until (STR=' ');
    STR=scan(DATA, I, "{}':, ") ;
    if STR ne ' ' then output;
  end;
run;

data WANT; 
  retain COMPANY_ID METRIC DATE CURRENCY ;
  keep   COMPANY_ID METRIC DATE CURRENCY VALUE;
  set _V;
  if ^SKIP_METRIC then METRIC  =STR;
  select(lag(STR));
    when( 'date'     ) DATE    =STR;
    when( 'currency' ) CURRENCY=STR;
    when( 'value'    ) VALUE   =STR;
    otherwise         ;
  end;                   
  if VALUE ne ' ' then do;
    output;
    SKIP_METRIC=0;
  end;
  else SKIP_METRIC+1;
run;

 

COMPANY_ID METRIC DATE CURRENCY VALUE
6066 gw 2015-12-31 USD 32021
6066 do 2015-12-31 USD 71
6067 gw 2015-12-31 NZD 32022
6067 do 2015-12-31 NZD 72

 

 

 

 

Ksharp
Super User

Chris,

I don't know either.

OP said the whole json string is in a variable named DATA, not like your data step did .

ChrisNZ
Tourmaline | Level 20

@Ksharp I understood differently: " unsure how to deal with multiple metric data rollover to separate rows of data."

The code works either way. 🙂

vinaybatra
Calcite | Level 5

Hi Sarav,

 

I have a similar requirement where I need to extract json data (hundreds of columns; big string) from a column in a table. Json is stored in a column in a table. Can you please share how you achieved it?

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 4769 views
  • 0 likes
  • 5 in conversation