Hi All,
I need to work with JSON data that is stored as a JSONB data type from the source POSTGRES table.
Below snapshot of the dataset details the JSON data under the 'data' column.
company_id data 6066 {'gw': {'date': '2015-12-31', 'currency': 'USD', 'value': 32021}, 'do': {'date': '2015-12-31', 'currency': 'USD', 'value': 71}}
Desired output is to transform the dataset to the below form.
company_id metric date currency value 6066 gw 2015-12-31 USD 32021 6066 do 2015-12-31 USD 71
The contents under the JSON string are pretty long extending to 1000's of metrics, and the dataset is large that it makes converting back to JSON and parsing it is cumbersome.
a. With the JSON data that needs parsing in a dataset, is there an option of still using INFILE and/or LIBNAME JSON feasible?
b. Tried parsing the string with CALL SCAN position parameters, but, unsure how to deal with multiple metric data rollover to separate rows of data.
Any pointers on the approach to follow would be very helpful. Thank you!
In your example, how would the JSON parser know the second variable is named METRIC?
Also, are you saying that your parser needs to look at quoted content to see whether it is a date value?
Apparently there is now a JSON engine for the libname statement. See: Reading JSON
Like this?
data HAVE;
COMPANY_ID= '6066 ';
DATA= "{'gw': {'date': '2015-12-31', 'currency': 'USD',"; output;
DATA= "'value': 32021}, 'do': {'date': '2015-12-31', "; output;
DATA= " 'currency': 'USD', 'value': 71}} "; output;
COMPANY_ID= '6067 ';
DATA= "{'gw': {'date': '2015-12-31', 'currency': 'NZD',"; output;
DATA= "'value': 32022}, 'do': {'date': '2015-12-31', "; output;
DATA= " 'currency': 'NZD', 'value': 72}} "; output;
run;
data WANT;
retain COMPANY_ID METRIC DATE CURRENCY VALUE PRX;
length METRIC DATE CURRENCY VALUE WORD1 WORD2 $16;
keep COMPANY_ID METRIC DATE CURRENCY VALUE;
if _N_=1 then PRX=prxparse("/'.*?':( '.*?')?/");
STARTPOS=1;
set HAVE;
do until (POS=0);
call prxnext(PRX, STARTPOS, -1, DATA, POS, LEN);
if POS then do;
WORD1= scan(substr(DATA,POS),1,"{}': ") ;
WORD2= scan(substr(DATA,POS),2,"{}': ") ;
select(WORD1);
when( 'date' ) DATE =WORD2;
when( 'currency' ) CURRENCY=WORD2;
when( 'value' ) VALUE =WORD2;
otherwise METRIC =WORD1;
end;
end;
if VALUE ne '' then do;
output;
call missing( METRIC, DATE, CURRENCY, VALUE );
end;
end;
run;
proc print data=WANT noobs;
run;
COMPANY_ID | METRIC | DATE | CURRENCY | VALUE |
---|---|---|---|---|
6066 | gw | 2015-12-31 | USD | 32021 |
6066 | do | 2015-12-31 | USD | 71 |
6067 | gw | 2015-12-31 | NZD | 32022 |
6067 | do | 2015-12-31 | NZD | 72 |
data x; company_id=6066; data="{'gw': {'date': '2015-12-31', 'currency': 'USD', 'value': 32021}, 'do': {'date': '2015-12-31', 'currency': 'USD', 'value': 71}}"; output; run; data temp; set x; do i=1 to countw(data,'{}[],:'); temp=dequote(strip(scan(data,i,'{}[],:')));output; end; drop data; run; data temp1; merge temp temp(firstobs=2 keep=temp company_id rename=(temp=_temp company_id=_id)); if (missing(_temp) and company_id=_id) or temp in ('date' 'currency' 'value'); drop _id; run; data temp2; set temp1; length metric $ 40; retain metric; if missing(_temp) then do;metric=temp;delete;end; drop i; run; proc transpose data=temp2 out=want; by company_id metric notsorted; var _temp; id temp; run;
@Ksharp If there is too much data for exporting to a flat file, I suspect creating 3 intermediate tables is not an option either.
Here is a way -similar to yours, where we create a suite of values in one column- to do it in one step (I don't see why the view should create a spill file, but I don't know. Is there a way to know?)
data HAVE;
COMPANY_ID= '6066 ';
DATA= "{'gw': {'date': '2015-12-31', 'currency': 'USD',"; output;
DATA= "'value': 32021}, 'do': {'date': '2015-12-31', "; output;
DATA= " 'currency': 'USD', 'value': 71}} "; output;
COMPANY_ID= '6067 ';
DATA= "{'gw': {'date': '2015-12-31', 'currency': 'NZD',"; output;
DATA= "'value': 32022}, 'do': {'date': '2015-12-31', "; output;
DATA= " 'currency': 'NZD', 'value': 72}} "; output;
run;
data _V/view=_V;
set HAVE;
length STR $16;
keep COMPANY_ID STR;
do I=1 to 1e9 until (STR=' ');
STR=scan(DATA, I, "{}':, ") ;
if STR ne ' ' then output;
end;
run;
data WANT;
retain COMPANY_ID METRIC DATE CURRENCY ;
keep COMPANY_ID METRIC DATE CURRENCY VALUE;
set _V;
if ^SKIP_METRIC then METRIC =STR;
select(lag(STR));
when( 'date' ) DATE =STR;
when( 'currency' ) CURRENCY=STR;
when( 'value' ) VALUE =STR;
otherwise ;
end;
if VALUE ne ' ' then do;
output;
SKIP_METRIC=0;
end;
else SKIP_METRIC+1;
run;
COMPANY_ID | METRIC | DATE | CURRENCY | VALUE |
---|---|---|---|---|
6066 | gw | 2015-12-31 | USD | 32021 |
6066 | do | 2015-12-31 | USD | 71 |
6067 | gw | 2015-12-31 | NZD | 32022 |
6067 | do | 2015-12-31 | NZD | 72 |
Chris,
I don't know either.
OP said the whole json string is in a variable named DATA, not like your data step did .
@Ksharp I understood differently: " unsure how to deal with multiple metric data rollover to separate rows of data."
The code works either way. 🙂
Hi Sarav,
I have a similar requirement where I need to extract json data (hundreds of columns; big string) from a column in a table. Json is stored in a column in a table. Can you please share how you achieved it?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.