- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi All,
I need to work with JSON data that is stored as a JSONB data type from the source POSTGRES table.
Below snapshot of the dataset details the JSON data under the 'data' column.
company_id data 6066 {'gw': {'date': '2015-12-31', 'currency': 'USD', 'value': 32021}, 'do': {'date': '2015-12-31', 'currency': 'USD', 'value': 71}}
Desired output is to transform the dataset to the below form.
company_id metric date currency value 6066 gw 2015-12-31 USD 32021 6066 do 2015-12-31 USD 71
The contents under the JSON string are pretty long extending to 1000's of metrics, and the dataset is large that it makes converting back to JSON and parsing it is cumbersome.
a. With the JSON data that needs parsing in a dataset, is there an option of still using INFILE and/or LIBNAME JSON feasible?
b. Tried parsing the string with CALL SCAN position parameters, but, unsure how to deal with multiple metric data rollover to separate rows of data.
Any pointers on the approach to follow would be very helpful. Thank you!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
In your example, how would the JSON parser know the second variable is named METRIC?
Also, are you saying that your parser needs to look at quoted content to see whether it is a date value?
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Naming the parsed value columns is not critical at the first pass. Named the METRIC column, so as to show it as a field to be parsed (1st position within the JSON).
Same goes for the DATE field stored as a string. Identifying that the 2nd position within JSON is a date stored as string, I can then apply the relevant format.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Apparently there is now a JSON engine for the libname statement. See: Reading JSON
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Like this?
data HAVE;
COMPANY_ID= '6066 ';
DATA= "{'gw': {'date': '2015-12-31', 'currency': 'USD',"; output;
DATA= "'value': 32021}, 'do': {'date': '2015-12-31', "; output;
DATA= " 'currency': 'USD', 'value': 71}} "; output;
COMPANY_ID= '6067 ';
DATA= "{'gw': {'date': '2015-12-31', 'currency': 'NZD',"; output;
DATA= "'value': 32022}, 'do': {'date': '2015-12-31', "; output;
DATA= " 'currency': 'NZD', 'value': 72}} "; output;
run;
data WANT;
retain COMPANY_ID METRIC DATE CURRENCY VALUE PRX;
length METRIC DATE CURRENCY VALUE WORD1 WORD2 $16;
keep COMPANY_ID METRIC DATE CURRENCY VALUE;
if _N_=1 then PRX=prxparse("/'.*?':( '.*?')?/");
STARTPOS=1;
set HAVE;
do until (POS=0);
call prxnext(PRX, STARTPOS, -1, DATA, POS, LEN);
if POS then do;
WORD1= scan(substr(DATA,POS),1,"{}': ") ;
WORD2= scan(substr(DATA,POS),2,"{}': ") ;
select(WORD1);
when( 'date' ) DATE =WORD2;
when( 'currency' ) CURRENCY=WORD2;
when( 'value' ) VALUE =WORD2;
otherwise METRIC =WORD1;
end;
end;
if VALUE ne '' then do;
output;
call missing( METRIC, DATE, CURRENCY, VALUE );
end;
end;
run;
proc print data=WANT noobs;
run;
COMPANY_ID | METRIC | DATE | CURRENCY | VALUE |
---|---|---|---|---|
6066 | gw | 2015-12-31 | USD | 32021 |
6066 | do | 2015-12-31 | USD | 71 |
6067 | gw | 2015-12-31 | NZD | 32022 |
6067 | do | 2015-12-31 | NZD | 72 |
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data x; company_id=6066; data="{'gw': {'date': '2015-12-31', 'currency': 'USD', 'value': 32021}, 'do': {'date': '2015-12-31', 'currency': 'USD', 'value': 71}}"; output; run; data temp; set x; do i=1 to countw(data,'{}[],:'); temp=dequote(strip(scan(data,i,'{}[],:')));output; end; drop data; run; data temp1; merge temp temp(firstobs=2 keep=temp company_id rename=(temp=_temp company_id=_id)); if (missing(_temp) and company_id=_id) or temp in ('date' 'currency' 'value'); drop _id; run; data temp2; set temp1; length metric $ 40; retain metric; if missing(_temp) then do;metric=temp;delete;end; drop i; run; proc transpose data=temp2 out=want; by company_id metric notsorted; var _temp; id temp; run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Ksharp If there is too much data for exporting to a flat file, I suspect creating 3 intermediate tables is not an option either.
Here is a way -similar to yours, where we create a suite of values in one column- to do it in one step (I don't see why the view should create a spill file, but I don't know. Is there a way to know?)
data HAVE;
COMPANY_ID= '6066 ';
DATA= "{'gw': {'date': '2015-12-31', 'currency': 'USD',"; output;
DATA= "'value': 32021}, 'do': {'date': '2015-12-31', "; output;
DATA= " 'currency': 'USD', 'value': 71}} "; output;
COMPANY_ID= '6067 ';
DATA= "{'gw': {'date': '2015-12-31', 'currency': 'NZD',"; output;
DATA= "'value': 32022}, 'do': {'date': '2015-12-31', "; output;
DATA= " 'currency': 'NZD', 'value': 72}} "; output;
run;
data _V/view=_V;
set HAVE;
length STR $16;
keep COMPANY_ID STR;
do I=1 to 1e9 until (STR=' ');
STR=scan(DATA, I, "{}':, ") ;
if STR ne ' ' then output;
end;
run;
data WANT;
retain COMPANY_ID METRIC DATE CURRENCY ;
keep COMPANY_ID METRIC DATE CURRENCY VALUE;
set _V;
if ^SKIP_METRIC then METRIC =STR;
select(lag(STR));
when( 'date' ) DATE =STR;
when( 'currency' ) CURRENCY=STR;
when( 'value' ) VALUE =STR;
otherwise ;
end;
if VALUE ne ' ' then do;
output;
SKIP_METRIC=0;
end;
else SKIP_METRIC+1;
run;
COMPANY_ID | METRIC | DATE | CURRENCY | VALUE |
---|---|---|---|---|
6066 | gw | 2015-12-31 | USD | 32021 |
6066 | do | 2015-12-31 | USD | 71 |
6067 | gw | 2015-12-31 | NZD | 32022 |
6067 | do | 2015-12-31 | NZD | 72 |
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Chris,
I don't know either.
OP said the whole json string is in a variable named DATA, not like your data step did .
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Ksharp I understood differently: " unsure how to deal with multiple metric data rollover to separate rows of data."
The code works either way. 🙂
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Sarav,
I have a similar requirement where I need to extract json data (hundreds of columns; big string) from a column in a table. Json is stored in a column in a table. Can you please share how you achieved it?