Hi i have a dataset of lets say 10 records and one column with a value like this below {"userCostCenter":1020,"countryCode":"CA","userMutualFundUpdateAccess":"true",
"keepAliveURL":"https://c3w78.sys.c3.tdgroup.com/c3/KeepAliveServlet",
"returnURL":"https://c3w78.sys.c3.tdgroup.com/c3/DiscoveryToolTransferInServlet",
"languageCode":"en","sessionLength":10800,"userLogonID":"TESTC35"}
and i need to parse this so that the new dataset would have
userCostCenter,countryCode, UserMutualFundUpdateAccess , keepAliveURL, returnURL, languageCode, sessionLength, and userLogonID as columns and '1020','CA','true'.... as their respective values . How do i do that?
In fact i created this dataset by importing a JSON file into SAS. Is there a way i can parse this even before importing the file into SAS?
I'd appreciate any suggestions . Thanks
Something as simple as the example you show that is just simple name/value pairs separated by commas. You can just use SCAN() function.
data test;
string='
{"userCostCenter":1020
,"countryCode":"CA"
,"userMutualFundUpdateAccess":"true"
,"keepAliveURL":"https://c3w78.sys.c3.tdgroup.com/c3/KeepAliveServlet"
,"returnURL":"https://c3w78.sys.c3.tdgroup.com/c3/DiscoveryToolTransferInServlet"
,"languageCode":"en"
,"sessionLength":10800
,"userLogonID":"TESTC35"
}';
length pair $300 name $50 value $300 ;
do index=1 to countw(string,'{,}','q');
pair = scan(string,index,'{,}','q');
name = dequote(scan(pair,1,':','q'));
value = dequote(scan(pair,2,':','q'));
select (name);
when ('userCostCenter') userCostCenter=input(value,32.);
when ('countryCode') countryCode=value;
when ('userMutualFundUpdateAccess') userMutualFundUpdateAccess=value;
when ('keepAliveURL') keepAliveURL=value;
when ('returnURL') returnURL=value;
when ('languageCode') languageCode=value;
when ('sessionLength') sessionLength=input(value,32.);
when ('userLogonID') userLogonID=value;
other put 'NOTE: Unknown field. ' pair=;
end;
end;
drop index pair name value ;
run;
If it is complicated JSON text then convert it to a real JSON text file and use the JSON libref engine to read it. Make sure to include whatever key variables you need to merge the new dataset back to the main dataset.
969 data _null_; 970 set; 971 put (_all_) (=/); 972 run; string={"userCostCenter":1020,"countryCode":"CA","userMutualFundUpdateAccess" :"true","keepAliveURL":"https://c3w78.sys.c3.tdgroup.com/c3/KeepAliveServlet" ,"returnURL":"https://c3w78.sys.c3.tdgroup.com/c3/DiscoveryToolTransferInServ let","languageCode":"en","sessionLength":10800,"userLogonID":"TESTC35"} userCostCenter=1020 countryCode=CA userMutualFundUpdateAccess=true keepAliveURL=https://c3w78.sys.c3.tdgroup.com/c3/KeepAliveServlet returnURL=https://c3w78.sys.c3.tdgroup.com/c3/DiscoveryToolTransferInServlet languageCode=en sessionLength=10800 userLogonID=TESTC35 NOTE: There were 1 observations read from the data set WORK.TEST.
> In fact i created this dataset by importing a JSON file into SAS. Is there a way i can parse this even before importing the file into SAS?
I am unsure what this means
How did you import your JSON file, as a text file or using the JSON libname?
https://blogs.sas.com/content/sasdummy/2016/12/02/json-libname-engine-sas/
@Tal wrote:
Hi i have a dataset of lets say 10 records and one column with a value like this below {"userCostCenter":1020,"countryCode":"CA","userMutualFundUpdateAccess":"true",
"keepAliveURL":"https://c3w78.sys.c3.tdgroup.com/c3/KeepAliveServlet",
"returnURL":"https://c3w78.sys.c3.tdgroup.com/c3/DiscoveryToolTransferInServlet",
"languageCode":"en","sessionLength":10800,"userLogonID":"TESTC35"}
and i need to parse this so that the new dataset would have
userCostCenter,countryCode, UserMutualFundUpdateAccess , keepAliveURL, returnURL, languageCode, sessionLength, and userLogonID as columns and '1020','CA','true'.... as their respective values . How do i do that?
In fact i created this dataset by importing a JSON file into SAS. Is there a way i can parse this even before importing the file into SAS?
I'd appreciate any suggestions . Thanks
as a text file, using the "infile". It has several columns and one is in JSON format (the one i shared )
Thanks Reeza but looks like the file needs to be a json file to be able to use the json libname engine .
Mine is text with a few json formatted fields . Tried to convert the file online but getting error
Something as simple as the example you show that is just simple name/value pairs separated by commas. You can just use SCAN() function.
data test;
string='
{"userCostCenter":1020
,"countryCode":"CA"
,"userMutualFundUpdateAccess":"true"
,"keepAliveURL":"https://c3w78.sys.c3.tdgroup.com/c3/KeepAliveServlet"
,"returnURL":"https://c3w78.sys.c3.tdgroup.com/c3/DiscoveryToolTransferInServlet"
,"languageCode":"en"
,"sessionLength":10800
,"userLogonID":"TESTC35"
}';
length pair $300 name $50 value $300 ;
do index=1 to countw(string,'{,}','q');
pair = scan(string,index,'{,}','q');
name = dequote(scan(pair,1,':','q'));
value = dequote(scan(pair,2,':','q'));
select (name);
when ('userCostCenter') userCostCenter=input(value,32.);
when ('countryCode') countryCode=value;
when ('userMutualFundUpdateAccess') userMutualFundUpdateAccess=value;
when ('keepAliveURL') keepAliveURL=value;
when ('returnURL') returnURL=value;
when ('languageCode') languageCode=value;
when ('sessionLength') sessionLength=input(value,32.);
when ('userLogonID') userLogonID=value;
other put 'NOTE: Unknown field. ' pair=;
end;
end;
drop index pair name value ;
run;
If it is complicated JSON text then convert it to a real JSON text file and use the JSON libref engine to read it. Make sure to include whatever key variables you need to merge the new dataset back to the main dataset.
969 data _null_; 970 set; 971 put (_all_) (=/); 972 run; string={"userCostCenter":1020,"countryCode":"CA","userMutualFundUpdateAccess" :"true","keepAliveURL":"https://c3w78.sys.c3.tdgroup.com/c3/KeepAliveServlet" ,"returnURL":"https://c3w78.sys.c3.tdgroup.com/c3/DiscoveryToolTransferInServ let","languageCode":"en","sessionLength":10800,"userLogonID":"TESTC35"} userCostCenter=1020 countryCode=CA userMutualFundUpdateAccess=true keepAliveURL=https://c3w78.sys.c3.tdgroup.com/c3/KeepAliveServlet returnURL=https://c3w78.sys.c3.tdgroup.com/c3/DiscoveryToolTransferInServlet languageCode=en sessionLength=10800 userLogonID=TESTC35 NOTE: There were 1 observations read from the data set WORK.TEST.
Thanks Tom,
it turns out that the json formatted field can have different set of variables within it so i slightly modiifed your query:
data want;
/*string='{"c3data":{"userCostCenter":1020,
"countryCode":"CA",
"userMutualFundUpdateAccess":"true",
"keepAliveURL":"https://c3w78.sys.c3.tdgroup.com/c3/KeepAliveServlet",
"returnURL":"https://c3w78.sys.c3.tdgroup.com/c3/DiscoveryToolTransferInServlet",
"languageCode":"en",
"sessionLength":10800,
"userLogonID":"TESTC35"}}';;*/
/*string='{"CD_Q4":["C"],*/
/* "isExistingCustomer":"true",*/
/* "CD_Q2":["C"],*/
/* "isProfessionalStudent":"false",*/
/* "isN2C":"true",*/
/* "PSI_Q14":3,*/
/* "PSI_Q1A":"C",*/
/* "PSI_Q1B":"C",*/
/* "PSI_Q7":100000,*/
/* "PSI_Q8":10000,*/
/* "PSI_Q9B":100,*/
/* "PSI_Q2":"B",*/
/* "PSI_Q3":"C",*/
/* "PSI_Q5":"B",*/
/* "PSI_Q6":"C",*/
/* "PSI_Q12":"C",*/
/* "PSI_Q4":"D",*/
/* "CD_Q6":["NONE"],*/
/* "secondaryNeeds":["N2C-Package","PSI-Products"],*/
/* "wealthConfirmationCheckbox":"true",*/
/* "AdditionalQ1":"C",*/
/* "AdditionalQ1_Text":"test",*/
/* "showInterestInd":"false"}';*/
string='{ "TotalBorrowers":"1",
"CPAT_Q2":[{"borrower":"primaryBorrower",
"secondaryNeeds":["N2C-Package","PSI-Products"],
"age":67,
"CD_Q2":["C"],
"province":"ON"}]}';
do index=1 to countw(string,'{,}','q');
pair=scan(string,index,'{,}','q');
name=dequote(scan(pair,1,':','q'));
value=scan(pair,2,':','q');
output;
end;
drop index string pair;
run;
and am getting all variables parsed but names of variables in one column and their values in the other which means i will have to transpose this to get the dataset i want. Transposing not an issue but along with the json formatted fields the text file also has other fields which will make transposing difficult, So any chance :
1.
name value
var1 x
var2 y
var3 z
can be switched to
var1 var2 var3
x y z
2. to suppress /remove variables such as CPAT*,CD*,PSI* from the parsed field? I dont need those
3. I am seeing "[","]" as names /values in "name" and "value ". Can that be removed?
4.["N2C-Package","PSI-Products"], should be a value but i see PSI-Products under "names" and i am sure this got something to do with the comma but no idea how to fix it
actually they just told me , they will only want two same variables all the time from the jason field so i will use your code, It works . Thank you all for you inputs . Much appreciated
@Tal,
The below code will parse the JSON that you included in your original post. This code has the advantage that one does not need to know the variable names beforehand. This is somewhat unsophisticated code in that I'm not checking for embedded commas in either the variable names or the values. If there are embedded commas, this code would have to be modified.
FILENAME In_Data 'C:\Users\jbarbou3\Documents\SAS\Pgm\Training\JSON_Parse\JSON_Data.txt';
DATA WORK.Parsed_Data;
DROP _:;
LENGTH _JSON_Data $32767;
LENGTH _Txt_Wrk1 $1026;
LENGTH Var $32;
LENGTH Value $1026;
INFILE In_Data LENGTH = _JSON_Length
RECFM = V
;
INPUT @1 _JSON_Data $VARYING32767. _JSON_Length;
** The first character is a {, so start in position 2. **;
_Position = 2;
SUBSTR(_JSON_Data, _JSON_Length, 1) = ',';
DO WHILE (_Position < _JSON_Length);
_Segment_Length = INDEXC(SUBSTR(_JSON_Data, _Position), ',');
_Txt_Wrk1 = SUBSTR(_JSON_Data, _Position, (_Segment_Length - 1));
Var = COMPRESS(SUBSTR(_Txt_Wrk1, 1, (INDEXC(_Txt_Wrk1, ':') - 1)), '"');
Value = COMPRESS(SUBSTR(_Txt_Wrk1, (INDEXC(_Txt_Wrk1, ':') + 1)), '"');
_Position = _Position + _Segment_Length;
OUTPUT;
END;
RUN;
The above code yields the following results:
Jim
Well, I was thinking about this just a bit more. If one is content with collecting a series of variable names and their values by parsing the JSON data, then my previous reply is sufficient.
However, if one were to desire to have the variable names become separate variable names in a SAS data set, that's a bit more involved. There are no doubt other ways to do this, but since I rather like macros, I coded the following:
FILENAME In_Data 'C:\Users\jbarbou3\Documents\SAS\Pgm\Training\JSON_Parse\JSON_Data.txt';
**------------------------------------------------------------------------------**;
DATA WORK.Parsed_Data;
DROP _:;
LENGTH _JSON_Data $32767;
LENGTH _Text_Work $1026;
LENGTH _Var_Name $32;
LENGTH Value $1026;
INFILE In_Data LENGTH = _JSON_Length
RECFM = V
;
INPUT @1 _JSON_Data $VARYING32767. _JSON_Length;
** The first character is a {, so start in position 2. **;
_Position = 2;
_Iteration = 0;
SUBSTR(_JSON_Data, _JSON_Length, 1) = ',';
DO WHILE (_Position < _JSON_Length);
_Iteration + 1;
_Segment_Length = INDEXC(SUBSTR(_JSON_Data, _Position), ',');
_Text_Work = SUBSTR(_JSON_Data, _Position, (_Segment_Length - 1));
_Var_Name = COMPRESS(SUBSTR(_Text_Work, 1, (INDEXC(_Text_Work, ':') - 1)), '"');
Value = COMPRESS(SUBSTR(_Text_Work, (INDEXC(_Text_Work, ':') + 1)), '"');
_Position = _Position + _Segment_Length;
IF _N_ = 1 THEN
CALL SYMPUTX(CATS('Var_Name', PUT(_Iteration, F3.)), _Var_Name);
OUTPUT;
END;
IF _N_ = 1 THEN
CALL SYMPUTX('Var_Count', PUT(_Iteration, F3.));
RUN;
**------------------------------------------------------------------------------**;
%MACRO List_Vars(Comma=NO);
%IF %QUPCASE(&Comma) = YES %THEN
%DO;
%LET Comma = ,;
%END;
%ELSE
%DO;
%LET Comma = ;
%END;
%LOCAL i;
%DO i = 1 %TO &Var_Count;
%IF &i = &Var_Count %THEN
%LET Comma = ;
&&Var_Name&i &Comma
%END;
%MEND List_Vars;
**------------------------------------------------------------------------------**;
%MACRO Equate_Vars;
%LOCAL i;
%DO i = 1 %TO &Var_Count;
IF _Iteration = &i THEN
&&Var_Name&i = Value;
%END;
%MEND Equate_Vars;
**------------------------------------------------------------------------------**;
DATA WORK.Final_Data;
DROP Value;
DROP _:;
RETAIN _Iteration;
RETAIN %List_Vars;
SET WORK.PARSED_DATA;
_Iteration + 1;
IF _Iteration > &Var_Count THEN
_Iteration = 1;
%Equate_Vars;
IF _Iteration = &Var_Count THEN
DO;
OUTPUT;
CALL MISSING(%List_Vars(Comma=YES));
END;
RUN;
**------------------------------------------------------------------------------**;
The result of which is a SAS data set with column names parsed out from the JSON data:
This of course assumes that every iteration of the JSON data has the same set of variables. If all the variable names are not the same, then I think the JSON files would have to be parsed into separate SAS data sets which doesn't sound very practical.
Jim
Thanks jimbarbour,
yea looks like the json field is not consistent in all the records (different variables within)
OK. Well, am I on the right track here? The code above could be used to take one column out of some data, a column whose contents are JSON formatted, and create SAS variables out of them.
If there are different variable names in another row in that same JSON formatted column, the code could be made to work, but you'd have to have some columns set to missing if not every column is coded in every row of the JSON formatted data.
Jim
@Tal,
It was an interesting exercise, so I went ahead and wrote code that can create SAS variables based on JSON formatted data even if not all of the same variables are in each instance of JSON formatted data. If you ever need it, here it is. 🙂
As test data, I added a few more rows. Some of the additional rows have columns that previous rows do not. The added columns are:
Here are the results. Note that for rows in which the JSON data did not contain a value for a given column that the value of that column is "missing".
Here's the code. I hope it comes in handy some time:
%Time_Stamp(START);
**------------------------------------------------------------------------------**;
OPTIONS VALIDVARNAME = V7;
%LET Cmnt = ;
%LET Width = 75;
**------------------------------------------------------------------------------**;
FILENAME In_Data 'C:\Users\jbarbou3\Documents\SAS\Pgm\Training\JSON_Parse\JSON_Data.txt';
**------------------------------------------------------------------------------**;
DATA WORK.Parsed_Data;
DROP _:;
LENGTH _JSON_Data $32767;
LENGTH _Segment $1026;
LENGTH _Var_Name $32;
LENGTH Row_Nbr 4;
LENGTH Var_Nbr 4;
LENGTH Value $1026;
RETAIN _Max_Nbr 0;
IF _N_ = 1 THEN
DO;
CALL MISSING(Var_Nbr, _Var_Name);
DECLARE HASH Vars(ORDERED: 'A');
Vars.DEFINEKEY('_Var_Name');
Vars.DEFINEDATA('Var_Nbr');
Vars.DEFINEDONE();
END;
IF _End_of_Data THEN
DO;
CALL SYMPUTX('Var_Count', PUT(_Max_Nbr, F3.), 'G');
END;
INFILE In_Data LENGTH = _JSON_Length
RECFM = V
END = _End_of_Data
;
INPUT @1 _JSON_Data $VARYING32767. _JSON_Length;
** The first character is a {, so start in position 2. **;
_Position = 2;
SUBSTR(_JSON_Data, _JSON_Length, 1) = ',';
DO WHILE (_Position < _JSON_Length);
_Segment_Length = INDEXC(SUBSTR(_JSON_Data, _Position), ',');
_Segment = SUBSTR(_JSON_Data, _Position, (_Segment_Length - 1));
_Var_Name = COMPRESS(SUBSTR(_Segment, 1, (INDEXC(_Segment, ':') - 1)), '"');
Value = COMPRESS(SUBSTR(_Segment, (INDEXC(_Segment, ':') + 1)), '"');
_Position = _Position + _Segment_Length;
_RC = Vars.FIND();
IF _RC = 0 THEN
DO;
CALL SYMPUTX(CATS('Var_Name', PUT(Var_Nbr, F3.)), _Var_Name, 'G');
IF Var_Nbr > _Max_Nbr THEN
_Max_Nbr = Var_Nbr;
END;
ELSE
DO;
_Max_Nbr + 1;
CALL SYMPUTX(CATS('Var_Name', PUT(_Max_Nbr, F3.)), _Var_Name, 'G');
Var_Nbr = _Max_Nbr;
_RC = Vars.ADD();
IF _RC > 0 THEN
DO;
PUTLOG "&Err2 ";
PUTLOG "&Err2 %Format_Dashes(&Width)";
PUTLOG "&Err1 | Unable to add variable to Hash table. " _RC=;
PUTLOG "&Err2 %Format_Dashes(&Width)";
END;
END;
Row_Nbr = _N_;
OUTPUT;
END;
RUN;
**------------------------------------------------------------------------------**;
&Cmnt.PROC PRINT DATA=WORK.PARSED_DATA;
&Cmnt.RUN;
**------------------------------------------------------------------------------**;
%MACRO List_Vars(Comma=NO);
%IF %QUPCASE(&Comma) = YES %THEN
%DO;
%LET Comma = ,;
%END;
%ELSE
%DO;
%LET Comma = ;
%END;
%LOCAL i;
%DO i = 1 %TO &Var_Count;
%IF &i = &Var_Count %THEN
%LET Comma = ;
&&Var_Name&i &Comma
%END;
%MEND List_Vars;
&Cmnt%PUT &Nte1 &=Var_Count %List_Vars(Comma=YES);
**------------------------------------------------------------------------------**;
%MACRO Equate_Vars;
%LOCAL i;
%DO i = 1 %TO &Var_Count;
IF Var_Nbr = &i THEN
&&Var_Name&i = Value;
%END;
%MEND Equate_Vars;
&Cmnt%PUT &Nte1 %QUOTE(%Equate_Vars);
**------------------------------------------------------------------------------**;
DATA WORK.Final_Data;
DROP Row_Nbr;
DROP Var_Nbr;
DROP Value;
DROP _:;
RETAIN %List_Vars;
SET WORK.PARSED_DATA;
BY Row_Nbr;
%Equate_Vars;
IF LAST.Row_Nbr THEN
DO;
OUTPUT;
CALL MISSING(%List_Vars(Comma=YES));
END;
RUN;
**------------------------------------------------------------------------------**;
&Cmnt.PROC PRINT DATA=WORK.FINAL_DATA;
&Cmnt.RUN;
**------------------------------------------------------------------------------**;
%Time_Stamp(STOP);
Jim
Hi @jimbarbour .
This solution looks exactly what i need for my project. However I have the json string in a column in a SAS table.
I'm very novice SAS Programmer.
Is there an easy way to modify the code. Table x has 4 columns where column 3 (JData) is the JSON string.
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.