BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Tal
Pyrite | Level 9 Tal
Pyrite | Level 9

Hi i have a  dataset  of lets say 10 records  and one column with a value like this below {"userCostCenter":1020,"countryCode":"CA","userMutualFundUpdateAccess":"true",

"keepAliveURL":"https://c3w78.sys.c3.tdgroup.com/c3/KeepAliveServlet",

"returnURL":"https://c3w78.sys.c3.tdgroup.com/c3/DiscoveryToolTransferInServlet",

"languageCode":"en","sessionLength":10800,"userLogonID":"TESTC35"}

 

and  i need to parse  this so  that the new  dataset would  have 

userCostCenter,countryCode, UserMutualFundUpdateAccess , keepAliveURL,  returnURL, languageCode, sessionLength, and userLogonID as  columns  and '1020','CA','true'.... as their respective values . How do i  do that?  

In fact  i created  this dataset  by importing a  JSON file into SAS. Is there a  way i can parse this even before importing the  file into SAS?

 

I'd appreciate any suggestions . Thanks 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Something as simple as the example you show that is just simple name/value pairs separated by commas.  You can just use SCAN() function.

data test;
  string='
{"userCostCenter":1020
,"countryCode":"CA"
,"userMutualFundUpdateAccess":"true"
,"keepAliveURL":"https://c3w78.sys.c3.tdgroup.com/c3/KeepAliveServlet"
,"returnURL":"https://c3w78.sys.c3.tdgroup.com/c3/DiscoveryToolTransferInServlet"
,"languageCode":"en"
,"sessionLength":10800
,"userLogonID":"TESTC35"
}';
length pair $300 name $50 value $300 ;
do index=1 to countw(string,'{,}','q');
   pair = scan(string,index,'{,}','q');
   name = dequote(scan(pair,1,':','q'));
   value = dequote(scan(pair,2,':','q'));
   select (name);
      when ('userCostCenter') userCostCenter=input(value,32.);
      when ('countryCode') countryCode=value;
      when ('userMutualFundUpdateAccess') userMutualFundUpdateAccess=value;
      when ('keepAliveURL') keepAliveURL=value;
      when ('returnURL') returnURL=value;
      when ('languageCode') languageCode=value;
      when ('sessionLength') sessionLength=input(value,32.);
      when ('userLogonID') userLogonID=value;
      other put 'NOTE: Unknown field. ' pair=;
  end;
end;
drop index pair name value ;
run;

If it is complicated JSON text then convert it to a real JSON text file and use the JSON libref engine to read it. Make sure to include whatever key variables you need to merge the new dataset back to the main dataset.

969   data _null_;
970     set;
971     put (_all_) (=/);
972   run;


string={"userCostCenter":1020,"countryCode":"CA","userMutualFundUpdateAccess"
:"true","keepAliveURL":"https://c3w78.sys.c3.tdgroup.com/c3/KeepAliveServlet"
,"returnURL":"https://c3w78.sys.c3.tdgroup.com/c3/DiscoveryToolTransferInServ
let","languageCode":"en","sessionLength":10800,"userLogonID":"TESTC35"}
userCostCenter=1020
countryCode=CA
userMutualFundUpdateAccess=true
keepAliveURL=https://c3w78.sys.c3.tdgroup.com/c3/KeepAliveServlet
returnURL=https://c3w78.sys.c3.tdgroup.com/c3/DiscoveryToolTransferInServlet
languageCode=en
sessionLength=10800
userLogonID=TESTC35
NOTE: There were 1 observations read from the data set WORK.TEST.

View solution in original post

16 REPLIES 16
ChrisNZ
Tourmaline | Level 20

> In fact  i created  this dataset  by importing a  JSON file into SAS. Is there a  way i can parse this even before importing the  file into SAS?

I am unsure what this means

Reeza
Super User

How did you import your JSON file, as a text file or using the JSON libname?

 

https://blogs.sas.com/content/sasdummy/2016/12/02/json-libname-engine-sas/

 


@Tal wrote:

Hi i have a  dataset  of lets say 10 records  and one column with a value like this below {"userCostCenter":1020,"countryCode":"CA","userMutualFundUpdateAccess":"true",

"keepAliveURL":"https://c3w78.sys.c3.tdgroup.com/c3/KeepAliveServlet",

"returnURL":"https://c3w78.sys.c3.tdgroup.com/c3/DiscoveryToolTransferInServlet",

"languageCode":"en","sessionLength":10800,"userLogonID":"TESTC35"}

 

and  i need to parse  this so  that the new  dataset would  have 

userCostCenter,countryCode, UserMutualFundUpdateAccess , keepAliveURL,  returnURL, languageCode, sessionLength, and userLogonID as  columns  and '1020','CA','true'.... as their respective values . How do i  do that?  

In fact  i created  this dataset  by importing a  JSON file into SAS. Is there a  way i can parse this even before importing the  file into SAS?

 

I'd appreciate any suggestions . Thanks 

 

 


 

Tal
Pyrite | Level 9 Tal
Pyrite | Level 9

as  a  text file, using the "infile". It has several columns and one  is in JSON  format (the one  i shared )

Reeza
Super User
Write that data back out to a text file and then use the JSON libname to try and read it in would be my suggestion.
Tal
Pyrite | Level 9 Tal
Pyrite | Level 9

Thanks Reeza but looks like  the file needs to be a  json file  to be able to use the json  libname engine .

Mine is text  with a  few json formatted fields . Tried  to convert the  file online but getting error

Tom
Super User Tom
Super User

Something as simple as the example you show that is just simple name/value pairs separated by commas.  You can just use SCAN() function.

data test;
  string='
{"userCostCenter":1020
,"countryCode":"CA"
,"userMutualFundUpdateAccess":"true"
,"keepAliveURL":"https://c3w78.sys.c3.tdgroup.com/c3/KeepAliveServlet"
,"returnURL":"https://c3w78.sys.c3.tdgroup.com/c3/DiscoveryToolTransferInServlet"
,"languageCode":"en"
,"sessionLength":10800
,"userLogonID":"TESTC35"
}';
length pair $300 name $50 value $300 ;
do index=1 to countw(string,'{,}','q');
   pair = scan(string,index,'{,}','q');
   name = dequote(scan(pair,1,':','q'));
   value = dequote(scan(pair,2,':','q'));
   select (name);
      when ('userCostCenter') userCostCenter=input(value,32.);
      when ('countryCode') countryCode=value;
      when ('userMutualFundUpdateAccess') userMutualFundUpdateAccess=value;
      when ('keepAliveURL') keepAliveURL=value;
      when ('returnURL') returnURL=value;
      when ('languageCode') languageCode=value;
      when ('sessionLength') sessionLength=input(value,32.);
      when ('userLogonID') userLogonID=value;
      other put 'NOTE: Unknown field. ' pair=;
  end;
end;
drop index pair name value ;
run;

If it is complicated JSON text then convert it to a real JSON text file and use the JSON libref engine to read it. Make sure to include whatever key variables you need to merge the new dataset back to the main dataset.

969   data _null_;
970     set;
971     put (_all_) (=/);
972   run;


string={"userCostCenter":1020,"countryCode":"CA","userMutualFundUpdateAccess"
:"true","keepAliveURL":"https://c3w78.sys.c3.tdgroup.com/c3/KeepAliveServlet"
,"returnURL":"https://c3w78.sys.c3.tdgroup.com/c3/DiscoveryToolTransferInServ
let","languageCode":"en","sessionLength":10800,"userLogonID":"TESTC35"}
userCostCenter=1020
countryCode=CA
userMutualFundUpdateAccess=true
keepAliveURL=https://c3w78.sys.c3.tdgroup.com/c3/KeepAliveServlet
returnURL=https://c3w78.sys.c3.tdgroup.com/c3/DiscoveryToolTransferInServlet
languageCode=en
sessionLength=10800
userLogonID=TESTC35
NOTE: There were 1 observations read from the data set WORK.TEST.
Tal
Pyrite | Level 9 Tal
Pyrite | Level 9

Thanks Tom,

it turns out that the json formatted field can have different set of  variables within it so i slightly modiifed your query:

data want;
/*string='{"c3data":{"userCostCenter":1020,
"countryCode":"CA",
"userMutualFundUpdateAccess":"true",
"keepAliveURL":"https://c3w78.sys.c3.tdgroup.com/c3/KeepAliveServlet",
"returnURL":"https://c3w78.sys.c3.tdgroup.com/c3/DiscoveryToolTransferInServlet",
"languageCode":"en",
"sessionLength":10800,
"userLogonID":"TESTC35"}}';;*/

/*string='{"CD_Q4":["C"],*/
/* "isExistingCustomer":"true",*/
/* "CD_Q2":["C"],*/
/* "isProfessionalStudent":"false",*/
/* "isN2C":"true",*/
/* "PSI_Q14":3,*/
/* "PSI_Q1A":"C",*/
/* "PSI_Q1B":"C",*/
/* "PSI_Q7":100000,*/
/* "PSI_Q8":10000,*/
/* "PSI_Q9B":100,*/
/* "PSI_Q2":"B",*/
/* "PSI_Q3":"C",*/
/* "PSI_Q5":"B",*/
/* "PSI_Q6":"C",*/
/* "PSI_Q12":"C",*/
/* "PSI_Q4":"D",*/
/* "CD_Q6":["NONE"],*/
/* "secondaryNeeds":["N2C-Package","PSI-Products"],*/
/* "wealthConfirmationCheckbox":"true",*/
/* "AdditionalQ1":"C",*/
/* "AdditionalQ1_Text":"test",*/
/* "showInterestInd":"false"}';*/


string='{ "TotalBorrowers":"1",
"CPAT_Q2":[{"borrower":"primaryBorrower",
"secondaryNeeds":["N2C-Package","PSI-Products"],
"age":67,
"CD_Q2":["C"],
"province":"ON"}]}';
do index=1 to countw(string,'{,}','q');
pair=scan(string,index,'{,}','q');
name=dequote(scan(pair,1,':','q'));
value=scan(pair,2,':','q');
output;
end;
drop index string pair;
run;

and am getting all variables parsed but names of variables in one  column and  their values in  the  other which means i will have to transpose this to get the dataset i want. Transposing not an issue but along with the json formatted fields  the  text file also  has other  fields which will make  transposing difficult, So any chance  :

 

1.
name   value

var1      x

var2      y

var3      z 

can be switched to 

var1 var2  var3

x     y        z                           

 

2. to suppress /remove variables such as CPAT*,CD*,PSI*  from  the parsed  field? I dont need  those

3. I am  seeing "[","]" as names /values  in  "name" and  "value ". Can that be removed?

4.["N2C-Package","PSI-Products"], should be a  value but i see PSI-Products  under  "names"  and i am  sure this got something to do with the comma but no idea  how to  fix  it 

 

 

Tal
Pyrite | Level 9 Tal
Pyrite | Level 9

actually  they just told me , they will only  want two same variables all the time from the jason field so i will use your code,  It  works . Thank you all for  you inputs . Much appreciated 

jimbarbour
Meteorite | Level 14

@Tal,

 

The below code will parse the JSON that you included in your original post.  This code has the advantage that one does not need to know the variable names beforehand.  This is somewhat unsophisticated code in that I'm not checking for embedded commas in either the variable names or the values.  If there are embedded commas, this code would have to be modified.

FILENAME	In_Data	'C:\Users\jbarbou3\Documents\SAS\Pgm\Training\JSON_Parse\JSON_Data.txt';

DATA	WORK.Parsed_Data;
	DROP	_:;

	LENGTH	_JSON_Data	$32767;
	LENGTH	_Txt_Wrk1	$1026;
	LENGTH	Var			$32;
	LENGTH	Value		$1026;
	
	INFILE	In_Data		LENGTH			=	_JSON_Length
						RECFM			=	V
						;

	INPUT	@1	_JSON_Data	$VARYING32767.	_JSON_Length;

	**	The first character is a {, so start in position 2.	**;
	_Position							=	2;

	SUBSTR(_JSON_Data, _JSON_Length, 1)	=	',';

	DO	WHILE	(_Position				<	_JSON_Length);
		_Segment_Length					=	INDEXC(SUBSTR(_JSON_Data, _Position), ',');
		_Txt_Wrk1						=	SUBSTR(_JSON_Data, _Position, (_Segment_Length - 1));
		Var								=	COMPRESS(SUBSTR(_Txt_Wrk1, 1, (INDEXC(_Txt_Wrk1, ':') - 1)), '"');
		Value							=	COMPRESS(SUBSTR(_Txt_Wrk1, (INDEXC(_Txt_Wrk1, ':') + 1)), '"');
		_Position						=	_Position	+	_Segment_Length;
		OUTPUT;
	END;
RUN;

The above code yields the following results:

JSON_Parse_Results_2020-08-31_09-05-32.jpg

 

Jim

jimbarbour
Meteorite | Level 14

Well, I was thinking about this just a bit more.  If one is content with collecting a series of variable names and their values by parsing the JSON data, then my previous reply is sufficient.  

 

However, if one were to desire to have the variable names become separate variable names in a SAS data set, that's a bit more involved.  There are no doubt other ways to do this, but since I rather like macros, I coded the following:

FILENAME	In_Data	'C:\Users\jbarbou3\Documents\SAS\Pgm\Training\JSON_Parse\JSON_Data.txt';

**------------------------------------------------------------------------------**;

DATA	WORK.Parsed_Data;
	DROP	_:;

	LENGTH	_JSON_Data	$32767;
	LENGTH	_Text_Work	$1026;
	LENGTH	_Var_Name	$32;
	LENGTH	Value		$1026;
	
	INFILE	In_Data		LENGTH			=	_JSON_Length
						RECFM			=	V
						;

	INPUT	@1	_JSON_Data	$VARYING32767.	_JSON_Length;

	**	The first character is a {, so start in position 2.	**;
	_Position							=	2;
	_Iteration							=	0;

	SUBSTR(_JSON_Data, _JSON_Length, 1)	=	',';

	DO	WHILE	(_Position				<	_JSON_Length);
		_Iteration						+	1;
		_Segment_Length					=	INDEXC(SUBSTR(_JSON_Data, _Position), ',');
		_Text_Work						=	SUBSTR(_JSON_Data, _Position, (_Segment_Length - 1));
		_Var_Name						=	COMPRESS(SUBSTR(_Text_Work, 1, (INDEXC(_Text_Work, ':') - 1)), '"');
		Value							=	COMPRESS(SUBSTR(_Text_Work, (INDEXC(_Text_Work, ':') + 1)), '"');
		_Position						=	_Position	+	_Segment_Length;
		IF	_N_							=	1	THEN
			CALL	SYMPUTX(CATS('Var_Name', PUT(_Iteration, F3.)), _Var_Name);
		OUTPUT;
	END;

	IF	_N_								=	1	THEN
		CALL	SYMPUTX('Var_Count', PUT(_Iteration, F3.));
RUN;

**------------------------------------------------------------------------------**;

%MACRO	List_Vars(Comma=NO);
	%IF	%QUPCASE(&Comma)				=	YES	%THEN
		%DO;
			%LET	Comma				=	,;
		%END;
	%ELSE
		%DO;
			%LET	Comma				=	;
		%END;

	%LOCAL	i;
	%DO	i								=	1	%TO	&Var_Count;
		%IF	&i							=	&Var_Count	%THEN
			%LET	Comma				=	;
		&&Var_Name&i &Comma
	%END;
%MEND	List_Vars;

**------------------------------------------------------------------------------**;

%MACRO	Equate_Vars;
	%LOCAL	i;
	%DO	i								=	1	%TO	&Var_Count;
		IF	_Iteration					=	&i	THEN
			&&Var_Name&i				=	Value;
	%END;
%MEND	Equate_Vars;

**------------------------------------------------------------------------------**;

DATA	WORK.Final_Data;
	DROP	Value;
	DROP	_:;
	RETAIN	_Iteration;
	RETAIN	%List_Vars;

	SET	WORK.PARSED_DATA;

	_Iteration							+	1;

	IF	_Iteration						>	&Var_Count	THEN
		_Iteration						=	1;

	%Equate_Vars;

	IF	_Iteration						=	&Var_Count	THEN
		DO;
			OUTPUT;
			CALL	MISSING(%List_Vars(Comma=YES));
		END;
RUN;

**------------------------------------------------------------------------------**;

The result of which is a SAS data set with column names parsed out from the JSON data:

JSON_Final_Results_2020-08-31_09-05-32.jpg

This of course assumes that every iteration of the JSON data has the same set of variables.  If all the variable names are not the same, then I think the JSON files would have to be parsed into separate SAS data sets which doesn't sound very practical.

 

Jim

Tal
Pyrite | Level 9 Tal
Pyrite | Level 9

Thanks  jimbarbour,

yea looks like the json field is not consistent in   all the  records (different variables within)

jimbarbour
Meteorite | Level 14

OK.  Well, am I on the right track here?  The code above could be used to take one column out of some data, a column whose contents are JSON formatted, and create SAS variables out of them. 

 

If there are different variable names in another row in that same JSON formatted column, the code could be made to work, but you'd have to have some columns set to missing if not every column is coded in every row of the JSON formatted data.

 

Jim

jimbarbour
Meteorite | Level 14

@Tal,

 

It was an interesting exercise, so I went ahead and wrote code that can create SAS variables based on JSON formatted data even if not all of the same variables are in each instance of JSON formatted data.  If you ever need it, here it is.  🙂 

 

As test data, I added a few more rows.  Some of the additional rows have columns that previous rows do not.  The added columns are:

  1. new
  2. vip
  3. special
  4. promo

Here are the results.  Note that for rows in which the JSON data did not contain a value for a given column that the value of that column is "missing".

JSON_Parse_Results_Improved_2020-09-02_17-10-57.jpg

 

Here's the code.  I hope it comes in handy some time:

%Time_Stamp(START);

**------------------------------------------------------------------------------**;

OPTIONS	VALIDVARNAME					=	V7;

%LET	Cmnt							=	;
%LET	Width							=	75;

**------------------------------------------------------------------------------**;

FILENAME	In_Data	'C:\Users\jbarbou3\Documents\SAS\Pgm\Training\JSON_Parse\JSON_Data.txt';

**------------------------------------------------------------------------------**;

DATA	WORK.Parsed_Data;
	DROP	_:;

	LENGTH	_JSON_Data	$32767;
	LENGTH	_Segment	$1026;
	LENGTH	_Var_Name	$32;
	LENGTH	Row_Nbr		4;
	LENGTH	Var_Nbr		4;
	LENGTH	Value		$1026;

	RETAIN	_Max_Nbr	0;

	IF	_N_								=	1	THEN
		DO;
			CALL	MISSING(Var_Nbr, _Var_Name);
			DECLARE	HASH	Vars(ORDERED: 'A');
			Vars.DEFINEKEY('_Var_Name');
			Vars.DEFINEDATA('Var_Nbr');
			Vars.DEFINEDONE();
		END;

	IF	_End_of_Data							THEN
		DO;
			CALL	SYMPUTX('Var_Count', PUT(_Max_Nbr, F3.), 'G');
		END;
	
	INFILE	In_Data		LENGTH			=	_JSON_Length
						RECFM			=	V
						END				=	_End_of_Data
						;

	INPUT	@1	_JSON_Data	$VARYING32767.	_JSON_Length;

	**	The first character is a {, so start in position 2.	**;
	_Position							=	2;

	SUBSTR(_JSON_Data, _JSON_Length, 1)	=	',';

	DO	WHILE	(_Position				<	_JSON_Length);
		_Segment_Length					=	INDEXC(SUBSTR(_JSON_Data, _Position), ',');
		_Segment						=	SUBSTR(_JSON_Data, _Position, (_Segment_Length - 1));
		_Var_Name						=	COMPRESS(SUBSTR(_Segment, 1, (INDEXC(_Segment, ':') - 1)), '"');
		Value							=	COMPRESS(SUBSTR(_Segment, (INDEXC(_Segment, ':') + 1)), '"');
		_Position						=	_Position	+	_Segment_Length;
		_RC								=	Vars.FIND();
		IF	_RC							=	0	THEN
			DO;
				CALL	SYMPUTX(CATS('Var_Name', PUT(Var_Nbr, F3.)), _Var_Name, 'G');
				IF	Var_Nbr				>	_Max_Nbr	THEN
					_Max_Nbr			=	Var_Nbr;
			END;
		ELSE
			DO;
				_Max_Nbr				+	1;
				CALL	SYMPUTX(CATS('Var_Name', PUT(_Max_Nbr, F3.)), _Var_Name, 'G');
				Var_Nbr					=	_Max_Nbr;
				_RC						=	Vars.ADD();
				IF	_RC					>	0	THEN
					DO;
						PUTLOG	"&Err2  ";
						PUTLOG	"&Err2  %Format_Dashes(&Width)";
						PUTLOG	"&Err1  |  Unable to add variable to Hash table.  "	_RC=;
						PUTLOG	"&Err2  %Format_Dashes(&Width)";
					END;
			END;
		Row_Nbr							=	_N_;
		OUTPUT;
	END;
RUN;

**------------------------------------------------------------------------------**;

&Cmnt.PROC	PRINT	DATA=WORK.PARSED_DATA;
&Cmnt.RUN;

**------------------------------------------------------------------------------**;

%MACRO	List_Vars(Comma=NO);
	%IF	%QUPCASE(&Comma)				=	YES	%THEN
		%DO;
			%LET	Comma				=	,;
		%END;
	%ELSE
		%DO;
			%LET	Comma				=	;
		%END;

	%LOCAL	i;
	%DO	i								=	1	%TO	&Var_Count;
		%IF	&i							=	&Var_Count	%THEN
			%LET	Comma				=	;
		&&Var_Name&i &Comma
	%END;
%MEND	List_Vars;

&Cmnt%PUT	&Nte1  &=Var_Count %List_Vars(Comma=YES);

**------------------------------------------------------------------------------**;

%MACRO	Equate_Vars;
	%LOCAL	i;
	%DO	i								=	1	%TO	&Var_Count;
		IF	Var_Nbr						=	&i	THEN
			&&Var_Name&i				=	Value;
	%END;
%MEND	Equate_Vars;

&Cmnt%PUT	&Nte1  %QUOTE(%Equate_Vars);

**------------------------------------------------------------------------------**;

DATA	WORK.Final_Data;
	DROP	Row_Nbr;
	DROP	Var_Nbr;
	DROP	Value;
	DROP	_:;

	RETAIN	%List_Vars;

	SET	WORK.PARSED_DATA;
		BY	Row_Nbr;

	%Equate_Vars;

	IF	LAST.Row_Nbr						THEN
		DO;
			OUTPUT;
			CALL	MISSING(%List_Vars(Comma=YES));
		END;
RUN;

**------------------------------------------------------------------------------**;

&Cmnt.PROC	PRINT	DATA=WORK.FINAL_DATA;
&Cmnt.RUN;

**------------------------------------------------------------------------------**;

%Time_Stamp(STOP);

 

Jim

FrodeHK
Calcite | Level 5

Hi @jimbarbour .

This solution looks exactly what i need for my project. However I have the json string in a column in a SAS table.

I'm very novice SAS Programmer.
Is there an easy way to modify the code. Table x has 4 columns where column 3 (JData) is the JSON string.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 16 replies
  • 9352 views
  • 8 likes
  • 7 in conversation