Hi SAS Community,
I am looking to parse a string variable and extract several variables from the information contained in it. Listed below is an example of the string data:
Variable_data
{"calories":"153","steps":"87","distance":"0.0578725741524","floors":"0","elevation":"0","activeScore":"-1"}
I want the output data set with additional variables parsed from the string (shown below with variable names on top):
Calories Steps Distance Floors Elevation activeScore
153 87 0.0578725741524 0 0 -1
There is a second layer of data management to this. In some cases there are two data points in one row for the string variable (example shown below)
[{"calories":"153","steps":"87","distance":"0.0578725741524","floors":"0","elevation":"0","activeScore":"-1"}, {"calories":"159","steps":"77","distance":"0.0498723741524","floors":"0","elevation":"0","activeScore":"-1"}]
Is it possible to parse the data as two separate rows in this case?
Any help will be much appreciated.
Best,
Pronabesh
Thank All,
I was actually trying to avoid the datalines option because the file was very large. The following infile and input statement code worked.
filename a "C:\Users\Pronabesh\Desktop\PLM_Stats\Biogen Devices\Data analysis\test.txt"; *This is any file, you just need something with a row or two in it;
data want;
set have;
infile a dlm=', "';
input @@;
_infile_=data;
input
@'"calories":' calories $
@'"steps":' steps $
@'"distance":' distance $;
put calories= steps= distance=;
run;
This looks like JSON to me. If so then a Google search using something like "site:support.sas.com JSON" will return quite a few relevant links like:
http://support.sas.com/resources/papers/proceedings13/296-2013.pdf
http://support.sas.com/resources/papers/proceedings13/342-2013.pdf
This may get you started,
data have;
infile cards dlm='}]';
input var : $ 150. @@;
if not missing (var);
array v calories steps distance floors elevation activeScore;
format distance best32.20;
do over v;
v=prxchange('s/.+'||vname(v)||'":"(-\d+|\d+|\d\.\d+)".*/$1/i',-1,var);
end;
drop var;
cards;
{"calories":"152","steps":"87","distance":"0.0578725741524","floors":"0","elevation":"0","activeScore":"-1"}
[{"calories":"153","steps":"87","distance":"0.0578725741524","floors":"0","elevation":"0","activeScore":"-1"}, {"calories":"159","steps":"77","distance":"0.0498723741524","floors":"0","elevation":"0","activeScore":"-1"}]
;
Haikuo
data have;
infile cards truncover dlm=',';
input @;
_infile_=compress(_infile_,'-.,','kd');
input calories steps distance floors elevation activeScore;
cards;
[{"calories":"153","steps":"87","distance":"0.0578725741524","floors":"0","elevation":"0","activeScore":"-1"}
{"calories":"159","steps":"77","distance":"0.0498723741524","floors":"0","elevation":"0","activeScore":"-1"}]
;
run;
Your JSON file contains JSON ARRAY, that would make problem more complicated . My suggest is like Patrick to use proc json .
Oh,Patrick That is disappoint about SAS for not adding such function in proc json .
If there were no JSON Object ,that would be easy, but considering about json object that would make problem complicated .
Just stumbled over this link How to import Twitter tweets in SAS DATA Step using OAuth 2 authentication style - SAS Voices and thought this might interest you.
Hi Patrick,
Sorry . Firstly, I can't enter Twitter url , Chinese Government have already block them all . Secondly, I totally have no idea about it , OAuth 2 is a completely different thing other than general url , Maybe you should check SAS documentation detail like proc http or something else , or talk some SAS guys to see if SAS has already support this kind of connection .
OH, I know what do you mean. proc groovy can import JSON file . That will be nice .
Regards
Xia Keshan
Thank All,
I was actually trying to avoid the datalines option because the file was very large. The following infile and input statement code worked.
filename a "C:\Users\Pronabesh\Desktop\PLM_Stats\Biogen Devices\Data analysis\test.txt"; *This is any file, you just need something with a row or two in it;
data want;
set have;
infile a dlm=', "';
input @@;
_infile_=data;
input
@'"calories":' calories $
@'"steps":' steps $
@'"distance":' distance $;
put calories= steps= distance=;
run;
If the data structure in the JSON file is orthogonal and you will always get all key:value pairs then you could try something like below:
data testdata;
infile datalines truncover dsd dlm=',}' COLUMN=point_pos;
point_pos=1;
input @;
row_id=_n_;
do while(point_pos+10 < lengthn(_infile_));
input @point_pos
@'"calories":' calories :32.
@'"steps":' steps :32.
@'"distance":' distance :32.
@'"floors":' floors :32.
@'"elevation":' elevation :32.
@'"activeScore":' activeScore :32.
@
;
output;
end;
datalines;
{"calories":"153","steps":"87","distance":"0.0578725741524","floors":"0","elevation":"0","activeScore":"-1"}
[{"calories":"153","steps":"87","distance":"0.0578725741524","floors":"0","elevation":"0","activeScore":"-1"}, {"calories":"159","steps":"77","distance":"0.0498723741524","floors":"0","elevation":"0","activeScore":"-1"}]
;
run;
Thanks Patrick and Ksharp. As you guys pointed out, proc JSON is only for exporting SAS files to JSON.
@Patrick: Your code works if I use the datalines options. However, I have a very large datafile in which the json object is just one variable. There are other variables linked to the json object via unique identifier (example shown below). Is there a way to parse it in this case?
4 | 9999 | {"calories":"1736","steps":"4024","distance":"1.79278637687032","floors":"0","elevation":"0","activeScore":"-1"} | activity |
---|---|---|---|
5 | 1111 | {"calories":"1583","steps":"85","distance":"0.0378725741524","floors":"0","elevation":"0","activeScore":"-1"} | activity |
the variables are
obs id data activity
4 | 9999 | {"calories":"1736","steps":"4024","distance":"1.79278637687032","floors":"0","elevation":"0","activeScore":"-1"} | activity |
---|---|---|---|
5 | 1111 | {"calories":"1583","steps":"85","distance":"0.0378725741524","floors":"0","elevation":"0","activeScore":"-1"} | activity |
One way to go:
data testdata(drop=_:);
infile datalines truncover dlm='"[{,}]' COLUMN=point_pos;
point_pos=1;
attrib
id data informat=32.
activity informat=$20.
;
input _lead_str :$30. @;
if not missing(_lead_str) then
do;
id=input(scan(_lead_str,1,' '),32.);
data=input(scan(_lead_str,2,' '),32.);
end;
activity=scan(_infile_,-1,'}] ');
row_id=_n_;
do while(point_pos+20 < lengthn(_infile_));
input @point_pos
@'"calories":' calories :32.
@'"steps":' steps :32.
@'"distance":' distance :32.
@'"floors":' floors :32.
@'"elevation":' elevation :32.
@'"activeScore":' activeScore :32.
@
;
output;
end;
datalines;
4 999 {"calories":"153","steps":"87","distance":"0.0578725741524","floors":"0","elevation":"0","activeScore":"-1"} activity
5 1111 [{"calories":"153","steps":"87","distance":"0.0578725741524","floors":"0","elevation":"0","activeScore":"-1"}, {"calories":"159","steps":"77","distance":"0.0498723741524","floors":"0","elevation":"0","activeScore":"-1"}] activity
;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.