Hi SAS Community,
I am looking to parse a string variable and extract several variables from the information contained in it. Listed below is an example of the string data:
Variable_data
{"calories":"153","steps":"87","distance":"0.0578725741524","floors":"0","elevation":"0","activeScore":"-1"}
I want the output data set with additional variables parsed from the string (shown below with variable names on top):
Calories Steps Distance Floors Elevation activeScore
153 87 0.0578725741524 0 0 -1
There is a second layer of data management to this. In some cases there are two data points in one row for the string variable (example shown below)
[{"calories":"153","steps":"87","distance":"0.0578725741524","floors":"0","elevation":"0","activeScore":"-1"}, {"calories":"159","steps":"77","distance":"0.0498723741524","floors":"0","elevation":"0","activeScore":"-1"}]
Is it possible to parse the data as two separate rows in this case?
Any help will be much appreciated.
Best,
Pronabesh
Thank All,
I was actually trying to avoid the datalines option because the file was very large. The following infile and input statement code worked.
filename a "C:\Users\Pronabesh\Desktop\PLM_Stats\Biogen Devices\Data analysis\test.txt"; *This is any file, you just need something with a row or two in it;
data want;
set have;
infile a dlm=', "';
input @@;
_infile_=data;
input
@'"calories":' calories $
@'"steps":' steps $
@'"distance":' distance $;
put calories= steps= distance=;
run;
This looks like JSON to me. If so then a Google search using something like "site:support.sas.com JSON" will return quite a few relevant links like:
http://support.sas.com/resources/papers/proceedings13/296-2013.pdf
http://support.sas.com/resources/papers/proceedings13/342-2013.pdf
This may get you started,
data have;
infile cards dlm='}]';
input var : $ 150. @@;
if not missing (var);
array v calories steps distance floors elevation activeScore;
format distance best32.20;
do over v;
v=prxchange('s/.+'||vname(v)||'":"(-\d+|\d+|\d\.\d+)".*/$1/i',-1,var);
end;
drop var;
cards;
{"calories":"152","steps":"87","distance":"0.0578725741524","floors":"0","elevation":"0","activeScore":"-1"}
[{"calories":"153","steps":"87","distance":"0.0578725741524","floors":"0","elevation":"0","activeScore":"-1"}, {"calories":"159","steps":"77","distance":"0.0498723741524","floors":"0","elevation":"0","activeScore":"-1"}]
;
Haikuo
data have;
infile cards truncover dlm=',';
input @;
_infile_=compress(_infile_,'-.,','kd');
input calories steps distance floors elevation activeScore;
cards;
[{"calories":"153","steps":"87","distance":"0.0578725741524","floors":"0","elevation":"0","activeScore":"-1"}
{"calories":"159","steps":"77","distance":"0.0498723741524","floors":"0","elevation":"0","activeScore":"-1"}]
;
run;
Your JSON file contains JSON ARRAY, that would make problem more complicated . My suggest is like Patrick to use proc json .
Oh,Patrick That is disappoint about SAS for not adding such function in proc json .
If there were no JSON Object ,that would be easy, but considering about json object that would make problem complicated .
Just stumbled over this link How to import Twitter tweets in SAS DATA Step using OAuth 2 authentication style - SAS Voices and thought this might interest you.
Hi Patrick,
Sorry . Firstly, I can't enter Twitter url , Chinese Government have already block them all . Secondly, I totally have no idea about it , OAuth 2 is a completely different thing other than general url , Maybe you should check SAS documentation detail like proc http or something else , or talk some SAS guys to see if SAS has already support this kind of connection .
OH, I know what do you mean. proc groovy can import JSON file . That will be nice .
Regards
Xia Keshan
Thank All,
I was actually trying to avoid the datalines option because the file was very large. The following infile and input statement code worked.
filename a "C:\Users\Pronabesh\Desktop\PLM_Stats\Biogen Devices\Data analysis\test.txt"; *This is any file, you just need something with a row or two in it;
data want;
set have;
infile a dlm=', "';
input @@;
_infile_=data;
input
@'"calories":' calories $
@'"steps":' steps $
@'"distance":' distance $;
put calories= steps= distance=;
run;
If the data structure in the JSON file is orthogonal and you will always get all key:value pairs then you could try something like below:
data testdata;
infile datalines truncover dsd dlm=',}' COLUMN=point_pos;
point_pos=1;
input @;
row_id=_n_;
do while(point_pos+10 < lengthn(_infile_));
input @point_pos
@'"calories":' calories :32.
@'"steps":' steps :32.
@'"distance":' distance :32.
@'"floors":' floors :32.
@'"elevation":' elevation :32.
@'"activeScore":' activeScore :32.
@
;
output;
end;
datalines;
{"calories":"153","steps":"87","distance":"0.0578725741524","floors":"0","elevation":"0","activeScore":"-1"}
[{"calories":"153","steps":"87","distance":"0.0578725741524","floors":"0","elevation":"0","activeScore":"-1"}, {"calories":"159","steps":"77","distance":"0.0498723741524","floors":"0","elevation":"0","activeScore":"-1"}]
;
run;
Thanks Patrick and Ksharp. As you guys pointed out, proc JSON is only for exporting SAS files to JSON.
@Patrick: Your code works if I use the datalines options. However, I have a very large datafile in which the json object is just one variable. There are other variables linked to the json object via unique identifier (example shown below). Is there a way to parse it in this case?
4 | 9999 | {"calories":"1736","steps":"4024","distance":"1.79278637687032","floors":"0","elevation":"0","activeScore":"-1"} | activity |
---|---|---|---|
5 | 1111 | {"calories":"1583","steps":"85","distance":"0.0378725741524","floors":"0","elevation":"0","activeScore":"-1"} | activity |
the variables are
obs id data activity
4 | 9999 | {"calories":"1736","steps":"4024","distance":"1.79278637687032","floors":"0","elevation":"0","activeScore":"-1"} | activity |
---|---|---|---|
5 | 1111 | {"calories":"1583","steps":"85","distance":"0.0378725741524","floors":"0","elevation":"0","activeScore":"-1"} | activity |
One way to go:
data testdata(drop=_:);
infile datalines truncover dlm='"[{,}]' COLUMN=point_pos;
point_pos=1;
attrib
id data informat=32.
activity informat=$20.
;
input _lead_str :$30. @;
if not missing(_lead_str) then
do;
id=input(scan(_lead_str,1,' '),32.);
data=input(scan(_lead_str,2,' '),32.);
end;
activity=scan(_infile_,-1,'}] ');
row_id=_n_;
do while(point_pos+20 < lengthn(_infile_));
input @point_pos
@'"calories":' calories :32.
@'"steps":' steps :32.
@'"distance":' distance :32.
@'"floors":' floors :32.
@'"elevation":' elevation :32.
@'"activeScore":' activeScore :32.
@
;
output;
end;
datalines;
4 999 {"calories":"153","steps":"87","distance":"0.0578725741524","floors":"0","elevation":"0","activeScore":"-1"} activity
5 1111 [{"calories":"153","steps":"87","distance":"0.0578725741524","floors":"0","elevation":"0","activeScore":"-1"}, {"calories":"159","steps":"77","distance":"0.0498723741524","floors":"0","elevation":"0","activeScore":"-1"}] activity
;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.