SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Parsing a string variable

Accepted Solution Solved
Reply
Contributor
Posts: 58
Accepted Solution

Parsing a string variable

Hi SAS Community,

I am looking to parse a string variable and extract several variables from the information contained in it. Listed below is an example of the string data:

Variable_data

{"calories":"153","steps":"87","distance":"0.0578725741524","floors":"0","elevation":"0","activeScore":"-1"}

I want the output data set with additional variables parsed from the string (shown below with variable names on top):

Calories  Steps  Distance  Floors  Elevation  activeScore

153  87  0.0578725741524  0  0  -1

There is a second layer of data management to this. In some cases there are two data points in one row for the string variable (example shown below)

[{"calories":"153","steps":"87","distance":"0.0578725741524","floors":"0","elevation":"0","activeScore":"-1"}, {"calories":"159","steps":"77","distance":"0.0498723741524","floors":"0","elevation":"0","activeScore":"-1"}]

Is it possible to parse the data as two separate rows in this case?

Any help will be much appreciated.

Best,

Pronabesh


Accepted Solutions
Solution
‎07-11-2014 02:03 PM
Contributor
Posts: 58

Re: Parsing a string variable

Thank All,

I was actually trying to avoid the datalines option because the file was very large. The following infile and input statement code worked.

filename a "C:\Users\Pronabesh\Desktop\PLM_Stats\Biogen Devices\Data analysis\test.txt"; *This is any file, you just need something with a row or two in it;

data want;

set have;

infile a dlm=', "';

input @@;

_infile_=data;

input

@'"calories":' calories $

@'"steps":' steps $

@'"distance":' distance $;

put calories= steps= distance=;

run;

View solution in original post


All Replies
Respected Advisor
Posts: 3,902

Re: Parsing a string variable

This looks like JSON to me. If so then a Google search using something like "site:support.sas.com JSON" will return quite a few relevant links like:

http://support.sas.com/resources/papers/proceedings13/296-2013.pdf

http://support.sas.com/resources/papers/proceedings13/342-2013.pdf

Respected Advisor
Posts: 3,124

Re: Parsing a string variable

This may get you started,

data have;

     infile cards dlm='}]';

     input var : $ 150. @@;

     if not missing (var);

     array v  calories steps distance floors elevation activeScore;

     format distance best32.20;

     do over v;

           v=prxchange('s/.+'||vname(v)||'":"(-\d+|\d+|\d\.\d+)".*/$1/i',-1,var);

     end;

     drop var;

     cards;

{"calories":"152","steps":"87","distance":"0.0578725741524","floors":"0","elevation":"0","activeScore":"-1"}

[{"calories":"153","steps":"87","distance":"0.0578725741524","floors":"0","elevation":"0","activeScore":"-1"}, {"calories":"159","steps":"77","distance":"0.0498723741524","floors":"0","elevation":"0","activeScore":"-1"}]

;


Haikuo

Super Contributor
Posts: 275

Re: Parsing a string variable

data have;

     infile cards truncover dlm=',';

     input @;

  _infile_=compress(_infile_,'-.,','kd');

  input calories steps distance floors elevation activeScore;

     cards;

[{"calories":"153","steps":"87","distance":"0.0578725741524","floors":"0","elevation":"0","activeScore":"-1"}

{"calories":"159","steps":"77","distance":"0.0498723741524","floors":"0","elevation":"0","activeScore":"-1"}]

;

run;

Super User
Posts: 9,691

Re: Parsing a string variable

Your JSON file contains JSON ARRAY, that would make problem more complicated . My suggest is like Patrick to use proc json .

Respected Advisor
Posts: 3,902

Re: Parsing a string variable

I believe PROC JSON is only for writing a SAS data set to a JSON file. You will need a SAS data step for reading a JSON file.

Super User
Posts: 9,691

Re: Parsing a string variable

Oh,Patrick That is disappoint about SAS for not adding such function in proc json .

If there were no JSON Object ,that would be easy, but considering about json object that would make problem complicated .

Respected Advisor
Posts: 3,902

Re: Parsing a string variable

Just stumbled over this link How to import Twitter tweets in SAS DATA Step using OAuth 2 authentication style - SAS Voices and thought this might interest you.

Super User
Posts: 9,691

Re: Parsing a string variable

Hi Patrick,

Sorry . Firstly, I can't enter Twitter url , Chinese Government have already block them all . Secondly, I totally have no idea about it , OAuth 2 is a completely different thing other than general url , Maybe you should check SAS documentation detail like proc http or something else , or talk some SAS guys to see if SAS has already support this kind of connection .

OH, I know what do you mean. proc groovy can import JSON file . That will be nice .

Regards

Xia Keshan

Solution
‎07-11-2014 02:03 PM
Contributor
Posts: 58

Re: Parsing a string variable

Thank All,

I was actually trying to avoid the datalines option because the file was very large. The following infile and input statement code worked.

filename a "C:\Users\Pronabesh\Desktop\PLM_Stats\Biogen Devices\Data analysis\test.txt"; *This is any file, you just need something with a row or two in it;

data want;

set have;

infile a dlm=', "';

input @@;

_infile_=data;

input

@'"calories":' calories $

@'"steps":' steps $

@'"distance":' distance $;

put calories= steps= distance=;

run;

Respected Advisor
Posts: 3,902

Re: Parsing a string variable

If the data structure in the JSON file is orthogonal and you will always get all key:value pairs then you could try something like below:

data testdata;

  infile datalines truncover dsd dlm=',}' COLUMN=point_pos;

  point_pos=1;

  input @;

  row_id=_n_;

  do while(point_pos+10 < lengthn(_infile_));

    input @point_pos

      @'"calories":' calories :32.

      @'"steps":' steps :32.

      @'"distance":' distance :32.

      @'"floors":' floors :32.

      @'"elevation":' elevation :32.

      @'"activeScore":' activeScore :32.

      @

    ;

    output;

  end;

  datalines;

{"calories":"153","steps":"87","distance":"0.0578725741524","floors":"0","elevation":"0","activeScore":"-1"}

[{"calories":"153","steps":"87","distance":"0.0578725741524","floors":"0","elevation":"0","activeScore":"-1"}, {"calories":"159","steps":"77","distance":"0.0498723741524","floors":"0","elevation":"0","activeScore":"-1"}]

;

run;

Contributor
Posts: 58

Re: Parsing a string variable

Thanks Patrick and Ksharp. As you guys pointed out, proc JSON is only for exporting SAS files to JSON.

@Patrick: Your code works if I use the datalines options. However, I have a very large datafile in which the json object is just one variable. There are other variables linked to the json object via unique identifier (example shown below). Is there a way to parse it in this case?

49999{"calories":"1736","steps":"4024","distance":"1.79278637687032","floors":"0","elevation":"0","activeScore":"-1"}activity
51111{"calories":"1583","steps":"85","distance":"0.0378725741524","floors":"0","elevation":"0","activeScore":"-1"}activity
Contributor
Posts: 58

Re: Parsing a string variable

the variables are

obs id data activity

49999{"calories":"1736","steps":"4024","distance":"1.79278637687032","floors":"0","elevation":"0","activeScore":"-1"}activity
51111{"calories":"1583","steps":"85","distance":"0.0378725741524","floors":"0","elevation":"0","activeScore":"-1"}activity
Respected Advisor
Posts: 3,902

Re: Parsing a string variable

One way to go:

data testdata(drop=_Smiley Happy;

  infile datalines truncover dlm='"[{,}]' COLUMN=point_pos;

  point_pos=1;

  attrib

    id data informat=32.

    activity informat=$20.

    ;

  input _lead_str :$30. @;

  if not missing(_lead_str) then

    do;

      id=input(scan(_lead_str,1,' '),32.);

      data=input(scan(_lead_str,2,' '),32.);

    end;

  activity=scan(_infile_,-1,'}] ');

  row_id=_n_;

  do while(point_pos+20 < lengthn(_infile_));

    input @point_pos

      @'"calories":' calories :32.

      @'"steps":' steps :32.

      @'"distance":' distance :32.

      @'"floors":' floors :32.

      @'"elevation":' elevation :32.

      @'"activeScore":' activeScore :32.

      @

    ;

    output;

  end;

  datalines;

4 999 {"calories":"153","steps":"87","distance":"0.0578725741524","floors":"0","elevation":"0","activeScore":"-1"} activity

5 1111 [{"calories":"153","steps":"87","distance":"0.0578725741524","floors":"0","elevation":"0","activeScore":"-1"}, {"calories":"159","steps":"77","distance":"0.0498723741524","floors":"0","elevation":"0","activeScore":"-1"}] activity

;

run;

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 13 replies
  • 763 views
  • 1 like
  • 5 in conversation