Parsing json data

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 15
Accepted Solution

Parsing json data

Hi, I'm trying to get the attached data into SAS in order to analyse it alongside another data set. I don't have any experience working with json formatted data, so am having a lot of difficulty trying to get it in in a usable format. Trawled through the boards and found this solution (with edits for my data structure):

 

data x;
infile 'C:\Desktop\psc.txt' dsd lrecl=30000000 dlm='{[:,';
input x : $2000.@@;
run;
data temp;
merge x x(firstobs=2 rename=(x=_x));
if lowcase(x) in ("company_number","data__address__address_line_1","data__address__address_line_2","data__address__care_of","data__address__country","data__address__locality","data__address__po_box","data__address__postal_code","data__address__premises","data__address__region","data__ceased_on","data__country_of_residence","data__date_of_birth__month","data__date_of_birth__year","data__description","data__etag","data__exemptions__psc_exempt_as_shares_admitted_on_market__exemption_type","data__exemptions__psc_exempt_as_shares_admitted_on_market__items","data__exemptions__psc_exempt_as_trading_on_regulated_market__exemption_type","data__exemptions__psc_exempt_as_trading_on_regulated_market__items","data__exemptions_count","data__generated_at","data__identification__country_registered","data__identification__legal_authority","data__identification__legal_form","data__identification__place_registered","data__identification__registration_number","data__kind","data__links__self","data__name","data__name_elements__forename","data__name_elements__honours","data__name_elements__middle_name","data__name_elements__surname","data__name_elements__title","data__nationality","data__natures_of_control","data__notified_on","data__persons_of_significant_control_count","data__statement","data__statements_count");
run;
data temp;
set temp;
if lowcase(x)='company_number' then group+1;
run;

proc transpose data=temp out=want(drop=_Smiley Happy;
by group;
id x;
var _x;
run;

 

Currently the ouput is a series 'group', and then the company numbers that have been successfully extracted. I'm a beginner with SAS programming language, but it seems like the other sets of data aren't defined in the code, but I'm really stuck with how to fix it.

 

Thanks, Ian


Accepted Solutions
Solution
‎10-05-2017 08:26 AM
SAS Employee
Posts: 37

Re: Parsing json data

I found the JSON in the attached psc.txt file is invalid. I modified the file as I think it should be for what you appear to desire (and the modified file is attached to this post). With a valid JSON file, I ran the JSON LIBNAME engine (available in SAS 9.4 maintenance release 4) to produce 4 SAS data sets. I then used the SQL procedure to combine the SAS data sets into records similar to the JSON data. This demonstrates how to reassemble the hierarchical data in the JSON data file that was flatten into SAS data sets. The complete SAS program is included and shows how easy it is to import JSON data into SAS data set using the JSON LIBNAME engine.

 

libname x5 json 'psc5.txt';
proc datasets lib=x5; 
     quit;
proc contents data=x5._all_; 
     run;

proc sql;
     create table work.records as
     select root.company_number, data.etag, data.kind, data_links.self,
            data.notified_on, data.statement, data.ceased_on
     from x5.root as root, X5.DATA_LINKS as DATA_LINKS, X5.DATA as data
     where root.ordinal_root = data.ordinal_root and
           data.ordinal_data = data_links.ordinal_data
           ;
     quit;
proc print data=work.records(obs=5); 
     run;

 

 

    company_
Obs  number                       etag

  1 10869580    f327a09d5a3a08af72b03211f3a2e148814d75dd
  2 10869582    83f75b47516bb1c7bac875c7f97134cb51711f2d
  3 10869588    e730a688879c2f724e8c881267497f9b10ba2688
  4 10869591    83dcecd921dee3f8099bd90a4d355b104ffacc9f
  5 10869596    f570d6220979853777d6bad1418cafd5a4f7c418


Obs                    kind

  1 persons-with-significant-control-statement
  2 persons-with-significant-control-statement
  3 persons-with-significant-control-statement
  4 persons-with-significant-control-statement
  5 persons-with-significant-control-statement


Obs                                    self

  1 /company/10869580/persons-with-significant-control-statements/JZsKO5aylMYj
  2 /company/10869582/persons-with-significant-control-statements/5xQ7ZCMaLDHO
  3 /company/10869588/persons-with-significant-control-statements/rHluQkBUwwXw
  4 /company/10869591/persons-with-significant-control-statements/rK8w6OtlgY9f
  5 /company/10869596/persons-with-significant-control-statements/w1mzlYcY-BoB

    notified_                                                      ceased_
Obs     on                          statement                        on

  1 2017-07-17   no-individual-or-entity-with-signficant-control
  2 2017-07-17   no-individual-or-entity-with-signficant-control
  3 2017-07-17   no-individual-or-entity-with-signficant-control
  4 2017-07-17   no-individual-or-entity-with-signficant-control
  5 2017-07-17   no-individual-or-entity-with-signficant-control

 

NOTE: There were 5 observations read from the data set WORK.RECORDS.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.04 seconds
      cpu time            0.00 seconds

View solution in original post


All Replies
Super Contributor
Posts: 508

Re: Parsing json data

This is a bit of a tough ask if you're a beginner in SAS - in my experience the best way to parse JSON is to use the Proc DS2 JSON parser. There's a good example in this artice https://blogs.sas.com/content/sasdummy/2015/09/28/parse-json-from-sas/

 

It looks like your JSON is fairly straightforward so if you give it a try and post again if you have any problems.

Occasional Contributor
Posts: 15

Re: Parsing json data

Posted in reply to ChrisBrooks

Think that solution's a bit beyond me as well! I'll have an ask around at work to see whether anyone else can give a hand.

 

Thanks for you help, Ian

Frequent Contributor
Posts: 136

Re: Parsing json data

See if this helps:

 

https://blogs.sas.com/content/sasdummy/2016/12/02/json-libname-engine-sas/?postid=710499380

 

I have not tried this code yet since I normally use JSON engines outside of SAS but 9.4M4 has a new libname engine for JSON.

Super User
Posts: 10,313

Re: Parsing json data

Yeah. The best to solve this kind of problem is using JSON engine of LIBNAME statement, if you have SAS 9.4 m4.

 

filename x 'c:\temp\psc.txt';
data x;
 infile x recfm=n dlm='{},';
 input x : $2000.@@;
 a=dequote(scan(x,1,':'));
 b=dequote(scan(x,-1,':','m'));
 if anyspace(x)=1 then do;group+1;delete;end;
run;
proc transpose data=x out=want;
by group;
id a;
var b;
run;
Occasional Contributor
Posts: 15

Re: Parsing json data

This seems to have worked! Thanks so much.

Solution
‎10-05-2017 08:26 AM
SAS Employee
Posts: 37

Re: Parsing json data

I found the JSON in the attached psc.txt file is invalid. I modified the file as I think it should be for what you appear to desire (and the modified file is attached to this post). With a valid JSON file, I ran the JSON LIBNAME engine (available in SAS 9.4 maintenance release 4) to produce 4 SAS data sets. I then used the SQL procedure to combine the SAS data sets into records similar to the JSON data. This demonstrates how to reassemble the hierarchical data in the JSON data file that was flatten into SAS data sets. The complete SAS program is included and shows how easy it is to import JSON data into SAS data set using the JSON LIBNAME engine.

 

libname x5 json 'psc5.txt';
proc datasets lib=x5; 
     quit;
proc contents data=x5._all_; 
     run;

proc sql;
     create table work.records as
     select root.company_number, data.etag, data.kind, data_links.self,
            data.notified_on, data.statement, data.ceased_on
     from x5.root as root, X5.DATA_LINKS as DATA_LINKS, X5.DATA as data
     where root.ordinal_root = data.ordinal_root and
           data.ordinal_data = data_links.ordinal_data
           ;
     quit;
proc print data=work.records(obs=5); 
     run;

 

 

    company_
Obs  number                       etag

  1 10869580    f327a09d5a3a08af72b03211f3a2e148814d75dd
  2 10869582    83f75b47516bb1c7bac875c7f97134cb51711f2d
  3 10869588    e730a688879c2f724e8c881267497f9b10ba2688
  4 10869591    83dcecd921dee3f8099bd90a4d355b104ffacc9f
  5 10869596    f570d6220979853777d6bad1418cafd5a4f7c418


Obs                    kind

  1 persons-with-significant-control-statement
  2 persons-with-significant-control-statement
  3 persons-with-significant-control-statement
  4 persons-with-significant-control-statement
  5 persons-with-significant-control-statement


Obs                                    self

  1 /company/10869580/persons-with-significant-control-statements/JZsKO5aylMYj
  2 /company/10869582/persons-with-significant-control-statements/5xQ7ZCMaLDHO
  3 /company/10869588/persons-with-significant-control-statements/rHluQkBUwwXw
  4 /company/10869591/persons-with-significant-control-statements/rK8w6OtlgY9f
  5 /company/10869596/persons-with-significant-control-statements/w1mzlYcY-BoB

    notified_                                                      ceased_
Obs     on                          statement                        on

  1 2017-07-17   no-individual-or-entity-with-signficant-control
  2 2017-07-17   no-individual-or-entity-with-signficant-control
  3 2017-07-17   no-individual-or-entity-with-signficant-control
  4 2017-07-17   no-individual-or-entity-with-signficant-control
  5 2017-07-17   no-individual-or-entity-with-signficant-control

 

NOTE: There were 5 observations read from the data set WORK.RECORDS.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.04 seconds
      cpu time            0.00 seconds

Super User
Super User
Posts: 7,393

Re: Parsing json data

Posted in reply to BillM_SAS

What was wrong with the original file and what did you do to fix it? 

Did you just wrap the whole thing in [] with commas?

So something like this?

data _null_;
   infile 'badfile' end=eof;
   file 'goodfile' ;
   retain sep '[' ;
   input ;
   put sep _infile_ ;
   if eof then put ']';
   sep=',';
run;
SAS Employee
Posts: 37

Re: Parsing json data

Posted in reply to BillM_SAS

The JSON LIBNAME engine did not like the JSON in the psc.txt file originally posted. I then ran that JSON file through a JSON pretty printer and it also noted the file was invalid. So yes, after I examined the file I added commas and wrapped the contents of the whole file with brackets to make it a valid JSON file. 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 726 views
  • 2 likes
  • 6 in conversation