BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
BillJones
Calcite | Level 5

Hello,

I'm trying to import a seemingly simple XML file into SAS with the XML mapper utility.  Note I used the XML mapper successfully on what I thought was a much more complex xml file without any issue.  Perhaps I'm missing some simple tweak that will import the data.  I want import a table with two variables: stock symbol and option expiration date.  If there's more than one expiration date, then the symbol should be repeated. 

Here's the xml file that I want to import:

This XML file does not appear to have any style information associated with it. The document tree is shown below.

<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng" yahoo:count="2" yahoo:created="2014-05-04T12:06:45Z" yahoo:lang="en-US">

<diagnostics>

<url execution-start-time="0" execution-stop-time="1" execution-time="1">

</url>

<url execution-start-time="1" execution-stop-time="2" execution-time="1">

</url>

<publiclyCallable>true</publiclyCallable>

<url execution-start-time="5" execution-stop-time="825" execution-time="820">

</url>

<javascript execution-start-time="4" execution-stop-time="845" execution-time="840" instructions-used="14243" table-name="yahoo.finance.option_contracts"/>

<url execution-start-time="5" execution-stop-time="939" execution-time="934">

</url>

<javascript execution-start-time="4" execution-stop-time="958" execution-time="954" instructions-used="25640" table-name="yahoo.finance.option_contracts"/>

<user-time>959</user-time>

<service-time>1640</service-time>

<build-version>0.2.2467</build-version>

</diagnostics>

<results>

<option symbol="yhoo">

<contract>2014-05</contract>

<contract>2014-06</contract>

<contract>2014-07</contract>

<contract>2014-10</contract>

<contract>2015-01</contract>

<contract>2016-01</contract>

</option>

<option symbol="fb">

<contract>2014-05</contract>

<contract>2014-06</contract>

<contract>2014-07</contract>

<contract>2014-09</contract>

<contract>2014-12</contract>

<contract>2015-01</contract>

<contract>2016-01</contract>

</option>

</results>

</query>

<!-- total: 959 -->

<!-- engine5.yql.gq1.yahoo.com -->


url to the above xml: https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.option_contracts%20WHE...

So, given the above I hope to get a table that looks like:

Symbol      Expiration

yhoo          2014-05

yhoo          2014-06

yhoo          2014-07

yhoo          2014-10

yhoo          2015-01

yhoo          2016-01

fb              2014-05

fb              2014-06

fb              2014-07

fb              2014-09

fb              2014-12

fb              2015-01

fb              2016-01

When I try to import the data using the XML mapper, I get an error stating that the " XMLMap contains no tables".  And when I try to create a custom map I only get the last record of the desired table above.    Is there a simple adjustment I can make to import the data?  Thank you so much in advance for any help or suggestions.

-Bill

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

It looks like this xml is a stream file .Try this one.

 
data temp;
infile 'c:\temp\yahoo.txt' recfm=n dlm='<>';
length symbol expiration $ 20;
retain symbol;
input token : $200. @@;
if prxmatch('/(^\d{4}-\d{2})|(option\s+symbol=)/o',token) then do;
  if left(token) =: 'option' then  do;symbol=scan(token,2,'"'); delete;end;
  expiration=token;
  output;
 end;
 drop token;
run;

filename x url 'http://...........' ;
data temp;
infile x recfm=n dlm='<>';
length symbol expiration $ 20;
retain symbol;
input token : $200. @@;
if prxmatch('/(^\d{4}-\d{2})|(option\s+symbol=)/o',token) then do;
  if left(token) =: 'option' then  do;symbol=scan(token,2,'"'); delete;end;
  expiration=token;
  output;
 end;
 drop token;
run;

Xia Keshan

Message was edited by: xia keshan

View solution in original post

15 REPLIES 15
Ksharp
Super User

Can you use filename + url to get the xml file ? If it does , that would easy to get .

data want;

input;

length symbol expiration $ 20;

retain symbol;

if _infile_ eq: '<option symbol="' then symbol=scan(_infile_,2,'"');

  else  if _infile_ eq: '<contract>' then do;expiration=scan(_infile_,2,'<>');output;end;

datalines4;

<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng" yahoo:count="2" yahoo:created="2014-05-04T12:23:36Z" yahoo:lang="en-US">

<diagnostics>

<url execution-start-time="1" execution-stop-time="2" execution-time="1">

<![CDATA[ http://datatables.org/alltables.env ]]>

</url>

<url execution-start-time="2" execution-stop-time="3" execution-time="1">

<![CDATA[

http://www.datatables.org/yahoo/finance/yahoo.finance.option_contracts.xml

]]>

</url>

<publiclyCallable>true</publiclyCallable>

<url execution-start-time="6" execution-stop-time="760" execution-time="754">

<![CDATA[ http://finance.yahoo.com/q/op?s=yhoo ]]>

</url>

<javascript execution-start-time="4" execution-stop-time="777" execution-time="772" instructions-used="14243" table-name="yahoo.finance.option_contracts"/>

<url execution-start-time="6" execution-stop-time="843" execution-time="837">

<![CDATA[ http://finance.yahoo.com/q/op?s=fb ]]>

</url>

<javascript execution-start-time="4" execution-stop-time="861" execution-time="856" instructions-used="25640" table-name="yahoo.finance.option_contracts"/>

<user-time>862</user-time>

<service-time>1472</service-time>

<build-version>0.2.2467</build-version>

</diagnostics>

<results>

<option symbol="yhoo">

<contract>2014-05</contract>

<contract>2014-06</contract>

<contract>2014-07</contract>

<contract>2014-10</contract>

<contract>2015-01</contract>

<contract>2016-01</contract>

</option>

<option symbol="fb">

<contract>2014-05</contract>

<contract>2014-06</contract>

<contract>2014-07</contract>

<contract>2014-09</contract>

<contract>2014-12</contract>

<contract>2015-01</contract>

<contract>2016-01</contract>

</option>

</results>

</query>

<!--  total: 862  -->

<!--  engine8.yql.gq1.yahoo.com  -->

;;;;

run;

Xia Keshan

BillJones
Calcite | Level 5

Ksharp,

Thanks so much for the suggestion.  I'll give it a try and will report back.

-Bill

BillJones
Calcite | Level 5

Ksharp,

Thanks again for the code.  Your solution worked perfectly when I used the datalines4 syntax.  However, I'm having some trouble using the filename + url  solution.  I can put the data into the log, but can't seem to get sas to read it.  Note I tried using the put statement to create the data for datalines4 but it didn't work.

Here's the code that I currently have

filename in url "https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.option_contracts%20WHE..." lrecl=32000; quit;

data want;

infile in truncover;

input;

put _infile_;

length symbol expiration $ 20;

retain symbol;

if _infile_ eq: '<option symbol="' then symbol=scan(_infile_,2,'"');

  else  if _infile_ eq: '<contract>' then do;expiration=scan(_infile_,2,'<>');output;end;

run;

Does anyone have any ideas on what I'm missing?  Perhaps an infile=option?

Any suggestions would be greatly appreciated.

Thank you,

Bill

Ksharp
Super User

Why add "quit;"  at the end of filename statement ?

if you can put it into LOG ,that said you can read it .or you could try :

if find( _infile_ , '<option symbol="' )  then symbol=scan(_infile_,2,'"');

  else  if  find( _infile_ , '<contract>' )  then do;expiration=scan(_infile_,2,'<>');output;end;

BillJones
Calcite | Level 5

Thanks for all the suggestions.  I believe that I should be able to read the data without using the XML mapper.  I'm able to see the data in the log, but SAS cannot interpret it.  I'm getting an invalid data error (code & log are below).  Does anyone see what I'm doing wrong?  Any thoughts or suggestions would be greatly appreciated.

-Bill

Code:

dm 'clear log';

filename in url "https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.option_contracts%20WHE..." lrecl=32000;

data want;

infile in dsd end=eof termstr=crlf;

input symbol expiration $ 20;

retain symbol;

if find(_infile_ ,'<option symbol="') then symbol=scan(_infile_,2,'"');

  else  if  find( _infile_ , '<contract>' )  then do;expiration=scan(_infile_,2,'<>');output; end;

run;

Log:

102  dm 'clear log';

103

104  filename in url

104! "https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.option_contr

104! acts%20WHERE%20symbol%20in('yhoo'%2C'fb')%0A%09%09&diagnostics=true&env=http%3A%2F%2Fdatata

104! bles.org%2Falltables.env" lrecl=32000;

WARNING: Apparent symbolic reference DIAGNOSTICS not resolved.

WARNING: Apparent symbolic reference DIAGNOSTICS not resolved.

WARNING: Apparent symbolic reference ENV not resolved.

105

106  data want;

107  infile in dsd end=eof termstr=crlf;

108  input symbol expiration $ 20;

109  retain symbol;

110  if find(_infile_ ,'<option symbol="') then symbol=scan(_infile_,2,'"');

111    else  if  find( _infile_ , '<contract>' )  then do;expiration=scan(_infile_,2,'<>');

111! output; end;

112  run;

NOTE: Character values have been converted to numeric values at the places given by:

      (Line):(Column).

      110:51

NOTE: The infile IN is:

      Filename=https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.opt

      ion_contracts%20WHERE%20symbol%20in('yhoo'%2C'fb')%0A%09%09&diagnostics=true&env=http%3A%2

      F%2Fdatatables.org%2Falltables.env,

      Local Host Name=LP43235,

      Local Host IP addr=fe80::5d94:d77b:dc9d:7b8f%10,

      Service Hostname Name=yts1.yql.vip.ne1.yahoo.com,

      Service IP addr=98.138.243.55,Service Name=N/A,

      Service Portno=443,Lrecl=32000,Recfm=Variable

NOTE: Invalid data for symbol in line 1 1-1747.

RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+-

RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+-

1   CHAR  <?xml version="1.0" encoding="UTF-8"?>.<query xmlns:yahoo="http://www.yahooapis.com/v1

    ZONE  33766276776663232322666666663255423233037767727666737666632677732277727666667672666273

    NUMR  CF8DC065239FED21E0205E3F49E7D2546D82FEAC1552908DCE3A918FFD28440AFF777E918FF1093E3FDF61

      87  /base.rng" yahoo:count="2" yahoo:created="2014-05-11T10:56:44Z" yahoo:lang="en"><diagn

     173  ostics><url execution-start-time="0" execution-stop-time="71" execution-time="71"><![C

     259  DATA[http://datatables.org/alltables.env]]></url><url execution-start-time="72" execut

     345  ion-stop-time="143" execution-time="71"><![CDATA[http://www.datatables.org/yahoo/finan

     431  ce/yahoo.finance.option_contracts.xml]]></url><publiclyCallable>true</publiclyCallable

     517  ><url execution-start-time="146" execution-stop-time="1143" execution-time="997"><![CD

     603  ATA[http://finance.yahoo.com/q/op?s=fb]]></url><javascript execution-start-time="145"

     689  execution-stop-time="1164" execution-time="1019" instructions-used="15397" table-name=

     775  "yahoo.finance.option_contracts"></javascript><url execution-start-time="146" executio

     861  n-stop-time="1221" execution-time="1075"><![CDATA[http://finance.yahoo.com/q/op?s=yhoo

     947  ]]></url><javascript execution-start-time="145" execution-stop-time="1243" execution-t

    1033  ime="1098" instructions-used="25640" table-name="yahoo.finance.option_contracts"></jav

    1119  ascript><user-time>1244</user-time><service-time>2117</service-time><build-version>0.2

    1205  .2467</build-version></diagnostics><results><option symbol="yhoo"><contract>2014-05</c

    1291  ontract><contract>2014-06</contract><contract>2014-07</contract><contract>2014-10</con

    1377  tract><contract>2015-01</contract><contract>2016-01</contract></option><option symbol=

    1463  "fb"><contract>2014-05</contract><contract>2014-06</contract><contract>2014-07</contra

    1549  ct><contract>2014-09</contract><contract>2014-12</contract><contract>2015-01</contract

    1635  ><contract>2016-01</contract></option></results></query><!-- total: 1244 -->.<!-- engi

    ZONE  33666776673333323332666776673326776663327677677332776773322227676632333322230322226666

    NUMR  EC3FE42134E2016D01CF3FE42134ECFF049FEECF2535C43ECF15529EC1DD04F41CA012440DDEAC1DD05E79

    1721  ne10.yql.ne1.yahoo.com -->. 1747

    ZONE  663327762663276666266622230

    NUMR  E510E91CEE51E918FFE3FD0DDEA

eof=1 symbol=1 expiration=  _ERROR_=1

_INFILE_=<?xml version="1.0" encoding="UTF-8"?>

<query xmlns:yahoo="http://www.yahooapis.com/v1/

base.rng" yahoo:count="2" yahoo:created="2014-05-11T10:56:44Z" yahoo:lang="en"><diagnostics><url

execution-start-time="0" execution-stop-time="71" execution-time="71"><![CDATA[http://datatable

s.org/alltables.env]]></url><url execution-start-time="72" execution-stop-time="143" execution-t

ime="71"><![CDATA[http://www.datatables.org/yahoo/finance/yahoo.finance.option_contracts.xml]]><

/url><publiclyCallable>true</publiclyCallable><url execution-start-time="146" execution-stop-tim

e="1143" execution-time="997"><![CDATA[http://finance.yahoo.com/q/op?s=fb]]></url><javascript ex

ecution-start-time="145" execution-stop-time="1164" execution-time="1019" instructions-used="153

97" table-name="yahoo.finance.option_contracts"></javascript><url execution-start-time="146" exe

cution-stop-time="1221" execution-time="1075"><![CDATA[http://finance.yahoo.com/q/op?s=yhoo]]></

url><javascript execution-start-time="145" execution-stop-time="1243" execution-time="1098" inst

ructions-used="25640" table-name="yahoo.finance.option_contracts"></javascript><user-time>1244</

user-time><service-time>2117</service-time><build-version>0.2.2467</build-version></diagnostics>

<results><option symbol="yhoo"><contract>2014-05</contract><contract>2014-06</contract><contract

>2014-07</contract><contract>2014-10</contract><contract>2015-01</contract><contract>2016-01</co

ntract></option><option symbol="fb"><contract>2014-05</contract><contract>2014-06</contract><con

tract>2014-07</contract><contract>2014-09</contract><contract>2014-12</contract><contract>2015-0

1</contract><contract>2016-01</contract></option></results></query><!-- total: 1244 -->

<!-- eng

ine10.yql.ne1.yahoo.com -->

_N_=1

NOTE: 1 record was read from the infile IN.

      The minimum record length was 1747.

      The maximum record length was 1747.

NOTE: The data set WORK.WANT has 0 observations and 2 variables.

NOTE: DATA statement used (Total process time):

      real time           1.86 seconds

      cpu time            0.15 seconds

Ksharp
Super User

It looks like this xml is a stream file .Try this one.

 
data temp;
infile 'c:\temp\yahoo.txt' recfm=n dlm='<>';
length symbol expiration $ 20;
retain symbol;
input token : $200. @@;
if prxmatch('/(^\d{4}-\d{2})|(option\s+symbol=)/o',token) then do;
  if left(token) =: 'option' then  do;symbol=scan(token,2,'"'); delete;end;
  expiration=token;
  output;
 end;
 drop token;
run;

filename x url 'http://...........' ;
data temp;
infile x recfm=n dlm='<>';
length symbol expiration $ 20;
retain symbol;
input token : $200. @@;
if prxmatch('/(^\d{4}-\d{2})|(option\s+symbol=)/o',token) then do;
  if left(token) =: 'option' then  do;symbol=scan(token,2,'"'); delete;end;
  expiration=token;
  output;
 end;
 drop token;
run;

Xia Keshan

Message was edited by: xia keshan

BillJones
Calcite | Level 5

Ksharp,

Thanks so much for the code.  Your solution worked perfectly.  I can now download all the option expiration dates.  Smiley Happy

Thanks again!

Best regards,

Bill

jakarman
Barite | Level 11

You specified cr lf.   cr=Carriage Return lf=linefeed remember the old typewriters having this physical. This convention is Windows related.

Within internet http/xml the accepted convention is Unix-based the newline or lf.  It has no meaning in html or xml. It is just being made human readable and workable within editors.

Put the Zone/Numr lines in a monospace font (courier) in a editor. The 0 and 1 in the Zone field should alarm you. Having it aligned you can see the combination 0A this the lf-symbol. lf is not a valid symbol for string processing as it has no image for the screen.  .... Advice: change your record indication form crlf to lf.

You are seeing the xml is allowing UTf8 this encoding is mentioned. Utf8 is a possible multitbyte character encoding. With the common latin1 symbols the old 1 byte character is used.

When there would be a multibyte char it will switch by using 00 (zone/number) and other not usual used bytes.

The warnings in the url are coming from the & usage. Having the string in " (double quotes) SAS will try to solve that is sas-macro vars.

The %20 (space) %3A (:)  in your Url are byte encodings to protect misunderstanding in intrepretations (routers/webservers). The %0A (lf) and %09 (ht tab) usage there I do not understand   

---->-- ja karman --<-----
BillJones
Calcite | Level 5

Jaap,

Thanks for your thoughts.  I tried some of your suggestions,

-Bill

jakarman
Barite | Level 11

I have just entere that long url link in a browser. What I see is coming back is tree records, the record you need is a long string with all the data you have previous copied  as separated records. FF and Chrome are giving a nice message like the XML mapper, the layout recognition is fine.  Could be an indication you could process the file as XML.

As long as you use that file and save it as a real copy not doing a copy/paste from the screen you are on the best options.

For some XML backgropund see: http://support.sas.com/resources/papers/proceedings12/220-2012.pdf

---->-- ja karman --<-----
jakarman
Barite | Level 11

seeing you posted data is does not follow the XML Document Types (w3schools) structure.
There is a reference to a yahoo xml file that specify the in a xml this content.

Would be an explanation why XMLapper does not recognize the data you still can see. 

---->-- ja karman --<-----
BillJones
Calcite | Level 5

Jaap & Cynthia,

Thanks for your thoughts.  No wonder I cannot import the data with the XML Mapper!

Best,

Bill

Cynthia_sas
SAS Super FREQ


Hi:

  What you have looks like an XQUERY file, not really an XML data file. The XML Mapper is designed to work with XML data files. You might want to work with Tech Support on whether this type of file will even be possible to read with the XML Mapper utility.

cynthia

Vince28_Statcan
Quartz | Level 8

I can't seem to open the XML links but given the construct in your post, you should be able to read that with the mapper in 2 different ways. The simplest being to make sure that your

<TABLE-PATH>

has tue full hierarchy all the way up to and including /contract

</TABLE-PATH>

This acts as your row-creating driver. That is, doing so you will have one row created in your dataset each time you run through this hierarchy.

The second thing to do is to define the column SYMBOL using the options retain and replace like

<COLUMN name="SYMBOL" retain="yes" replace="yes">

...

</COLUMN>

These 2 things combined should allow you to easily create the desired table from the specified xml. If you can specify the full x-path to /contract (since I can't seem to view the xml link), I can probably provide you with a .map file that will meet your requirements.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 15 replies
  • 3630 views
  • 4 likes
  • 5 in conversation