Hello,
I'm trying to import a seemingly simple XML file into SAS with the XML mapper utility. Note I used the XML mapper successfully on what I thought was a much more complex xml file without any issue. Perhaps I'm missing some simple tweak that will import the data. I want import a table with two variables: stock symbol and option expiration date. If there's more than one expiration date, then the symbol should be repeated.
Here's the xml file that I want to import:
This XML file does not appear to have any style information associated with it. The document tree is shown below.
</url>
</url>
<publiclyCallable>true</publiclyCallable>
</url>
<javascript execution-start-time="4" execution-stop-time="845" execution-time="840" instructions-used="14243" table-name="yahoo.finance.option_contracts"/>
</url>
<javascript execution-start-time="4" execution-stop-time="958" execution-time="954" instructions-used="25640" table-name="yahoo.finance.option_contracts"/>
<user-time>959</user-time>
<service-time>1640</service-time>
<build-version>0.2.2467</build-version>
</diagnostics>
<contract>2014-06</contract>
<contract>2014-07</contract>
<contract>2014-10</contract>
<contract>2015-01</contract>
<contract>2016-01</contract>
</option>
<contract>2014-06</contract>
<contract>2014-07</contract>
<contract>2014-09</contract>
<contract>2014-12</contract>
<contract>2015-01</contract>
<contract>2016-01</contract>
</option>
</results>
</query>
<!-- total: 959 -->
<!-- engine5.yql.gq1.yahoo.com -->
url to the above xml: https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.option_contracts%20WHE...
So, given the above I hope to get a table that looks like:
Symbol Expiration
yhoo 2014-05
yhoo 2014-06
yhoo 2014-07
yhoo 2014-10
yhoo 2015-01
yhoo 2016-01
fb 2014-05
fb 2014-06
fb 2014-07
fb 2014-09
fb 2014-12
fb 2015-01
fb 2016-01
When I try to import the data using the XML mapper, I get an error stating that the " XMLMap contains no tables". And when I try to create a custom map I only get the last record of the desired table above. Is there a simple adjustment I can make to import the data? Thank you so much in advance for any help or suggestions.
-Bill
It looks like this xml is a stream file .Try this one.
data temp; infile 'c:\temp\yahoo.txt' recfm=n dlm='<>'; length symbol expiration $ 20; retain symbol; input token : $200. @@; if prxmatch('/(^\d{4}-\d{2})|(option\s+symbol=)/o',token) then do; if left(token) =: 'option' then do;symbol=scan(token,2,'"'); delete;end; expiration=token; output; end; drop token; run;
filename x url 'http://...........' ; data temp; infile x recfm=n dlm='<>'; length symbol expiration $ 20; retain symbol; input token : $200. @@; if prxmatch('/(^\d{4}-\d{2})|(option\s+symbol=)/o',token) then do; if left(token) =: 'option' then do;symbol=scan(token,2,'"'); delete;end; expiration=token; output; end; drop token; run;
Xia Keshan
Message was edited by: xia keshan
Can you use filename + url to get the xml file ? If it does , that would easy to get .
data want;
input;
length symbol expiration $ 20;
retain symbol;
if _infile_ eq: '<option symbol="' then symbol=scan(_infile_,2,'"');
else if _infile_ eq: '<contract>' then do;expiration=scan(_infile_,2,'<>');output;end;
datalines4;
<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng" yahoo:count="2" yahoo:created="2014-05-04T12:23:36Z" yahoo:lang="en-US">
<diagnostics>
<url execution-start-time="1" execution-stop-time="2" execution-time="1">
<![CDATA[ http://datatables.org/alltables.env ]]>
</url>
<url execution-start-time="2" execution-stop-time="3" execution-time="1">
<![CDATA[
http://www.datatables.org/yahoo/finance/yahoo.finance.option_contracts.xml
]]>
</url>
<publiclyCallable>true</publiclyCallable>
<url execution-start-time="6" execution-stop-time="760" execution-time="754">
<![CDATA[ http://finance.yahoo.com/q/op?s=yhoo ]]>
</url>
<javascript execution-start-time="4" execution-stop-time="777" execution-time="772" instructions-used="14243" table-name="yahoo.finance.option_contracts"/>
<url execution-start-time="6" execution-stop-time="843" execution-time="837">
<![CDATA[ http://finance.yahoo.com/q/op?s=fb ]]>
</url>
<javascript execution-start-time="4" execution-stop-time="861" execution-time="856" instructions-used="25640" table-name="yahoo.finance.option_contracts"/>
<user-time>862</user-time>
<service-time>1472</service-time>
<build-version>0.2.2467</build-version>
</diagnostics>
<results>
<option symbol="yhoo">
<contract>2014-05</contract>
<contract>2014-06</contract>
<contract>2014-07</contract>
<contract>2014-10</contract>
<contract>2015-01</contract>
<contract>2016-01</contract>
</option>
<option symbol="fb">
<contract>2014-05</contract>
<contract>2014-06</contract>
<contract>2014-07</contract>
<contract>2014-09</contract>
<contract>2014-12</contract>
<contract>2015-01</contract>
<contract>2016-01</contract>
</option>
</results>
</query>
<!-- total: 862 -->
<!-- engine8.yql.gq1.yahoo.com -->
;;;;
run;
Xia Keshan
Ksharp,
Thanks again for the code. Your solution worked perfectly when I used the datalines4 syntax. However, I'm having some trouble using the filename + url solution. I can put the data into the log, but can't seem to get sas to read it. Note I tried using the put statement to create the data for datalines4 but it didn't work.
Here's the code that I currently have
filename in url "https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.option_contracts%20WHE..." lrecl=32000; quit;
data want;
infile in truncover;
input;
put _infile_;
length symbol expiration $ 20;
retain symbol;
if _infile_ eq: '<option symbol="' then symbol=scan(_infile_,2,'"');
else if _infile_ eq: '<contract>' then do;expiration=scan(_infile_,2,'<>');output;end;
run;
Does anyone have any ideas on what I'm missing? Perhaps an infile=option?
Any suggestions would be greatly appreciated.
Thank you,
Bill
Why add "quit;" at the end of filename statement ?
if you can put it into LOG ,that said you can read it .or you could try :
if find( _infile_ , '<option symbol="' ) then symbol=scan(_infile_,2,'"');
else if find( _infile_ , '<contract>' ) then do;expiration=scan(_infile_,2,'<>');output;end;
Thanks for all the suggestions. I believe that I should be able to read the data without using the XML mapper. I'm able to see the data in the log, but SAS cannot interpret it. I'm getting an invalid data error (code & log are below). Does anyone see what I'm doing wrong? Any thoughts or suggestions would be greatly appreciated.
-Bill
Code:
dm 'clear log';
filename in url "https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.option_contracts%20WHE..." lrecl=32000;
data want;
infile in dsd end=eof termstr=crlf;
input symbol expiration $ 20;
retain symbol;
if find(_infile_ ,'<option symbol="') then symbol=scan(_infile_,2,'"');
else if find( _infile_ , '<contract>' ) then do;expiration=scan(_infile_,2,'<>');output; end;
run;
Log:
102 dm 'clear log';
103
104 filename in url
104! "https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.option_contr
104! acts%20WHERE%20symbol%20in('yhoo'%2C'fb')%0A%09%09&diagnostics=true&env=http%3A%2F%2Fdatata
104! bles.org%2Falltables.env" lrecl=32000;
WARNING: Apparent symbolic reference DIAGNOSTICS not resolved.
WARNING: Apparent symbolic reference DIAGNOSTICS not resolved.
WARNING: Apparent symbolic reference ENV not resolved.
105
106 data want;
107 infile in dsd end=eof termstr=crlf;
108 input symbol expiration $ 20;
109 retain symbol;
110 if find(_infile_ ,'<option symbol="') then symbol=scan(_infile_,2,'"');
111 else if find( _infile_ , '<contract>' ) then do;expiration=scan(_infile_,2,'<>');
111! output; end;
112 run;
NOTE: Character values have been converted to numeric values at the places given by:
(Line):(Column).
110:51
NOTE: The infile IN is:
Filename=https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.opt
ion_contracts%20WHERE%20symbol%20in('yhoo'%2C'fb')%0A%09%09&diagnostics=true&env=http%3A%2
F%2Fdatatables.org%2Falltables.env,
Local Host Name=LP43235,
Local Host IP addr=fe80::5d94:d77b:dc9d:7b8f%10,
Service Hostname Name=yts1.yql.vip.ne1.yahoo.com,
Service IP addr=98.138.243.55,Service Name=N/A,
Service Portno=443,Lrecl=32000,Recfm=Variable
NOTE: Invalid data for symbol in line 1 1-1747.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+-
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+-
1 CHAR <?xml version="1.0" encoding="UTF-8"?>.<query xmlns:yahoo="http://www.yahooapis.com/v1
ZONE 33766276776663232322666666663255423233037767727666737666632677732277727666667672666273
NUMR CF8DC065239FED21E0205E3F49E7D2546D82FEAC1552908DCE3A918FFD28440AFF777E918FF1093E3FDF61
87 /base.rng" yahoo:count="2" yahoo:created="2014-05-11T10:56:44Z" yahoo:lang="en"><diagn
173 ostics><url execution-start-time="0" execution-stop-time="71" execution-time="71"><![C
259 DATA[http://datatables.org/alltables.env]]></url><url execution-start-time="72" execut
345 ion-stop-time="143" execution-time="71"><![CDATA[http://www.datatables.org/yahoo/finan
431 ce/yahoo.finance.option_contracts.xml]]></url><publiclyCallable>true</publiclyCallable
517 ><url execution-start-time="146" execution-stop-time="1143" execution-time="997"><![CD
603 ATA[http://finance.yahoo.com/q/op?s=fb]]></url><javascript execution-start-time="145"
689 execution-stop-time="1164" execution-time="1019" instructions-used="15397" table-name=
775 "yahoo.finance.option_contracts"></javascript><url execution-start-time="146" executio
861 n-stop-time="1221" execution-time="1075"><![CDATA[http://finance.yahoo.com/q/op?s=yhoo
947 ]]></url><javascript execution-start-time="145" execution-stop-time="1243" execution-t
1033 ime="1098" instructions-used="25640" table-name="yahoo.finance.option_contracts"></jav
1119 ascript><user-time>1244</user-time><service-time>2117</service-time><build-version>0.2
1205 .2467</build-version></diagnostics><results><option symbol="yhoo"><contract>2014-05</c
1291 ontract><contract>2014-06</contract><contract>2014-07</contract><contract>2014-10</con
1377 tract><contract>2015-01</contract><contract>2016-01</contract></option><option symbol=
1463 "fb"><contract>2014-05</contract><contract>2014-06</contract><contract>2014-07</contra
1549 ct><contract>2014-09</contract><contract>2014-12</contract><contract>2015-01</contract
1635 ><contract>2016-01</contract></option></results></query><!-- total: 1244 -->.<!-- engi
ZONE 33666776673333323332666776673326776663327677677332776773322227676632333322230322226666
NUMR EC3FE42134E2016D01CF3FE42134ECFF049FEECF2535C43ECF15529EC1DD04F41CA012440DDEAC1DD05E79
1721 ne10.yql.ne1.yahoo.com -->. 1747
ZONE 663327762663276666266622230
NUMR E510E91CEE51E918FFE3FD0DDEA
eof=1 symbol=1 expiration= _ERROR_=1
_INFILE_=<?xml version="1.0" encoding="UTF-8"?>
<query xmlns:yahoo="http://www.yahooapis.com/v1/
base.rng" yahoo:count="2" yahoo:created="2014-05-11T10:56:44Z" yahoo:lang="en"><diagnostics><url
execution-start-time="0" execution-stop-time="71" execution-time="71"><![CDATA[http://datatable
s.org/alltables.env]]></url><url execution-start-time="72" execution-stop-time="143" execution-t
ime="71"><![CDATA[http://www.datatables.org/yahoo/finance/yahoo.finance.option_contracts.xml]]><
/url><publiclyCallable>true</publiclyCallable><url execution-start-time="146" execution-stop-tim
e="1143" execution-time="997"><![CDATA[http://finance.yahoo.com/q/op?s=fb]]></url><javascript ex
ecution-start-time="145" execution-stop-time="1164" execution-time="1019" instructions-used="153
97" table-name="yahoo.finance.option_contracts"></javascript><url execution-start-time="146" exe
cution-stop-time="1221" execution-time="1075"><![CDATA[http://finance.yahoo.com/q/op?s=yhoo]]></
url><javascript execution-start-time="145" execution-stop-time="1243" execution-time="1098" inst
ructions-used="25640" table-name="yahoo.finance.option_contracts"></javascript><user-time>1244</
user-time><service-time>2117</service-time><build-version>0.2.2467</build-version></diagnostics>
<results><option symbol="yhoo"><contract>2014-05</contract><contract>2014-06</contract><contract
>2014-07</contract><contract>2014-10</contract><contract>2015-01</contract><contract>2016-01</co
ntract></option><option symbol="fb"><contract>2014-05</contract><contract>2014-06</contract><con
tract>2014-07</contract><contract>2014-09</contract><contract>2014-12</contract><contract>2015-0
1</contract><contract>2016-01</contract></option></results></query><!-- total: 1244 -->
<!-- eng
ine10.yql.ne1.yahoo.com -->
_N_=1
NOTE: 1 record was read from the infile IN.
The minimum record length was 1747.
The maximum record length was 1747.
NOTE: The data set WORK.WANT has 0 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 1.86 seconds
cpu time 0.15 seconds
It looks like this xml is a stream file .Try this one.
data temp; infile 'c:\temp\yahoo.txt' recfm=n dlm='<>'; length symbol expiration $ 20; retain symbol; input token : $200. @@; if prxmatch('/(^\d{4}-\d{2})|(option\s+symbol=)/o',token) then do; if left(token) =: 'option' then do;symbol=scan(token,2,'"'); delete;end; expiration=token; output; end; drop token; run;
filename x url 'http://...........' ; data temp; infile x recfm=n dlm='<>'; length symbol expiration $ 20; retain symbol; input token : $200. @@; if prxmatch('/(^\d{4}-\d{2})|(option\s+symbol=)/o',token) then do; if left(token) =: 'option' then do;symbol=scan(token,2,'"'); delete;end; expiration=token; output; end; drop token; run;
Xia Keshan
Message was edited by: xia keshan
Ksharp,
Thanks so much for the code. Your solution worked perfectly. I can now download all the option expiration dates.
Thanks again!
Best regards,
Bill
You specified cr lf. cr=Carriage Return lf=linefeed remember the old typewriters having this physical. This convention is Windows related.
Within internet http/xml the accepted convention is Unix-based the newline or lf. It has no meaning in html or xml. It is just being made human readable and workable within editors.
Put the Zone/Numr lines in a monospace font (courier) in a editor. The 0 and 1 in the Zone field should alarm you. Having it aligned you can see the combination 0A this the lf-symbol. lf is not a valid symbol for string processing as it has no image for the screen. .... Advice: change your record indication form crlf to lf.
You are seeing the xml is allowing UTf8 this encoding is mentioned. Utf8 is a possible multitbyte character encoding. With the common latin1 symbols the old 1 byte character is used.
When there would be a multibyte char it will switch by using 00 (zone/number) and other not usual used bytes.
The warnings in the url are coming from the & usage. Having the string in " (double quotes) SAS will try to solve that is sas-macro vars.
The %20 (space) %3A (:) in your Url are byte encodings to protect misunderstanding in intrepretations (routers/webservers). The %0A (lf) and %09 (ht tab) usage there I do not understand
Jaap,
Thanks for your thoughts. I tried some of your suggestions,
-Bill
I have just entere that long url link in a browser. What I see is coming back is tree records, the record you need is a long string with all the data you have previous copied as separated records. FF and Chrome are giving a nice message like the XML mapper, the layout recognition is fine. Could be an indication you could process the file as XML.
As long as you use that file and save it as a real copy not doing a copy/paste from the screen you are on the best options.
For some XML backgropund see: http://support.sas.com/resources/papers/proceedings12/220-2012.pdf
seeing you posted data is does not follow the XML Document Types (w3schools) structure.
There is a reference to a yahoo xml file that specify the in a xml this content.
Would be an explanation why XMLapper does not recognize the data you still can see.
Jaap & Cynthia,
Thanks for your thoughts. No wonder I cannot import the data with the XML Mapper!
Best,
Bill
Hi:
What you have looks like an XQUERY file, not really an XML data file. The XML Mapper is designed to work with XML data files. You might want to work with Tech Support on whether this type of file will even be possible to read with the XML Mapper utility.
cynthia
I can't seem to open the XML links but given the construct in your post, you should be able to read that with the mapper in 2 different ways. The simplest being to make sure that your
<TABLE-PATH>
has tue full hierarchy all the way up to and including /contract
</TABLE-PATH>
This acts as your row-creating driver. That is, doing so you will have one row created in your dataset each time you run through this hierarchy.
The second thing to do is to define the column SYMBOL using the options retain and replace like
<COLUMN name="SYMBOL" retain="yes" replace="yes">
...
</COLUMN>
These 2 things combined should allow you to easily create the desired table from the specified xml. If you can specify the full x-path to /contract (since I can't seem to view the xml link), I can probably provide you with a .map file that will meet your requirements.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.