HabAM
Quartz | Level 8

How can I save the titles listed in the 'US - Based Outbreak' container on this page using SAS?

https://www.cdc.gov/outbreaks/index.html

(EDIT via Reeza to fix the link)

HabAM
ACCEPTED SOLUTION
ChrisHemedinger
Community Manager

I noticed that the CDC offers a lot of RSS feeds -- XML representations of the data on their site.

Using SAS, you can use PROC HTTP to fetch the XML, and then the XMLV2 libname engine to read that information as data.

You'll have to find the proper RSS feed for your needs. They have many of them listed here. Here's a working example with one of their feeds.


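/* Write an XMLMap to a temporary file; it tells the XMLV2 engine
   which RSS elements become columns of the ITEM table */
filename rssmap temp;
data _null_;
 infile datalines;
 file rssmap;
 input;
 put _infile_;
 datalines;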
<?xml version="1.0" encoding="windows-1252"?>
<SXLEMAP name="RSSMAP" version="2.1">
    <NAMESPACES count="0"/>
    <!-- ############################################################ -->
    <TABLE name="item">
        <TABLE-PATH syntax="XPath">/rss/channel/item</TABLE-PATH>
        <COLUMN name="title">
            <PATH syntax="XPath">/rss/channel/item/title</PATH>
            <TYPE>character</TYPE>
            <DATATYPE>string</DATATYPE>
            <LENGTH>250</LENGTH>
        </COLUMN>
        <COLUMN name="link">
            <PATH syntax="XPath">/rss/channel/item/link</PATH>
            <TYPE>character</TYPE>
            <DATATYPE>string</DATATYPE>
            <LENGTH>200</LENGTH>
        </COLUMN>
        <COLUMN name="pubDate">
            <PATH syntax="XPath">/rss/channel/item/pubDate</PATH>
            <TYPE>character</TYPE>
            <DATATYPE>string</DATATYPE>
            <LENGTH>40</LENGTH>
        </COLUMN>
    </TABLE>
</SXLEMAP>
;
run;


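/* Fetch the RSS feed into a temporary file. The URL is in single quotes
   so that SAS does not treat &c as a macro variable reference. */
filename feed temp;
proc http
 method="get"
 url='https://www2c.cdc.gov/podcasts/createrss.asp?t=r&c=429.'
 out=feed;
run;

/* Read the downloaded XML as data, using the map defined above */
libname result XMLv2 xmlfileref=feed xmlmap=rssmap;

data bulletins;
 set result.item;
 length date 8;
 format date datetime20.;
 /* Skip the leading day-of-week name, then parse the rest as a datetime */
 date = input( substr(pubDate,4),anydtdtm.);
 drop pubDate;
run;

Result:
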
rssfeed.png
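
To save just the titles, as the original question asks, one option is to export them from the BULLETINS data set. This is only a minimal sketch: it assumes the steps above ran without errors, and the output path is a placeholder that does not come from this thread.

```sas
/* Assumes WORK.BULLETINS from the DATA step above.                 */
/* The output path is hypothetical; point it at a writable folder.  */
proc export data=bulletins(keep=title date)
  outfile="/tmp/cdc_titles.csv"
  dbms=csv
  replace;
run;
```

If the titles should feed other SAS reports instead, the same KEEP= option could be used on a SET statement that writes to a permanent library.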


10 REPLIES
ChrisBrooks
Ammonite | Level 13

I get "Page not Found" when I click on your link, but in any case SAS isn't a web scraping tool. I'd use something like the Python library Beautiful Soup for that.

HabAM
Quartz | Level 8

I don't know why the link isn't opening for you; I checked it again and it works. The reason I wanted to try it in SAS is that I want to integrate it with my existing SAS reports. Thank you for the response.

HabAM
ChrisHemedinger
Community Manager

I think @Reeza edited the post and fixed the link for you -- that's why it works now.

And I found the RSS feed you need for that category:

filename feed temp;
proc http
method="get"
url="https://tools.cdc.gov/api/v2/resources/media/285676.rss"
out=feed;
run;

 

feed2.png
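
Once the feed is downloaded, it could probably be read the same way as in the accepted solution. A rough sketch, assuming the RSSMAP fileref from that solution is still defined in your session and that this feed uses the usual /rss/channel/item layout (I have not confirmed that here); the libref OUTBREAK and the data set name OUTBREAK_TITLES are made up for illustration.

```sas
/* Assumes the FEED fileref filled by the PROC HTTP step above and the
   RSSMAP fileref from the accepted solution; both must exist in this session. */
libname outbreak XMLv2 xmlfileref=feed xmlmap=rssmap;

/* Copy the item titles and links into a WORK data set */
data outbreak_titles;
 set outbreak.item;
 keep title link;
run;

/* Release the XML libref once the data has been copied */
libname outbreak clear;
```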

 

HabAM
Quartz | Level 8

That explains it. Thank you all.

HabAM
HabAM
Quartz | Level 8

Thank you. Is there any published documentation you would recommend I review?

HabAM
SAS_inquisitive
Lapis Lazuli | Level 10

There is an R package called 'rvest', developed by Hadley Wickham, if you are comfortable with R.

Pranjal
Calcite | Level 5

filename rssmap temp;
data _null_;
infile datalines;
file rssmap;
input;
put _infile_;
datalines;
<?xml version="1.0" encoding="windows-1252"?>
<SXLEMAP name="RSSMAP" version="2.1">
    <NAMESPACES count="0"/>
    <!-- ############################################################ -->
    <TABLE name="item">
        <TABLE-PATH syntax="XPath">/rss/channel/item</TABLE-PATH>
        <COLUMN name="title">
            <PATH syntax="XPath">/rss/channel/item/title</PATH>
            <TYPE>character</TYPE>
            <DATATYPE>string</DATATYPE>
            <LENGTH>250</LENGTH>
        </COLUMN>
        <COLUMN name="link">
            <PATH syntax="XPath">/rss/channel/item/link</PATH>
            <TYPE>character</TYPE>
            <DATATYPE>string</DATATYPE>
            <LENGTH>200</LENGTH>
        </COLUMN>
        <COLUMN name="pubDate">
            <PATH syntax="XPath">/rss/channel/item/pubDate</PATH>
            <TYPE>character</TYPE>
            <DATATYPE>string</DATATYPE>
            <LENGTH>40</LENGTH>
        </COLUMN>
    </TABLE>
</SXLEMAP>
;
run;


filename feed temp;
proc http
 method="get"
 url="https://nps.magicbricks.com/npsScript/nps.js?1.337"
 out=feed;
run;

libname result XMLv2 xmlfileref=feed xmlmap=rssmap;

data bulletins;
 set result.item;
 length date 8;
 format date datetime20.;
 date = input( substr(pubDate,4),anydtdtm.);
 drop pubDate;
run;

 

This fails with an XML error.

ChrisHemedinger
Community Manager

@Pranjal - what are you trying to get from this "page"? The URL you supplied is a JavaScript file, not XML, so the XMLV2 engine cannot parse it. Please post the details of what you need in a new question, rather than adding to this solved topic.
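
As a general check before pointing the XMLV2 engine at a download, it can help to look at what the server actually returned. A small sketch, assuming a SAS 9.4 session where PROC HTTP supports HEADEROUT=; the HDRS fileref is a name chosen for illustration, and the URL is the CDC feed mentioned earlier in this thread.

```sas
filename feed temp;
filename hdrs temp;   /* hypothetical fileref to hold the response headers */

proc http
 method="get"
 url='https://tools.cdc.gov/api/v2/resources/media/285676.rss'
 out=feed
 headerout=hdrs;
run;

/* Echo the response headers to the log; check the status line and Content-Type */
data _null_;
 infile hdrs;
 input;
 put _infile_;
run;

/* Echo the first few lines of the body; an RSS feed should show XML markup */
data _null_;
 infile feed obs=5;
 input;
 put _infile_;
run;
```

If the log shows a content type other than XML, or a body that does not look like XML, the map-based LIBNAME step should not be expected to work.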

