Hi everyone,
I am trying to scrape a website to find links to articles. However, when I use PROC HTTP to fetch the HTML, all of the HTML code, including every link, is stored in a single row. I am using FIND() to locate the start of a link, but because FIND() returns only the first occurrence, I only get the first link.
Please suggest a way to find all of the links.
I've attached the code and the row below.
filename extract "/Location/source.txt";
proc http
method="GET"
out=extract
url="https://www.gov.za/newsroom";
run;
data work.report;
infile source length=len lrecl=32767;
input line $varying32767. len;
line = strip(line);
if len>0;
find_str = find(line,'<a href="/speeches/');
if find_str gt 0;
if find_str gt 0 then do;
links=compress(scan(substr(line,find_str+9),1,'>'),'"');
full_link=cats('https://www.gov.za',links);
end;
keep full_link;
run;
<div class="page clearfix" id="page"><header id="section-header" class="section section-header"><div id="zone-branding-wrapper" class="zone-wrapper zone-branding-wrapper clearfix"><div id="zone-branding" class="zone zone-branding clearfix container-12"><div class="grid-12 region region-branding" id="region-branding"><div class="region-inner region-branding-inner"><div class="branding-data clearfix"><div class="logo-img"><a href="/" rel="home" title="South African Government"><img src="https://www.gov.za/sites/all/themes/custom/eco_omega/logo.png?v=20200917" alt="South African Government" id="logo" /></a> </div><hgroup class="site-name-slogan"><h2 class="site-name"><a href="/" title="Home">South African Government</a></h2><h3 class="site-domain"><a href="/" rel="home" title="South African Government">www.gov.za</a></h3><h6 class="site-slogan">Let's grow South Africa together</h6></hgroup><div class="flag-img"><a href="/" rel="home" title="South African Government"><img typeof="foaf:Image" src="https://www.gov.za/sites/all/themes/custom/eco_omega/images/flag-south-africa.svg" alt="" /></a> </div></div></div></div></div></div><div id="zone-menu-wrapper" class="zone-wrapper zone-menu-wrapper clearfix"><div id="zone-menu" class="zone zone-menu clearfix container-12"><div class="grid-12 region region-menu" id="region-menu"><div class="region-inner region-menu-inner"><nav class="navigation"><h2 class="element-invisible">Main menu</h2><ul id="main-menu" class="links inline clearfix main-menu"><li class="menu-205 first"><a href="/">Home</a></li><li class="menu-3411"><a href="/about">About</a></li><li class="menu-1950 active-trail active"><a href="/newsroom" title="Speeches and statements" class="active-trail active">Newsroom</a></li><li class="menu-1279"><a href="/services">Services</a></li><li class="menu-1951 last"><a href="/document/latest" title="Government Documents">Documents</a></li></ul> </nav><div class="block block-superfish block-1 block-superfish-1 odd 
block-without-title" id="block-superfish-1"><div class="block-inner clearfix"><div class="content clearfix"><ul id="superfish-1" class="menu sf-menu sf-main-menu sf-horizontal sf-style-space sf-total-items-5 sf-parent-items-4 sf-single-items-1"><li id="menu-205-1" class="first odd sf-item-1 sf-depth-1 sf-no-children"><a href="/" class="sf-depth-1">Home</a></li><li id="menu-3411-1" class="middle even sf-item-2 sf-depth-1 sf-total-children-2 sf-parent-children-1 sf-single-children-1 menuparent"><a href="/about" class="sf-depth-1 menuparent">About</a><ul class="sf-megamenu"><li class="sf-megamenu-wrapper middle even sf-item-2 sf-depth-1 sf-total-children-2 sf-parent-children-1 sf-single-children-1 menuparent"><ol><li id="menu-1043-1" class="first odd sf-item-1 sf-depth-2 sf-total-children-8 sf-parent-children-0 sf-single-children-8 sf-megamenu-column menuparent"><div class="sf-megamenu-column"><a href="/about-sa" class="sf-depth-2 menuparent">About SA</a><ol><li id="menu-2608-1" class="first odd sf-item-1 sf-depth-3 sf-no-children"><a href="/about-government/contact-directory" title="" class="sf-depth-3">Contact directory</a></li><li id="menu-2237-1" class="middle even sf-item-2 sf-depth-3 sf-no-children"><a href="/faq" title="" class="sf-depth-3">FAQs</a></li><li id="menu-2142-1" class="middle odd sf-item-3 sf-depth-3 sf-no-children"><a href="/about-government/government-system" title="" class="sf-depth-3">Government system</a></li><li id="menu-2635-1" class="middle even sf-item-4 sf-depth-3 sf-no-children"><a href="/about-government/government-vacancies" title="" class="sf-depth-3">Government vacancies</a></li><li id="menu-2236-1" class="middle odd sf-item-5 sf-depth-3 sf-no-children"><a href="/issues/key-issues" title="" class="sf-depth-3">Key issues</a></li><li id="menu-2141-1" class="middle even sf-item-6 sf-depth-3 sf-no-children"><a href="/about-government/government-programmes/projects-and-campaigns" title="" class="sf-depth-3">Government 
programmes</a></li><li id="menu-1216-1" class="middle odd sf-item-7 sf-depth-3 sf-no-children"><a href="/about-government/leaders" title="" class="sf-depth-3">Government leaders</a></li><li id="menu-2000-1" class="last even sf-item-8 sf-depth-3 sf-no-children"><a href="/about-government/national-orders" title="" class="sf-depth-3">National Orders</a></li></ol></div></li><li id="menu-1964-1" class="last even sf-item-2 sf-depth-2 sf-no-children"><a href="/links" class="sf-depth-2">Links</a></li></ol></li></ul></li><li id="menu-1950-1" class="active-trail middle odd sf-item-3 sf-depth-1 sf-total-children-9 sf-parent-children-0 sf-single-children-9 menuparent"><a href="/newsroom" title="Speeches and statements" class="sf-depth-1 menuparent active">Newsroom</a><ul class="sf-megamenu"><li class="sf-megamenu-wrapper active-trail middle odd sf-item-3 sf-depth-1 sf-total-children-9 sf-parent-children-0 sf-single-children-9 menuparent"><ol><li id="menu-2219-1" class="first odd sf-item-1 sf-depth-2 sf-no-children"><a href="/latest-speeches" title="" class="sf-depth-2">Latest/what's new</a></li><li id="menu-2213-1" class="middle even sf-item-2 sf-depth-2 sf-no-children"><a href="/cabinet-statements" title="" class="sf-depth-2">Cabinet statements</a></li><li id="menu-2214-1" class="middle odd sf-item-3 sf-depth-2 sf-no-children"><a href="/media-advisories" title="" class="sf-depth-2">Media advisories</a></li><li id="menu-2215-1" class="middle even sf-item-4 sf-depth-2 sf-no-children"><a href="/media-statements" title="" class="sf-depth-2">Media statements</a></li><li id="menu-2216-1" class="middle odd sf-item-5 sf-depth-2 sf-no-children"><a href="/speeches" title="" class="sf-depth-2">Speeches</a></li><li id="menu-2217-1" class="middle even sf-item-6 sf-depth-2 sf-no-children"><a href="/parliamentary-questions-and-answers" title="" class="sf-depth-2">Parliamentary Q&A</a></li><li id="menu-2218-1" class="middle odd sf-item-7 sf-depth-2 sf-no-children"><a href="/events" title="" 
class="sf-depth-2">Events</a></li><li id="menu-2210-1" class="middle even sf-item-8 sf-depth-2 sf-no-children"><a href="/state-nation-address" class="sf-depth-2">State of the Nation Address</a></li><li id="menu-2211-1" class="last odd sf-item-9 sf-depth-2 sf-no-children"><a href="/national-budget-0" class="sf-depth-2">Budget speeches</a></li></ol></li></ul></li><li id="menu-1279-1" class="middle even sf-item-4 sf-depth-1 sf-total-children-3 sf-parent-children-3 sf-single-children-0 menuparent"><a href="/services" class="sf-depth-1 menuparent">Services</a><ul class="sf-megamenu"><li class="sf-megamenu-wrapper middle even sf-item-4 sf-depth-1 sf-total-children-3 sf-parent-children-3 sf-single-children-0 menuparent"><ol><li id="menu-2501-1" class="first odd sf-item-1 sf-depth-2 sf-total-children-16 sf-parent-children-0 sf-single-children-16 sf-megamenu-column menuparent"><div class="sf-megamenu-column"><a href="/services/services-residents" title="" class="sf-depth-2 menuparent">Services for residents</a><ol><li id="menu-2010-1" class="first odd sf-item-1 sf-depth-3 sf-no-children"><a href="/services/services-residents/birth" title="" class="sf-depth-3">Birth</a></li><li id="menu-2016-1" class="middle even sf-item-2 sf-depth-3 sf-no-children"><a href="/services/services-residents/parenting-all" title="" class="sf-depth-3">Parenting</a></li><li id="menu-2177-1" class="middle odd sf-item-3 sf-depth-3 sf-no-children"><a href="/services/services-residents/health" title="" class="sf-depth-3">Health</a></li><li id="menu-2011-1" class="middle even sf-item-4 sf-depth-3 sf-no-children"><a href="/services/services-residents/social-benefits" title="" class="sf-depth-3">Social benefits</a></li><li id="menu-2017-1" class="middle odd sf-item-5 sf-depth-3 sf-no-children"><a href="/services/services-residents/education-and-training" title="" class="sf-depth-3">Education and training</a></li><li id="menu-2178-1" class="middle even sf-item-6 sf-depth-3 sf-no-children"><a 
href="/services/services-residents/relationship" title="" class="sf-depth-3">Relationships</a></li><li id="menu-2012-1" class="middle odd sf-item-7 sf-depth-3 sf-no-children"><a href="/services/services-residents/world-work-0" title="" class="sf-depth-3">World of work</a></li><li id="menu-2018-1" class="middle even sf-item-8 sf-depth-3 sf-no-children"><a href="/services/services-residents/place-live" title="" class="sf-depth-3">A place to live</a></li><li id="menu-2179-1" class="middle odd sf-item-9 sf-depth-3 sf-no-children"><a href="/services/services-residents/tv-postal-services" title="" class="sf-depth-3">TV and postal services</a></li><li id="menu-2013-1" class="middle even sf-item-10 sf-depth-3 sf-no-children"><a href="/services/services-residents/driving-0" title="" class="sf-depth-3">Driving</a></li><li id="menu-2019-1" class="middle odd sf-item-11 sf-depth-3 sf-no-children"><a href="/services/services-residents/travel-outside-sa" title="" class="sf-depth-3">Travel outside SA</a></li><li id="menu-2054-1" class="middle even sf-item-12 sf-depth-3 sf-no-children"><a href="/services/services-residents/citizenship-0" title="" class="sf-depth-3">Citizenship</a></li><li id="menu-2014-1" class="middle odd sf-item-13 sf-depth-3 sf-no-children"><a href="/services/services-residents/information-government" title="" class="sf-depth-3">Information from government</a></li><li id="menu-2180-1" class="middle even sf-item-14 sf-depth-3 sf-no-children"><a href="/services/services-residents/dealing-law-0" title="" class="sf-depth-3">Dealing with the law</a></li><li id="menu-2181-1" class="middle odd sf-item-15 sf-depth-3 sf-no-children"><a href="/services/services-residents/retirement-old-age" title="" class="sf-depth-3">Retirement and old age</a></li><li id="menu-2015-1" class="last even sf-item-16 sf-depth-3 sf-no-children"><a href="/services/services-residents/end-life" title="" class="sf-depth-3">End of life</a></li></ol></div></li><li id="menu-1987-1" class="middle even 
sf-item-2 sf-depth-2 sf-total-children-12 sf-parent-children-0 sf-single-children-12 sf-megamenu-column menuparent"><div class="sf-megamenu-column"><a href="/services/services-organisations" class="sf-depth-2 menuparent">Services for organisations</a><ol><li id="menu-2024-1" class="first odd sf-item-1 sf-depth-3 sf-no-children"><a href="/services/services-organisations/register-business-organisation" title="" class="sf-depth-3">Register business or organisation</a></li><li id="menu-2028-1" class="middle even sf-item-2 sf-depth-3 sf-no-children"><a href="/services/services-organisations/change-registration" title="" class="sf-depth-3">Change registration</a></li><li id="menu-2020-1" class="middle odd sf-item-3 sf-depth-3 sf-no-children"><a href="/services/services-organisations/business-incentives" title="" class="sf-depth-3">Business incentives</a></li><li id="menu-2021-1" class="middle even sf-item-4 sf-depth-3 sf-no-children"><a href="/services/services-organisations/deregister-business" title="" class="sf-depth-3">Deregister business</a></li><li id="menu-2025-1" class="middle odd sf-item-5 sf-depth-3 sf-no-children"><a href="/services/services-organisations/tax" title="" class="sf-depth-3">Tax</a></li><li id="menu-2029-1" class="middle even sf-item-6 sf-depth-3 sf-no-children"><a href="/services/services-organisations/intellectual-property" title="" class="sf-depth-3">Intellectual property</a></li><li id="menu-2022-1" class="middle odd sf-item-7 sf-depth-3 sf-no-children"><a href="/services/services-organisations/import" title="" class="sf-depth-3">Import</a></li><li id="menu-2026-1" class="middle even sf-item-8 sf-depth-3 sf-no-children"><a href="/services/services-organisations/export-permits" title="" class="sf-depth-3">Export permits</a></li><li id="menu-2030-1" class="middle odd sf-item-9 sf-depth-3 sf-no-children"><a href="/services/services-organisations/permits-licences-and-rights" title="" class="sf-depth-3">Permits licences and rights</a></li><li 
id="menu-2023-1" class="middle even sf-item-10 sf-depth-3 sf-no-children"><a href="/services/services-organisations/communication" title="" class="sf-depth-3">Communication</a></li><li id="menu-2027-1" class="middle odd sf-item-11 sf-depth-3 sf-no-children"><a href="/services/services-organisations/transport" title="" class="sf-depth-3">Transport</a></li><li id="menu-2183-1" class="last even sf-item-12 sf-depth-3 sf-no-children"><a href="/services/services-organisations/labour-0" title="" class="sf-depth-3">Labour</a></li></ol></div></li><li id="menu-1988-1" class="last odd sf-item-3 sf-depth-2 sf-total-children-3 sf-parent-children-0 sf-single-children-3 sf-megamenu-column menuparent"><div class="sf-megamenu-column"><a href="/services/services-foreign-nationals" title="" class="sf-depth-2 menuparent">Services for foreign nationals</a><ol><li id="menu-2031-1" class="first odd sf-item-1 sf-depth-3 sf-no-children"><a href="/services/services-foreigners/temporary-residence" title="" class="sf-depth-3">Temporary residence</a></li><li id="menu-2032-1" class="middle even sf-item-2 sf-depth-3 sf-no-children"><a href="/services/services-foreigners/permanent-residence" title="" class="sf-depth-3">Permanent residence</a></li><li id="menu-2033-1" class="last odd sf-item-3 sf-depth-3 sf-no-children"><a href="/services/services-foreigners/driving" title="" class="sf-depth-3">Driving</a></li></ol></div></li></ol></li></ul></li><li id="menu-1951-1" class="last odd sf-item-5 sf-depth-1 sf-total-children-3 sf-parent-children-2 sf-single-children-1 menuparent"><a href="/document/latest" title="Government Documents" class="sf-depth-1 menuparent">Documents</a><ul class="sf-megamenu"><li class="sf-megamenu-wrapper last odd sf-item-5 sf-depth-1 sf-total-children-3 sf-parent-children-2 sf-single-children-1 menuparent"><ol><li id="menu-2244-1" class="first odd sf-item-1 sf-depth-2 sf-no-children"><a href="/document/latest" title="" class="sf-depth-2">Latest/what's new</a></li><li 
id="menu-2133-1" class="middle even sf-item-2 sf-depth-2 sf-total-children-6 sf-parent-children-0 sf-single-children-6 sf-megamenu-column menuparent"><div class="sf-megamenu-column"><a href="/documents/tender" title="Government Tenders" class="sf-depth-2 menuparent">Tenders</a><ol><li id="menu-2138-1" class="first odd sf-item-1 sf-depth-3 sf-no-children"><a href="/documents/acts" title="" class="sf-depth-3">Acts</a></li><li id="menu-1952-1" class="middle even sf-item-2 sf-depth-3 sf-no-children"><a href="/documents/constitution/constitution-republic-south-africa-1996-1" title="Constitution" class="sf-depth-3">Constitution of SA</a></li><li id="menu-1997-1" class="middle odd sf-item-3 sf-depth-3 sf-no-children"><a href="/documents/bills" title="" class="sf-depth-3">Bills</a></li><li id="menu-1996-1" class="middle even sf-item-4 sf-depth-3 sf-no-children"><a href="/documents/draft-bills" title="" class="sf-depth-3">Draft bills</a></li><li id="menu-1994-1" class="middle odd sf-item-5 sf-depth-3 sf-no-children"><a href="/documents/notices" title="" class="sf-depth-3">Notices</a></li><li id="menu-3408-1" class="last even sf-item-6 sf-depth-3 sf-no-children"><a href="/documents/awarded-tenders" title="" class="sf-depth-3">Tenders awarded</a></li></ol></div></li><li id="menu-1954-1" class="last odd sf-item-3 sf-depth-2 sf-total-children-6 sf-parent-children-0 sf-single-children-6 sf-megamenu-column menuparent"><div class="sf-megamenu-column"><a href="/documents/white-papers" title="" class="sf-depth-2 menuparent">White papers</a><ol><li id="menu-1993-1" class="first odd sf-item-1 sf-depth-3 sf-no-children"><a href="/documents/green-papers" title="" class="sf-depth-3">Green papers</a></li><li id="menu-1995-1" class="middle even sf-item-2 sf-depth-3 sf-no-children"><a href="/documents/annual-report" title="" class="sf-depth-3">Annual reports</a></li><li id="menu-1992-1" class="middle odd sf-item-3 sf-depth-3 sf-no-children"><a href="/documents/other-documents" title="" 
class="sf-depth-3">Other documents</a></li><li id="menu-2135-1" class="middle even sf-item-4 sf-depth-3 sf-no-children"><a href="/documents/public-comment" title="" class="sf-depth-3">Documents for public comment</a></li><li id="menu-2134-1" class="middle odd sf-item-5 sf-depth-3 sf-no-children"><a href="http://beta2.statssa.gov.za/" title="" class="sf-depth-3">Statistical documents</a></li><li id="menu-2137-1" class="last even sf-item-6 sf-depth-3 sf-no-children"><a href="/documents/parliamentary-documents" class="sf-depth-3">Parliamentary documents</a></li></ol></div></li></ol></li></ul></li></ul> </div></div></div><div class="block block-views block--exp-search-page block-views-exp-search-page even block-without-title" id="block-views-exp-search-page"><div class="block-inner clearfix"><div class="content clearfix"><form action="/search" method="get" id="views-exposed-form-search-page" accept-charset="UTF-8"><div><div class="views-exposed-form"><div class="views-exposed-widgets clearfix"><div id="edit-search-query-wrapper" class="views-exposed-widget views-widget-filter-search_api_views_fulltext"><label for="edit-search-query">Search </label><div class="views-widget"><div class="form-item form-type-textfield form-item-search-query"><input type="text" id="edit-search-query" name="search_query" value="" size="30" maxlength="128" class="form-text" /></div></div><div class="description">Search the site </div></div><div class="views-exposed-widget views-submit-button"><input type="submit" id="edit-submit-search" name="" value="Search" class="form-submit" /> </div></div></div></div></form> </div></div></div> </div></div></div></div></header><section id="section-content" class="section section-content"><div id="zone-content-wrapper" class="zone-wrapper zone-content-wrapper clearfix"><div id="zone-content" class="zone zone-content clearfix equal-height-container container-12"><div id="breadcrumb" class="grid-12"><h2 class="element-invisible">You are here</h2><div 
class="breadcrumb"><a href="/">Home</a></div></div><aside class="grid-3 region region-sidebar-first equal-height-element" id="region-sidebar-first"><div class="region-inner region-sidebar-first-inner"><div class="block block-views block--exp-speeches-views-page-5 block-views-exp-speeches-views-page-5 odd block-without-title" id="block-views-exp-speeches-views-page-5"><div class="block-inner clearfix"><div class="content clearfix"><form action="/newsroom" method="get" id="views-exposed-form-speeches-views-page-5" accept-charset="UTF-8"><div><div class="views-exposed-form"><div class="views-exposed-widgets clearfix"><div id="edit-title-field-value-wrapper" class="views-exposed-widget views-widget-filter-title_field_value"><label for="edit-title-field-value">Keyword </label><div class="views-widget"><div class="form-item form-type-textfield form-item-title-field-value"><input type="text" id="edit-title-field-value" name="title_field_value" value="" size="30" maxlength="128" class="form-text form-autocomplete" /><input type="hidden" id="edit-title-field-value-autocomplete" value="https://www.gov.za/?q=autocomplete_filter/title_field_value/speeches_views/page_5/0" disabled="disabled" class="autocomplete" /></div></div></div><div id="edit-field-gcis-speech-category-tid-wrapper" class="views-exposed-widget views-widget-filter-field_gcis_speech_category_tid"><label for="edit-field-gcis-speech-category-tid">Categories </label><div class="views-widget"><div class="form-item form-type-select form-item-field-gcis-speech-category-tid"><select id="edit-field-gcis-speech-category-tid" name="field_gcis_speech_category_tid" class="form-select"><option value="All" selected="selected">- Any -</option><option value="832"></option><option value="950">Media advisories</option><option value="336">Cabinet statements</option><option value="338">Statements</option><option value="337">Speeches</option><option value="340">Parliamentary questions and answers</option><option 
value="339">Transcripts</option><option value="834">Budget</option><option value="342">Events</option></select></div></div></div><div id="edit-field-gcis-speech-government-lvl-tid-wrapper" class="views-exposed-widget views-widget-filter-field_gcis_speech_government_lvl_tid"><label for="edit-field-gcis-speech-government-lvl-tid">Government Level </label><div class="views-widget"><div class="form-item form-type-select form-item-field-gcis-speech-government-lvl-tid"><select id="edit-field-gcis-speech-government-lvl-tid" name="field_gcis_speech_government_lvl_tid" class="form-select"><option value="All" selected="selected">- Any -</option><option value="345">Local</option><option value="755">National</option><option value="344">Provincial</option><option value="833">Unspecified</option></select></div></div></div><div id="edit-field-gcis-speech-subjects-tid-1-wrapper" class="views-exposed-widget views-widget-filter-field_gcis_speech_subjects_tid_1"><label for="edit-field-gcis-speech-subjects-tid-1">Subjects </label><div class="views-widget"><div class="form-item form-type-select form-item-field-gcis-speech-subjects-tid-1"><select multiple="multiple" name="field_gcis_speech_subjects_tid_1[]" id="edit-field-gcis-speech-subjects-tid-1" size="9" class="form-select"><option value="718">16 days of activism</option><option value="740">20 years of freedom</option><option value="715">Africa</option><option value="681">African Peer Review Mechanism (APRM)</option><option value="713">Agriculture</option><option value="703">Anti-corruption initiatives</option><option value="682">Arts and culture</option><option value="683">Aviation</option><option value="721">Black Economic Empowerment</option><option value="684">Budget: national</option><option value="680">Budget: provincial</option><option value="650">Business</option><option value="685">Cabinet statement</option><option value="722">Children's issues</option><option value="726">Climate change</option><option value="716">Cluster 
media briefings</option><option value="686">Communications</option><option value="687">Community Development Workers (CDW)</option><option value="688">Constitutional affairs</option><option value="689">Correctional services</option><option value="730">Cultural and traditional affairs</option><option value="690">Defence</option><option value="691">Development</option><option value="646">Disaster management</option><option value="706">Economy</option><option value="705">Education</option><option value="711">Elections</option><option value="692">Energy</option><option value="712">Energy efficiency</option><option value="693">Environment</option><option value="736">Equality</option><option value="744">Events</option><option value="649">Expanded Public Works Programme (EPWP)</option><option value="714">Fighting crime</option><option value="694">Finance</option><option value="697">Fishery</option><option value="739">Food security</option><option value="696">Forestry</option><option value="643">Fraud and corruption</option><option value="698">Freedom day</option><option value="742">Freedom Month</option><option value="725">Governance</option><option value="699">Government services</option><option value="700">Growth and development</option><option value="707">Health</option><option value="701">History</option><option value="710">HIV and AIDS</option><option value="702">Home affairs</option><option value="651">Housing</option><option value="652">Human and social issues</option><option value="653">Human rights</option><option value="654">Human trafficking</option><option value="735">Imbizo</option><option value="727">Immigration</option><option value="743">Infrastructure</option><option value="695">International relations</option><option value="728">Job creation</option><option value="729">Justice</option><option value="655">Labour</option><option value="656">Land</option><option value="657">Legal issues</option><option value="658">Local government</option><option 
value="738">Mandela Month</option><option value="732">Media relations</option><option value="745">Military veterans</option><option value="659">Mining and minerals</option><option value="737">National Development Plan</option><option value="746">Nelson Mandela</option><option value="660">Nepad</option><option value="733">People with disabilities</option><option value="661">Presidential pardons</option><option value="720">Provincial executive council</option><option value="662">Provincial government</option><option value="663">Public enterprises</option><option value="664">Public service</option><option value="665">Public works</option><option value="731">Road safety</option><option value="644">Rural development</option><option value="666">SADC</option><option value="667">Safety and security</option><option value="668">Science and technology</option><option value="723">Service delivery</option><option value="704">Skills development</option><option value="669">Social development</option><option value="670">Sport and recreation</option><option value="741">State of the City Address</option><option value="717">State of the Nation address</option><option value="647">State of the Province Address</option><option value="671">Statistics</option><option value="648">Tax</option><option value="672">Telecommunications</option><option value="673">Tourism</option><option value="724">Trade and industry</option><option value="674">Traditional affairs</option><option value="708">Transport</option><option value="679">Un-categorised items</option><option value="675">Water</option><option value="676">Women's issues</option><option value="677">Xenophobia</option><option value="678">Youth affairs</option></select></div></div></div><div id="edit-field-gcis-speech-date-value-1-wrapper" class="views-exposed-widget views-widget-filter-field_gcis_speech_date_value_1"><div class="views-widget"><div id="edit-field-gcis-speech-date-value-min-wrapper"><div 
id="edit-field-gcis-speech-date-value-min-inside-wrapper"><div class="container-inline-date"><div class="form-item form-type-date-popup form-item-field-gcis-speech-date-value-1-min"><label for="edit-field-gcis-speech-date-value-1-min">Start date </label><div id="edit-field-gcis-speech-date-value-1-min" class="date-padding"><div class="form-item form-type-textfield form-item-field-gcis-speech-date-value-1-min-date"><label class="element-invisible" for="edit-field-gcis-speech-date-value-1-min-datepicker-popup-1">Date </label><input type="text" id="edit-field-gcis-speech-date-value-1-min-datepicker-popup-1" name="field_gcis_speech_date_value_1[min][date]" value="" size="20" maxlength="30" class="form-text" /><div class="description"> E.g., 17 Jul 2021</div></div></div></div></div></div></div><div id="edit-field-gcis-speech-date-value-max-wrapper"><div id="edit-field-gcis-speech-date-value-max-inside-wrapper"><div class="container-inline-date"><div class="form-item form-type-date-popup form-item-field-gcis-speech-date-value-1-max"><label for="edit-field-gcis-speech-date-value-1-max">End date </label><div id="edit-field-gcis-speech-date-value-1-max" class="date-padding"><div class="form-item form-type-textfield form-item-field-gcis-speech-date-value-1-max-date"><label class="element-invisible" for="edit-field-gcis-speech-date-value-1-max-datepicker-popup-1">Date </label><input type="text" id="edit-field-gcis-speech-date-value-1-max-datepicker-popup-1" name="field_gcis_speech_date_value_1[max][date]" value="" size="20" maxlength="30" class="form-text" /><div class="description"> E.g., 17 Jul 2021</div></div></div></div></div></div></div> </div></div><div class="views-exposed-widget views-submit-button"><input type="submit" id="edit-submit-speeches-views" name="" value="Search" class="form-submit" /> </div><div class="views-exposed-widget views-reset-button"><input type="submit" id="edit-reset" name="op" value="Reset" class="form-submit" /> </div></div></div></div></form> 
</div></div></div><section class="block block-block block-26 block-block-26 even" id="block-block-26"><div class="block-inner clearfix"><h2 class="block-title">Related Links</h2><div class="content clearfix"><p><a href="events">Events</a></p><p><a href="/state-nation-address">State of the Nation address</a></p><p><a href="national-budget-0">Budget speeches</a></p><p><a href="/node/733867/">Audio files</a></p></div></div></section> </div></aside><div class="grid-9 region region-content equal-height-element" id="region-content"><div class="region-inner region-content-inner"><a id="main-content"></a><h1 class="title" id="page-title">Newsroom</h1><div class="block block-system block-main block-system-main odd block-without-title" id="block-system-main"><div class="block-inner clearfix"><div class="content clearfix"><div class="view view-speeches-views view-id-speeches_views view-display-id-page_5 view-dom-id-db6e85de7b536c7602292a7c4e857ef9"><div class="view-header"><p>You can use the filters to show only results that match your interests</p></div><div class="view-content"><table class="views-table cols-2" class="views-table cols-2"><thead><tr><th class="views-field views-field-title-field" scope="col"><a href="/newsroom?title_field_value=&field_gcis_speech_category_tid=All&field_gcis_speech_government_lvl_tid=All&&field_gcis_speech_date_value_1%5Bmin%5D&field_gcis_speech_date_value_1%5Bmax%5D&order=title_field&sort=asc" title="sort by Title" class="active">Title</a> </th><th class="views-field views-field-field-gcis-speech-date" scope="col"><a href="/newsroom?title_field_value=&field_gcis_speech_category_tid=All&field_gcis_speech_government_lvl_tid=All&&field_gcis_speech_date_value_1%5Bmin%5D&field_gcis_speech_date_value_1%5Bmax%5D&order=field_gcis_speech_date&sort=asc" title="sort by Date" class="active">Date</a> </th></tr></thead><tbody><tr class="odd views-row-first"><td class="views-field views-field-title-field"><a 
href="/speeches/president-cyril-ramaphosa-conducts-oversight-visit-kwazulu-natal-16-jul-16-jul-2021-0000">President Cyril Ramaphosa conducts oversight visit to Kwazulu-Natal, 16 Jul</a> </td><td class="views-field views-field-field-gcis-speech-date"><span class="date-display-single" property="dc:date" datatype="xsd:dateTime" content="2021-07-16T00:00:00+02:00">16 Jul 2021</span> </td></tr><tr class="even"><td class="views-field views-field-title-field"><a href="/speeches/western-cape-government-updates-ongoing-public-unrest-and-taxi-violence-15-jul-2021-0000">Western Cape Government updates on ongoing public unrest and taxi violence</a> </td><td class="views-field views-field-field-gcis-speech-date"><span class="date-display-single" property="dc:date" datatype="xsd:dateTime" content="2021-07-15T00:00:00+02:00">15 Jul 2021</span> </td></tr><tr class="odd"><td class="views-field views-field-title-field"><a href="/speeches/premier-alan-winde-coronavirus-covid-19-and-vaccines-western-cape-15-jul-2021-0000">Premier Alan Winde on Coronavirus COVID-19 and vaccines in the Western Cape</a> </td><td class="views-field views-field-field-gcis-speech-date"><span class="date-display-single" property="dc:date" datatype="xsd:dateTime" content="2021-07-15T00:00:00+02:00">15 Jul 2021</span> </td></tr><tr class="even"><td class="views-field views-field-title-field"><a href="/speeches/minister-khumbudzo-ntshavheni-update-violent-protests-some-parts-south-africa-15-jul-2021">Minister Khumbudzo Ntshavheni: Update on violent protests in some parts of South Africa</a> </td><td class="views-field views-field-field-gcis-speech-date"><span class="date-display-single" property="dc:date" datatype="xsd:dateTime" content="2021-07-15T00:00:00+02:00">15 Jul 2021</span> </td></tr><tr class="odd"><td class="views-field views-field-title-field"><a href="/speeches/minister-thoko-didiza-meets-agricultural-sector-stakeholders-15-jul-2021-0000">Minister Thoko Didiza meets agricultural sector 
stakeholders</a> </td><td class="views-field views-field-field-gcis-speech-date"><span class="date-display-single" property="dc:date" datatype="xsd:dateTime" content="2021-07-15T00:00:00+02:00">15 Jul 2021</span> </td></tr><tr class="even"><td class="views-field views-field-title-field"><a href="/speeches/deputy-minister-njabulo-nzuza-visits-vandalised-home-affairs-office-eshowe-16-jul-15-jul">Deputy Minister Njabulo Nzuza visits vandalised Home Affairs office in Eshowe, 16 Jul </a> </td><td class="views-field views-field-field-gcis-speech-date"><span class="date-display-single" property="dc:date" datatype="xsd:dateTime" content="2021-07-15T00:0
Accepted Solutions
Your code seems to be basically working. I made one small change: since your FILENAME statement defines the fileref EXTRACT, I changed the INFILE statement in the DATA step to read from EXTRACT instead of SOURCE.
filename extract "/Location/source.txt";
infile extract length=len lrecl=32767;
However, instead of relying on FIND alone, I would first use COUNT. The COUNT function returns the number of times a substring occurs within a string.
Count_str = COUNT(line,'<a href="/speeches/');
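As a quick illustration of what COUNT returns, here is a minimal sketch on a made-up HTML fragment. The string and the dataset name `work.count_demo` are hypothetical, not taken from the gov.za page.

```sas
data work.count_demo;
  length line $200;
  /* made-up fragment: two speech links and one unrelated link */
  line = '<a href="/speeches/a">A</a> <a href="/about">B</a> <a href="/speeches/b">C</a>';
  /* number of times the speech-link prefix occurs in LINE */
  n = count(line, '<a href="/speeches/');
  putlog n=;
run;
```

COUNT is case-sensitive by default; if the markup might contain `<A HREF=`, the 'i' modifier can be passed as a third argument (and the FIND calls would need the same treatment).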
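I would then add a DO loop that steps through the line, taking a substring after each match, to parse out the URL of each speech.
Start_Pos = 1;                                         /* where the next search begins */
DO i = 1 TO Count_Str;
   Str_To_Search = SUBSTR(line, Start_Pos);            /* remainder of the line */
   Str_Pos = FIND(Str_to_Search,'<a href="/speeches/'); /* next link, relative to Start_Pos */
   /* skip the 9 characters of '<a href="', keep the text up to '>', drop the closing quote */
   links = compress(scan(substr(Str_to_search,Str_Pos+9),1,'>'),'"');
   Link_Len = LENGTHN(STRIP(Links));
   full_link = cats('https://www.gov.za',links);
   OUTPUT;                                             /* one row per link */
   Start_Pos = Start_Pos + Str_Pos + Link_Len;         /* move past the current link */
END;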
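A possible simplification, sketched here under the assumption that the fileref EXTRACT from the PROC HTTP step is still assigned: FIND accepts an optional start-position argument, so the loop can search the original line directly instead of building a new substring on each pass, and COUNT is not needed at all. Scanning on the double quote instead of `>` also removes the need for COMPRESS. The dataset name `work.report2` and the variable lengths are my own choices, not from the original program.

```sas
data work.report2;
  length line $32767 link $1000 full_link $1100;
  keep full_link;
  infile extract length=len lrecl=32767;
  input line $varying32767. len;
  /* position of the first speech link in this line (0 if none) */
  pos = find(line, '<a href="/speeches/');
  do while (pos > 0);
    /* the path starts right after the 9 characters of '<a href="' */
    /* and runs up to the closing double quote                     */
    link = scan(substr(line, pos + 9), 1, '"');
    full_link = cats('https://www.gov.za', link);
    output;
    /* resume the search one character past the current match */
    pos = find(line, '<a href="/speeches/', pos + 1);
  end;
run;
```

Because the loop stops as soon as FIND returns 0, lines without any speech links simply produce no output rows.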
The extracted URLs are shown below. When I opened a randomly chosen one in a browser, it displayed one of the speeches.
https://www.gov.za/speeches/president-cyril-ramaphosa-update-security-situation-country-16-jul-2021-0000
https://www.gov.za/speeches/president-cyril-ramaphosa%C2%A0addresses-nation-security-situation-country-16-jul-16-jul-2021
https://www.gov.za/speeches/mineral-resources-and-energy-gives-clarification-regulations-prohibiting-retail-sales
https://www.gov.za/speeches/acting-minister-khumbudzo-ntshavheni-update-security-situation-prevailing-country-16-july
https://www.gov.za/speeches/mec-jacob-mamabolo-visits-sophiatown-and-westbury-communities-17-jul-16-jul-2021-0000
https://www.gov.za/speeches/acting-minister-presidency-khumbudzo-ntshavheni-briefs-media-ongoing-violent-protests-0
Jim
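The full program is below. I added a bit of extra code for debugging purposes.
%LET Width = 22;
/* Hypothetical stand-in: %Format_Dashes is a utility macro that is not shown in this post. */
/* This version simply returns &n dashes (REPEAT returns its argument n+1 times).           */
%macro Format_Dashes(n);
%sysfunc(repeat(-, %eval(&n - 1)))
%mend Format_Dashes;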
filename extract "C:\Users\jbarbour\Documents\SAS\Pgm\Training\HTTP\source.txt";
proc http
method="GET"
out=extract
url="https://www.gov.za/newsroom";
run;
data work.report;
keep full_link;
LENGTH Links $32767;
LENGTH Full_Link $32767;
infile Extract length=len lrecl=32767 end=end_of_file;
if end_of_file then
DO;
PUTLOG "NOTE- ";
PUTLOG "NOTE- %Format_Dashes(&Width)";
PUTLOG "NOTE: | " Non_Blank_lines= COMMA17.;
PUTLOG "NOTE- | " Blank_lines= COMMA17.;
PUTLOG "NOTE- | " Speeches_Found= COMMA17.;
PUTLOG "NOTE- | " No_Speech_Found= COMMA17.;
PUTLOG "NOTE- %Format_Dashes(&Width)";
PUTLOG "NOTE- ";
END;
input line $varying32767. len;
line = strip(line);
if len > 0 then
DO;
Non_blank_Lines + 1;
END;
else
DO;
blank_Lines + 1;
DELETE;
END;
Count_str = COUNT(line,'<a href="/speeches/');
if Count_Str > 0 then
DO;
Speeches_Found + 1;
END;
else
DO;
No_Speech_Found + 1;
DELETE;
END;
Start_Pos = 1;
DO i = 1 TO Count_Str;
Str_To_Search = SUBSTR(line, Start_Pos);
Str_Pos = FIND(Str_to_Search,'<a href="/speeches/');
links = compress(scan(substr(Str_to_search,Str_Pos+9),1,'>'),'"');
Link_Len = LENGTHN(STRIP(Links));
full_link = cats('https://www.gov.za',links);
OUTPUT;
Start_Pos = Start_Pos + Str_Pos + Link_Len;
END;
DELETE;
run;
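Another option, if you are comfortable with regular expressions, is the PRX family of functions. The sketch below is untested against the live page; the dataset name `work.report3` and the pattern are my own assumptions, and it again relies on the EXTRACT fileref. CALL PRXNEXT steps through every match in the line, and PRXPOSN returns the captured path from the most recent match.

```sas
data work.report3;
  length line $32767 full_link $1100;
  keep full_link;
  retain re;
  /* compile the pattern once; capture group 1 is the /speeches/... path */
  if _n_ = 1 then re = prxparse('/<a href="(\/speeches\/[^"]+)"/');
  infile extract length=len lrecl=32767;
  input line $varying32767. len;
  start = 1;
  stop  = length(line);
  /* find the first match; POS is 0 when there is none */
  call prxnext(re, start, stop, line, pos, plen);
  do while (pos > 0);
    full_link = cats('https://www.gov.za', prxposn(re, 1, line));
    output;
    /* PRXNEXT advances START, so this finds the next match */
    call prxnext(re, start, stop, line, pos, plen);
  end;
run;
```

The `[^"]+` part stops the capture at the closing quote, so other attributes on the same `<a>` tag would not end up in the URL.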
You're welcome. Thanks for posting an interesting application. I haven't had occasion to use Proc HTTP; this is a nice example.
Jim