kaziumair
Quartz | Level 8

Hi everyone ,

 

I am trying to scrape a website to find links to articles. However, when I use PROC HTTP to fetch the HTML, the HTML code with all the links is stored in a single row. I am using find() to locate the start of a link, but find() only returns the position of the first occurrence.

Please suggest a way to find all the links.

Attaching the code and the row below.

filename extract "/Location/source.txt";

proc http
	method="GET"
	out=extract
	url="https://www.gov.za/newsroom";
run;

data work.report;
infile source length=len lrecl=32767;
input line $varying32767. len;
 line = strip(line);
 if len>0;
 find_str = find(line,'<a href="/speeches/');
	if find_str gt 0;
	if find_str gt 0 then do;
		links=compress(scan(substr(line,find_str+9),1,'>'),'"');
		full_link=cats('https://www.gov.za',links);
	end;
	keep full_link;
run;
<div class="page clearfix" id="page"><header id="section-header" class="section section-header"><div id="zone-branding-wrapper" class="zone-wrapper zone-branding-wrapper clearfix"><div id="zone-branding" class="zone zone-branding clearfix container-12"><div class="grid-12 region region-branding" id="region-branding"><div class="region-inner region-branding-inner"><div class="branding-data clearfix"><div class="logo-img"><a href="/" rel="home" title="South African Government"><img src="https://www.gov.za/sites/all/themes/custom/eco_omega/logo.png?v=20200917" alt="South African Government" id="logo" /></a>      </div><hgroup class="site-name-slogan"><h2 class="site-name"><a href="/" title="Home">South African Government</a></h2><h3 class="site-domain"><a href="/" rel="home" title="South African Government">www.gov.za</a></h3><h6 class="site-slogan">Let's grow South Africa together</h6></hgroup><div class="flag-img"><a href="/" rel="home" title="South African Government"><img typeof="foaf:Image" src="https://www.gov.za/sites/all/themes/custom/eco_omega/images/flag-south-africa.svg" alt="" /></a>      </div></div></div></div></div></div><div id="zone-menu-wrapper" class="zone-wrapper zone-menu-wrapper clearfix"><div id="zone-menu" class="zone zone-menu clearfix container-12"><div class="grid-12 region region-menu" id="region-menu"><div class="region-inner region-menu-inner"><nav class="navigation"><h2 class="element-invisible">Main menu</h2><ul id="main-menu" class="links inline clearfix main-menu"><li class="menu-205 first"><a href="/">Home</a></li><li class="menu-3411"><a href="/about">About</a></li><li class="menu-1950 active-trail active"><a href="/newsroom" title="Speeches and statements" class="active-trail active">Newsroom</a></li><li class="menu-1279"><a href="/services">Services</a></li><li class="menu-1951 last"><a href="/document/latest" title="Government Documents">Documents</a></li></ul>          </nav><div class="block block-superfish block-1 
block-superfish-1 odd block-without-title" id="block-superfish-1"><div class="block-inner clearfix"><div class="content clearfix"><ul id="superfish-1" class="menu sf-menu sf-main-menu sf-horizontal sf-style-space sf-total-items-5 sf-parent-items-4 sf-single-items-1"><li id="menu-205-1" class="first odd sf-item-1 sf-depth-1 sf-no-children"><a href="/" class="sf-depth-1">Home</a></li><li id="menu-3411-1" class="middle even sf-item-2 sf-depth-1 sf-total-children-2 sf-parent-children-1 sf-single-children-1 menuparent"><a href="/about" class="sf-depth-1 menuparent">About</a><ul class="sf-megamenu"><li class="sf-megamenu-wrapper middle even sf-item-2 sf-depth-1 sf-total-children-2 sf-parent-children-1 sf-single-children-1 menuparent"><ol><li id="menu-1043-1" class="first odd sf-item-1 sf-depth-2 sf-total-children-8 sf-parent-children-0 sf-single-children-8 sf-megamenu-column menuparent"><div class="sf-megamenu-column"><a href="/about-sa" class="sf-depth-2 menuparent">About SA</a><ol><li id="menu-2608-1" class="first odd sf-item-1 sf-depth-3 sf-no-children"><a href="/about-government/contact-directory" title="" class="sf-depth-3">Contact directory</a></li><li id="menu-2237-1" class="middle even sf-item-2 sf-depth-3 sf-no-children"><a href="/faq" title="" class="sf-depth-3">FAQs</a></li><li id="menu-2142-1" class="middle odd sf-item-3 sf-depth-3 sf-no-children"><a href="/about-government/government-system" title="" class="sf-depth-3">Government system</a></li><li id="menu-2635-1" class="middle even sf-item-4 sf-depth-3 sf-no-children"><a href="/about-government/government-vacancies" title="" class="sf-depth-3">Government vacancies</a></li><li id="menu-2236-1" class="middle odd sf-item-5 sf-depth-3 sf-no-children"><a href="/issues/key-issues" title="" class="sf-depth-3">Key issues</a></li><li id="menu-2141-1" class="middle even sf-item-6 sf-depth-3 sf-no-children"><a href="/about-government/government-programmes/projects-and-campaigns" title="" class="sf-depth-3">Government 
programmes</a></li><li id="menu-1216-1" class="middle odd sf-item-7 sf-depth-3 sf-no-children"><a href="/about-government/leaders" title="" class="sf-depth-3">Government leaders</a></li><li id="menu-2000-1" class="last even sf-item-8 sf-depth-3 sf-no-children"><a href="/about-government/national-orders" title="" class="sf-depth-3">National Orders</a></li></ol></div></li><li id="menu-1964-1" class="last even sf-item-2 sf-depth-2 sf-no-children"><a href="/links" class="sf-depth-2">Links</a></li></ol></li></ul></li><li id="menu-1950-1" class="active-trail middle odd sf-item-3 sf-depth-1 sf-total-children-9 sf-parent-children-0 sf-single-children-9 menuparent"><a href="/newsroom" title="Speeches and statements" class="sf-depth-1 menuparent active">Newsroom</a><ul class="sf-megamenu"><li class="sf-megamenu-wrapper active-trail middle odd sf-item-3 sf-depth-1 sf-total-children-9 sf-parent-children-0 sf-single-children-9 menuparent"><ol><li id="menu-2219-1" class="first odd sf-item-1 sf-depth-2 sf-no-children"><a href="/latest-speeches" title="" class="sf-depth-2">Latest/what&#039;s new</a></li><li id="menu-2213-1" class="middle even sf-item-2 sf-depth-2 sf-no-children"><a href="/cabinet-statements" title="" class="sf-depth-2">Cabinet statements</a></li><li id="menu-2214-1" class="middle odd sf-item-3 sf-depth-2 sf-no-children"><a href="/media-advisories" title="" class="sf-depth-2">Media advisories</a></li><li id="menu-2215-1" class="middle even sf-item-4 sf-depth-2 sf-no-children"><a href="/media-statements" title="" class="sf-depth-2">Media statements</a></li><li id="menu-2216-1" class="middle odd sf-item-5 sf-depth-2 sf-no-children"><a href="/speeches" title="" class="sf-depth-2">Speeches</a></li><li id="menu-2217-1" class="middle even sf-item-6 sf-depth-2 sf-no-children"><a href="/parliamentary-questions-and-answers" title="" class="sf-depth-2">Parliamentary Q&amp;A</a></li><li id="menu-2218-1" class="middle odd sf-item-7 sf-depth-2 sf-no-children"><a href="/events" 
title="" class="sf-depth-2">Events</a></li><li id="menu-2210-1" class="middle even sf-item-8 sf-depth-2 sf-no-children"><a href="/state-nation-address" class="sf-depth-2">State of the Nation Address</a></li><li id="menu-2211-1" class="last odd sf-item-9 sf-depth-2 sf-no-children"><a href="/national-budget-0" class="sf-depth-2">Budget speeches</a></li></ol></li></ul></li><li id="menu-1279-1" class="middle even sf-item-4 sf-depth-1 sf-total-children-3 sf-parent-children-3 sf-single-children-0 menuparent"><a href="/services" class="sf-depth-1 menuparent">Services</a><ul class="sf-megamenu"><li class="sf-megamenu-wrapper middle even sf-item-4 sf-depth-1 sf-total-children-3 sf-parent-children-3 sf-single-children-0 menuparent"><ol><li id="menu-2501-1" class="first odd sf-item-1 sf-depth-2 sf-total-children-16 sf-parent-children-0 sf-single-children-16 sf-megamenu-column menuparent"><div class="sf-megamenu-column"><a href="/services/services-residents" title="" class="sf-depth-2 menuparent">Services for residents</a><ol><li id="menu-2010-1" class="first odd sf-item-1 sf-depth-3 sf-no-children"><a href="/services/services-residents/birth" title="" class="sf-depth-3">Birth</a></li><li id="menu-2016-1" class="middle even sf-item-2 sf-depth-3 sf-no-children"><a href="/services/services-residents/parenting-all" title="" class="sf-depth-3">Parenting</a></li><li id="menu-2177-1" class="middle odd sf-item-3 sf-depth-3 sf-no-children"><a href="/services/services-residents/health" title="" class="sf-depth-3">Health</a></li><li id="menu-2011-1" class="middle even sf-item-4 sf-depth-3 sf-no-children"><a href="/services/services-residents/social-benefits" title="" class="sf-depth-3">Social benefits</a></li><li id="menu-2017-1" class="middle odd sf-item-5 sf-depth-3 sf-no-children"><a href="/services/services-residents/education-and-training" title="" class="sf-depth-3">Education and training</a></li><li id="menu-2178-1" class="middle even sf-item-6 sf-depth-3 sf-no-children"><a 
href="/services/services-residents/relationship" title="" class="sf-depth-3">Relationships</a></li><li id="menu-2012-1" class="middle odd sf-item-7 sf-depth-3 sf-no-children"><a href="/services/services-residents/world-work-0" title="" class="sf-depth-3">World of work</a></li><li id="menu-2018-1" class="middle even sf-item-8 sf-depth-3 sf-no-children"><a href="/services/services-residents/place-live" title="" class="sf-depth-3">A place to live</a></li><li id="menu-2179-1" class="middle odd sf-item-9 sf-depth-3 sf-no-children"><a href="/services/services-residents/tv-postal-services" title="" class="sf-depth-3">TV and postal services</a></li><li id="menu-2013-1" class="middle even sf-item-10 sf-depth-3 sf-no-children"><a href="/services/services-residents/driving-0" title="" class="sf-depth-3">Driving</a></li><li id="menu-2019-1" class="middle odd sf-item-11 sf-depth-3 sf-no-children"><a href="/services/services-residents/travel-outside-sa" title="" class="sf-depth-3">Travel outside SA</a></li><li id="menu-2054-1" class="middle even sf-item-12 sf-depth-3 sf-no-children"><a href="/services/services-residents/citizenship-0" title="" class="sf-depth-3">Citizenship</a></li><li id="menu-2014-1" class="middle odd sf-item-13 sf-depth-3 sf-no-children"><a href="/services/services-residents/information-government" title="" class="sf-depth-3">Information from government</a></li><li id="menu-2180-1" class="middle even sf-item-14 sf-depth-3 sf-no-children"><a href="/services/services-residents/dealing-law-0" title="" class="sf-depth-3">Dealing with the law</a></li><li id="menu-2181-1" class="middle odd sf-item-15 sf-depth-3 sf-no-children"><a href="/services/services-residents/retirement-old-age" title="" class="sf-depth-3">Retirement and old age</a></li><li id="menu-2015-1" class="last even sf-item-16 sf-depth-3 sf-no-children"><a href="/services/services-residents/end-life" title="" class="sf-depth-3">End of life</a></li></ol></div></li><li id="menu-1987-1" class="middle even 
sf-item-2 sf-depth-2 sf-total-children-12 sf-parent-children-0 sf-single-children-12 sf-megamenu-column menuparent"><div class="sf-megamenu-column"><a href="/services/services-organisations" class="sf-depth-2 menuparent">Services for organisations</a><ol><li id="menu-2024-1" class="first odd sf-item-1 sf-depth-3 sf-no-children"><a href="/services/services-organisations/register-business-organisation" title="" class="sf-depth-3">Register business or organisation</a></li><li id="menu-2028-1" class="middle even sf-item-2 sf-depth-3 sf-no-children"><a href="/services/services-organisations/change-registration" title="" class="sf-depth-3">Change registration</a></li><li id="menu-2020-1" class="middle odd sf-item-3 sf-depth-3 sf-no-children"><a href="/services/services-organisations/business-incentives" title="" class="sf-depth-3">Business incentives</a></li><li id="menu-2021-1" class="middle even sf-item-4 sf-depth-3 sf-no-children"><a href="/services/services-organisations/deregister-business" title="" class="sf-depth-3">Deregister business</a></li><li id="menu-2025-1" class="middle odd sf-item-5 sf-depth-3 sf-no-children"><a href="/services/services-organisations/tax" title="" class="sf-depth-3">Tax</a></li><li id="menu-2029-1" class="middle even sf-item-6 sf-depth-3 sf-no-children"><a href="/services/services-organisations/intellectual-property" title="" class="sf-depth-3">Intellectual property</a></li><li id="menu-2022-1" class="middle odd sf-item-7 sf-depth-3 sf-no-children"><a href="/services/services-organisations/import" title="" class="sf-depth-3">Import</a></li><li id="menu-2026-1" class="middle even sf-item-8 sf-depth-3 sf-no-children"><a href="/services/services-organisations/export-permits" title="" class="sf-depth-3">Export permits</a></li><li id="menu-2030-1" class="middle odd sf-item-9 sf-depth-3 sf-no-children"><a href="/services/services-organisations/permits-licences-and-rights" title="" class="sf-depth-3">Permits licences and rights</a></li><li 
id="menu-2023-1" class="middle even sf-item-10 sf-depth-3 sf-no-children"><a href="/services/services-organisations/communication" title="" class="sf-depth-3">Communication</a></li><li id="menu-2027-1" class="middle odd sf-item-11 sf-depth-3 sf-no-children"><a href="/services/services-organisations/transport" title="" class="sf-depth-3">Transport</a></li><li id="menu-2183-1" class="last even sf-item-12 sf-depth-3 sf-no-children"><a href="/services/services-organisations/labour-0" title="" class="sf-depth-3">Labour</a></li></ol></div></li><li id="menu-1988-1" class="last odd sf-item-3 sf-depth-2 sf-total-children-3 sf-parent-children-0 sf-single-children-3 sf-megamenu-column menuparent"><div class="sf-megamenu-column"><a href="/services/services-foreign-nationals" title="" class="sf-depth-2 menuparent">Services for foreign nationals</a><ol><li id="menu-2031-1" class="first odd sf-item-1 sf-depth-3 sf-no-children"><a href="/services/services-foreigners/temporary-residence" title="" class="sf-depth-3">Temporary residence</a></li><li id="menu-2032-1" class="middle even sf-item-2 sf-depth-3 sf-no-children"><a href="/services/services-foreigners/permanent-residence" title="" class="sf-depth-3">Permanent residence</a></li><li id="menu-2033-1" class="last odd sf-item-3 sf-depth-3 sf-no-children"><a href="/services/services-foreigners/driving" title="" class="sf-depth-3">Driving</a></li></ol></div></li></ol></li></ul></li><li id="menu-1951-1" class="last odd sf-item-5 sf-depth-1 sf-total-children-3 sf-parent-children-2 sf-single-children-1 menuparent"><a href="/document/latest" title="Government Documents" class="sf-depth-1 menuparent">Documents</a><ul class="sf-megamenu"><li class="sf-megamenu-wrapper last odd sf-item-5 sf-depth-1 sf-total-children-3 sf-parent-children-2 sf-single-children-1 menuparent"><ol><li id="menu-2244-1" class="first odd sf-item-1 sf-depth-2 sf-no-children"><a href="/document/latest" title="" class="sf-depth-2">Latest/what&#039;s new</a></li><li 
id="menu-2133-1" class="middle even sf-item-2 sf-depth-2 sf-total-children-6 sf-parent-children-0 sf-single-children-6 sf-megamenu-column menuparent"><div class="sf-megamenu-column"><a href="/documents/tender" title="Government Tenders" class="sf-depth-2 menuparent">Tenders</a><ol><li id="menu-2138-1" class="first odd sf-item-1 sf-depth-3 sf-no-children"><a href="/documents/acts" title="" class="sf-depth-3">Acts</a></li><li id="menu-1952-1" class="middle even sf-item-2 sf-depth-3 sf-no-children"><a href="/documents/constitution/constitution-republic-south-africa-1996-1" title="Constitution" class="sf-depth-3">Constitution of SA</a></li><li id="menu-1997-1" class="middle odd sf-item-3 sf-depth-3 sf-no-children"><a href="/documents/bills" title="" class="sf-depth-3">Bills</a></li><li id="menu-1996-1" class="middle even sf-item-4 sf-depth-3 sf-no-children"><a href="/documents/draft-bills" title="" class="sf-depth-3">Draft bills</a></li><li id="menu-1994-1" class="middle odd sf-item-5 sf-depth-3 sf-no-children"><a href="/documents/notices" title="" class="sf-depth-3">Notices</a></li><li id="menu-3408-1" class="last even sf-item-6 sf-depth-3 sf-no-children"><a href="/documents/awarded-tenders" title="" class="sf-depth-3">Tenders awarded</a></li></ol></div></li><li id="menu-1954-1" class="last odd sf-item-3 sf-depth-2 sf-total-children-6 sf-parent-children-0 sf-single-children-6 sf-megamenu-column menuparent"><div class="sf-megamenu-column"><a href="/documents/white-papers" title="" class="sf-depth-2 menuparent">White papers</a><ol><li id="menu-1993-1" class="first odd sf-item-1 sf-depth-3 sf-no-children"><a href="/documents/green-papers" title="" class="sf-depth-3">Green papers</a></li><li id="menu-1995-1" class="middle even sf-item-2 sf-depth-3 sf-no-children"><a href="/documents/annual-report" title="" class="sf-depth-3">Annual reports</a></li><li id="menu-1992-1" class="middle odd sf-item-3 sf-depth-3 sf-no-children"><a href="/documents/other-documents" title="" 
class="sf-depth-3">Other documents</a></li><li id="menu-2135-1" class="middle even sf-item-4 sf-depth-3 sf-no-children"><a href="/documents/public-comment" title="" class="sf-depth-3">Documents for public comment</a></li><li id="menu-2134-1" class="middle odd sf-item-5 sf-depth-3 sf-no-children"><a href="http://beta2.statssa.gov.za/" title="" class="sf-depth-3">Statistical documents</a></li><li id="menu-2137-1" class="last even sf-item-6 sf-depth-3 sf-no-children"><a href="/documents/parliamentary-documents" class="sf-depth-3">Parliamentary documents</a></li></ol></div></li></ol></li></ul></li></ul>    </div></div></div><div class="block block-views block--exp-search-page block-views-exp-search-page even block-without-title" id="block-views-exp-search-page"><div class="block-inner clearfix"><div class="content clearfix"><form action="/search" method="get" id="views-exposed-form-search-page" accept-charset="UTF-8"><div><div class="views-exposed-form"><div class="views-exposed-widgets clearfix"><div id="edit-search-query-wrapper" class="views-exposed-widget views-widget-filter-search_api_views_fulltext"><label for="edit-search-query">Search          </label><div class="views-widget"><div class="form-item form-type-textfield form-item-search-query"><input type="text" id="edit-search-query" name="search_query" value="" size="30" maxlength="128" class="form-text" /></div></div><div class="description">Search the site          </div></div><div class="views-exposed-widget views-submit-button"><input type="submit" id="edit-submit-search" name="" value="Search" class="form-submit" />    </div></div></div></div></form>    </div></div></div>  </div></div></div></div></header><section id="section-content" class="section section-content"><div id="zone-content-wrapper" class="zone-wrapper zone-content-wrapper clearfix"><div id="zone-content" class="zone zone-content clearfix equal-height-container container-12"><div id="breadcrumb" class="grid-12"><h2 
class="element-invisible">You are here</h2><div class="breadcrumb"><a href="/">Home</a></div></div><aside class="grid-3 region region-sidebar-first equal-height-element" id="region-sidebar-first"><div class="region-inner region-sidebar-first-inner"><div class="block block-views block--exp-speeches-views-page-5 block-views-exp-speeches-views-page-5 odd block-without-title" id="block-views-exp-speeches-views-page-5"><div class="block-inner clearfix"><div class="content clearfix"><form action="/newsroom" method="get" id="views-exposed-form-speeches-views-page-5" accept-charset="UTF-8"><div><div class="views-exposed-form"><div class="views-exposed-widgets clearfix"><div id="edit-title-field-value-wrapper" class="views-exposed-widget views-widget-filter-title_field_value"><label for="edit-title-field-value">Keyword          </label><div class="views-widget"><div class="form-item form-type-textfield form-item-title-field-value"><input type="text" id="edit-title-field-value" name="title_field_value" value="" size="30" maxlength="128" class="form-text form-autocomplete" /><input type="hidden" id="edit-title-field-value-autocomplete" value="https://www.gov.za/?q=autocomplete_filter/title_field_value/speeches_views/page_5/0" disabled="disabled" class="autocomplete" /></div></div></div><div id="edit-field-gcis-speech-category-tid-wrapper" class="views-exposed-widget views-widget-filter-field_gcis_speech_category_tid"><label for="edit-field-gcis-speech-category-tid">Categories          </label><div class="views-widget"><div class="form-item form-type-select form-item-field-gcis-speech-category-tid"><select id="edit-field-gcis-speech-category-tid" name="field_gcis_speech_category_tid" class="form-select"><option value="All" selected="selected">- Any -</option><option value="832"></option><option value="950">Media advisories</option><option value="336">Cabinet statements</option><option value="338">Statements</option><option value="337">Speeches</option><option 
value="340">Parliamentary questions and answers</option><option value="339">Transcripts</option><option value="834">Budget</option><option value="342">Events</option></select></div></div></div><div id="edit-field-gcis-speech-government-lvl-tid-wrapper" class="views-exposed-widget views-widget-filter-field_gcis_speech_government_lvl_tid"><label for="edit-field-gcis-speech-government-lvl-tid">Government Level          </label><div class="views-widget"><div class="form-item form-type-select form-item-field-gcis-speech-government-lvl-tid"><select id="edit-field-gcis-speech-government-lvl-tid" name="field_gcis_speech_government_lvl_tid" class="form-select"><option value="All" selected="selected">- Any -</option><option value="345">Local</option><option value="755">National</option><option value="344">Provincial</option><option value="833">Unspecified</option></select></div></div></div><div id="edit-field-gcis-speech-subjects-tid-1-wrapper" class="views-exposed-widget views-widget-filter-field_gcis_speech_subjects_tid_1"><label for="edit-field-gcis-speech-subjects-tid-1">Subjects          </label><div class="views-widget"><div class="form-item form-type-select form-item-field-gcis-speech-subjects-tid-1"><select multiple="multiple" name="field_gcis_speech_subjects_tid_1[]" id="edit-field-gcis-speech-subjects-tid-1" size="9" class="form-select"><option value="718">16 days of activism</option><option value="740">20 years of freedom</option><option value="715">Africa</option><option value="681">African Peer Review Mechanism (APRM)</option><option value="713">Agriculture</option><option value="703">Anti-corruption initiatives</option><option value="682">Arts and culture</option><option value="683">Aviation</option><option value="721">Black Economic Empowerment</option><option value="684">Budget: national</option><option value="680">Budget: provincial</option><option value="650">Business</option><option value="685">Cabinet statement</option><option value="722">Children&#039;s 
issues</option><option value="726">Climate change</option><option value="716">Cluster media briefings</option><option value="686">Communications</option><option value="687">Community Development Workers (CDW)</option><option value="688">Constitutional affairs</option><option value="689">Correctional services</option><option value="730">Cultural and traditional affairs</option><option value="690">Defence</option><option value="691">Development</option><option value="646">Disaster management</option><option value="706">Economy</option><option value="705">Education</option><option value="711">Elections</option><option value="692">Energy</option><option value="712">Energy efficiency</option><option value="693">Environment</option><option value="736">Equality</option><option value="744">Events</option><option value="649">Expanded Public Works Programme (EPWP)</option><option value="714">Fighting crime</option><option value="694">Finance</option><option value="697">Fishery</option><option value="739">Food security</option><option value="696">Forestry</option><option value="643">Fraud and corruption</option><option value="698">Freedom day</option><option value="742">Freedom Month</option><option value="725">Governance</option><option value="699">Government services</option><option value="700">Growth and development</option><option value="707">Health</option><option value="701">History</option><option value="710">HIV and AIDS</option><option value="702">Home affairs</option><option value="651">Housing</option><option value="652">Human and social issues</option><option value="653">Human rights</option><option value="654">Human trafficking</option><option value="735">Imbizo</option><option value="727">Immigration</option><option value="743">Infrastructure</option><option value="695">International relations</option><option value="728">Job creation</option><option value="729">Justice</option><option value="655">Labour</option><option value="656">Land</option><option 
value="657">Legal issues</option><option value="658">Local government</option><option value="738">Mandela Month</option><option value="732">Media relations</option><option value="745">Military veterans</option><option value="659">Mining and minerals</option><option value="737">National Development Plan</option><option value="746">Nelson Mandela</option><option value="660">Nepad</option><option value="733">People with disabilities</option><option value="661">Presidential pardons</option><option value="720">Provincial executive council</option><option value="662">Provincial government</option><option value="663">Public enterprises</option><option value="664">Public service</option><option value="665">Public works</option><option value="731">Road safety</option><option value="644">Rural development</option><option value="666">SADC</option><option value="667">Safety and security</option><option value="668">Science and technology</option><option value="723">Service delivery</option><option value="704">Skills development</option><option value="669">Social development</option><option value="670">Sport and recreation</option><option value="741">State of the City Address</option><option value="717">State of the Nation address</option><option value="647">State of the Province Address</option><option value="671">Statistics</option><option value="648">Tax</option><option value="672">Telecommunications</option><option value="673">Tourism</option><option value="724">Trade and industry</option><option value="674">Traditional affairs</option><option value="708">Transport</option><option value="679">Un-categorised items</option><option value="675">Water</option><option value="676">Women&#039;s issues</option><option value="677">Xenophobia</option><option value="678">Youth affairs</option></select></div></div></div><div id="edit-field-gcis-speech-date-value-1-wrapper" class="views-exposed-widget views-widget-filter-field_gcis_speech_date_value_1"><div class="views-widget"><div 
id="edit-field-gcis-speech-date-value-min-wrapper"><div id="edit-field-gcis-speech-date-value-min-inside-wrapper"><div  class="container-inline-date"><div class="form-item form-type-date-popup form-item-field-gcis-speech-date-value-1-min"><label for="edit-field-gcis-speech-date-value-1-min">Start date </label><div id="edit-field-gcis-speech-date-value-1-min"  class="date-padding"><div class="form-item form-type-textfield form-item-field-gcis-speech-date-value-1-min-date"><label class="element-invisible" for="edit-field-gcis-speech-date-value-1-min-datepicker-popup-1">Date </label><input type="text" id="edit-field-gcis-speech-date-value-1-min-datepicker-popup-1" name="field_gcis_speech_date_value_1[min][date]" value="" size="20" maxlength="30" class="form-text" /><div class="description"> E.g., 17 Jul 2021</div></div></div></div></div></div></div><div id="edit-field-gcis-speech-date-value-max-wrapper"><div id="edit-field-gcis-speech-date-value-max-inside-wrapper"><div  class="container-inline-date"><div class="form-item form-type-date-popup form-item-field-gcis-speech-date-value-1-max"><label for="edit-field-gcis-speech-date-value-1-max">End date </label><div id="edit-field-gcis-speech-date-value-1-max"  class="date-padding"><div class="form-item form-type-textfield form-item-field-gcis-speech-date-value-1-max-date"><label class="element-invisible" for="edit-field-gcis-speech-date-value-1-max-datepicker-popup-1">Date </label><input type="text" id="edit-field-gcis-speech-date-value-1-max-datepicker-popup-1" name="field_gcis_speech_date_value_1[max][date]" value="" size="20" maxlength="30" class="form-text" /><div class="description"> E.g., 17 Jul 2021</div></div></div></div></div></div></div>        </div></div><div class="views-exposed-widget views-submit-button"><input type="submit" id="edit-submit-speeches-views" name="" value="Search" class="form-submit" />    </div><div class="views-exposed-widget views-reset-button"><input type="submit" id="edit-reset" 
name="op" value="Reset" class="form-submit" />      </div></div></div></div></form>    </div></div></div><section class="block block-block block-26 block-block-26 even" id="block-block-26"><div class="block-inner clearfix"><h2 class="block-title">Related Links</h2><div class="content clearfix"><p><a href="events">Events</a></p><p><a href="/state-nation-address">State of the Nation address</a></p><p><a href="national-budget-0">Budget speeches</a></p><p><a href="/node/733867/">Audio files</a></p></div></div></section>  </div></aside><div class="grid-9 region region-content equal-height-element" id="region-content"><div class="region-inner region-content-inner"><a id="main-content"></a><h1 class="title" id="page-title">Newsroom</h1><div class="block block-system block-main block-system-main odd block-without-title" id="block-system-main"><div class="block-inner clearfix"><div class="content clearfix"><div class="view view-speeches-views view-id-speeches_views view-display-id-page_5 view-dom-id-db6e85de7b536c7602292a7c4e857ef9"><div class="view-header"><p>You can use the filters to show only results that match your interests</p></div><div class="view-content"><table  class="views-table cols-2" class="views-table cols-2"><thead><tr><th  class="views-field views-field-title-field" scope="col"><a href="/newsroom?title_field_value=&amp;field_gcis_speech_category_tid=All&amp;field_gcis_speech_government_lvl_tid=All&amp;&amp;field_gcis_speech_date_value_1%5Bmin%5D&amp;field_gcis_speech_date_value_1%5Bmax%5D&amp;order=title_field&amp;sort=asc" title="sort by Title" class="active">Title</a>          </th><th  class="views-field views-field-field-gcis-speech-date" scope="col"><a href="/newsroom?title_field_value=&amp;field_gcis_speech_category_tid=All&amp;field_gcis_speech_government_lvl_tid=All&amp;&amp;field_gcis_speech_date_value_1%5Bmin%5D&amp;field_gcis_speech_date_value_1%5Bmax%5D&amp;order=field_gcis_speech_date&amp;sort=asc" title="sort by Date" class="active">Date</a>  
        </th></tr></thead><tbody><tr  class="odd views-row-first"><td  class="views-field views-field-title-field"><a href="/speeches/president-cyril-ramaphosa-conducts-oversight-visit-kwazulu-natal-16-jul-16-jul-2021-0000">President Cyril Ramaphosa conducts oversight visit to Kwazulu-Natal, 16 Jul</a>          </td><td  class="views-field views-field-field-gcis-speech-date"><span class="date-display-single" property="dc:date" datatype="xsd:dateTime" content="2021-07-16T00:00:00+02:00">16 Jul 2021</span>          </td></tr><tr  class="even"><td  class="views-field views-field-title-field"><a href="/speeches/western-cape-government-updates-ongoing-public-unrest-and-taxi-violence-15-jul-2021-0000">Western Cape Government updates on ongoing public unrest and taxi violence</a>          </td><td  class="views-field views-field-field-gcis-speech-date"><span class="date-display-single" property="dc:date" datatype="xsd:dateTime" content="2021-07-15T00:00:00+02:00">15 Jul 2021</span>          </td></tr><tr  class="odd"><td  class="views-field views-field-title-field"><a href="/speeches/premier-alan-winde-coronavirus-covid-19-and-vaccines-western-cape-15-jul-2021-0000">Premier Alan Winde on Coronavirus COVID-19 and vaccines in the Western Cape</a>          </td><td  class="views-field views-field-field-gcis-speech-date"><span class="date-display-single" property="dc:date" datatype="xsd:dateTime" content="2021-07-15T00:00:00+02:00">15 Jul 2021</span>          </td></tr><tr  class="even"><td  class="views-field views-field-title-field"><a href="/speeches/minister-khumbudzo-ntshavheni-update-violent-protests-some-parts-south-africa-15-jul-2021">Minister Khumbudzo Ntshavheni: Update on violent protests in some parts of South Africa</a>          </td><td  class="views-field views-field-field-gcis-speech-date"><span class="date-display-single" property="dc:date" datatype="xsd:dateTime" content="2021-07-15T00:00:00+02:00">15 Jul 2021</span>          </td></tr><tr  class="odd"><td  
class="views-field views-field-title-field"><a href="/speeches/minister-thoko-didiza-meets-agricultural-sector-stakeholders-15-jul-2021-0000">Minister Thoko Didiza meets agricultural sector stakeholders</a>          </td><td  class="views-field views-field-field-gcis-speech-date"><span class="date-display-single" property="dc:date" datatype="xsd:dateTime" content="2021-07-15T00:00:00+02:00">15 Jul 2021</span>          </td></tr><tr  class="even"><td  class="views-field views-field-title-field"><a href="/speeches/deputy-minister-njabulo-nzuza-visits-vandalised-home-affairs-office-eshowe-16-jul-15-jul">Deputy Minister Njabulo Nzuza visits vandalised Home Affairs office in Eshowe, 16 Jul </a>          </td><td  class="views-field views-field-field-gcis-speech-date"><span class="date-display-single" property="dc:date" datatype="xsd:dateTime" content="2021-07-15T00:0	

 

Accepted Solutions
jimbarbour
Meteorite | Level 14

Your code is basically working.  I made one small change: since your FILENAME statement defines the fileref "extract", I changed the INFILE statement in the DATA step to read from "extract" instead of "source".

filename extract "/Location/source.txt";

infile Extract length=len lrecl=32767;

 

However, rather than relying on a single FIND call, I would first use COUNT.  The COUNT function returns the number of occurrences of a substring within a string.

Count_str = COUNT(line,'<a href="/speeches/');
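As a quick check of how COUNT behaves, here is a minimal sketch run against a made-up HTML fragment (the sample string is an assumption for illustration, not taken from the real page). Only the two anchors whose href starts with /speeches/ should be counted.

```sas
/* Minimal sketch: COUNT returns how many times a substring occurs.   */
/* The HTML fragment below is invented purely for illustration.       */
data _null_;
    length line $200;
    line = '<a href="/speeches/a">A</a><a href="/about">About</a><a href="/speeches/c">C</a>';

    /* COUNT is case-sensitive by default; the 'i' modifier,          */
    /* as in count(line, sub, 'i'), ignores case.                     */
    n_links = count(line, '<a href="/speeches/');

    put n_links=;   /* writes n_links=2 to the log */
run;
```

Knowing the count up front is what lets the DO loop run exactly once per link.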

 

I would then add a DO loop that iteratively substrings through the line to parse out the URL of each speech.

Start_Pos = 1;
DO i = 1 TO Count_Str;
    Str_To_Search = SUBSTR(line, Start_Pos);
    Str_Pos = FIND(Str_to_Search,'<a href="/speeches/');
    links = compress(scan(substr(Str_to_search,Str_Pos+9),1,'>'),'"');
    Link_Len = LENGTHN(STRIP(Links));
    full_link = cats('https://www.gov.za',links);
    OUTPUT;
    Start_Pos = Start_Pos + Str_Pos + Link_Len;
END;
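A possible variation on the loop above, offered as a sketch rather than as the method used in this answer: FIND accepts an optional start position, so the search can resume just past the previous hit without building a new substring each time. The sample string and the names links_demo, target, pos and link are assumptions for illustration.

```sas
/* Sketch: walk through one line using FIND's optional start position. */
/* The sample line and all variable names here are illustrative only.  */
data work.links_demo;
    length line $200 link $100;
    keep link;

    line   = '<a href="/speeches/a">A</a><a href="/about">About</a><a href="/speeches/c">C</a>';
    target = '<a href="/speeches/';

    pos = find(line, target);              /* first occurrence, 0 if none */
    do while (pos > 0);
        /* The href value starts 9 characters after the "<" of the tag */
        /* and runs up to the closing double quote.                    */
        link = scan(substr(line, pos + 9), 1, '"');
        output;
        pos = find(line, target, pos + 1); /* resume after this hit     */
    end;
run;
```

With the sample string this writes two observations, /speeches/a and /speeches/c. Scanning up to the closing double quote rather than up to '>' also keeps any extra attributes after the href out of the result.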

 

The extracted URLs are shown below; when I open one at random in a browser, it shows one of the speeches.

https://www.gov.za/speeches/president-cyril-ramaphosa-update-security-situation-country-16-jul-2021-0000
https://www.gov.za/speeches/president-cyril-ramaphosa%C2%A0addresses-nation-security-situation-country-16-jul-16-jul-2021
https://www.gov.za/speeches/mineral-resources-and-energy-gives-clarification-regulations-prohibiting-retail-sales
https://www.gov.za/speeches/acting-minister-khumbudzo-ntshavheni-update-security-situation-prevailing-country-16-july
https://www.gov.za/speeches/mec-jacob-mamabolo-visits-sophiatown-and-westbury-communities-17-jul-16-jul-2021-0000
https://www.gov.za/speeches/acting-minister-presidency-khumbudzo-ntshavheni-briefs-media-ongoing-violent-protests-0

Jim

 

Full program is below.  There's a bit of extra code that I added for debugging purposes; note that the %Format_Dashes macro called in the PUTLOG statements is not defined in this post.

%LET	Width	=	22;

filename extract "C:\Users\jbarbour\Documents\SAS\Pgm\Training\HTTP\source.txt";

proc http
	method="GET"
	out=extract
	url="https://www.gov.za/newsroom";
run;

data work.report;
	keep 	full_link;
	LENGTH	Links		$32767;
	LENGTH	Full_Link	$32767;

	infile Extract length=len lrecl=32767	end=end_of_file;

	if	end_of_file	then
		DO;
			PUTLOG	"NOTE-  ";
			PUTLOG	"NOTE-  %Format_Dashes(&Width)";
			PUTLOG	"NOTE:  | "   Non_Blank_lines=		COMMA17.;
			PUTLOG	"NOTE-  | "   Blank_lines=			COMMA17.;
			PUTLOG	"NOTE-  | "   Speeches_Found=		COMMA17.;
			PUTLOG	"NOTE-  | "   No_Speech_Found=		COMMA17.;
			PUTLOG	"NOTE-  %Format_Dashes(&Width)";
			PUTLOG	"NOTE-  ";
		END;

	input line $varying32767. len;

	line 					= 	strip(line);
	if len					>	0	then
		DO;
			Non_blank_Lines	+	1;
		END;
	else
		DO;
			blank_Lines		+	1;
			DELETE;
		END;

	Count_str 				= 	COUNT(line,'<a href="/speeches/');
	if Count_Str 			>	0	then
		DO;
			Speeches_Found	+	1;
		END;
	else
		DO;
			No_Speech_Found	+	1;
			DELETE;
		END;

	Start_Pos				=	1;
	DO	i					=	1	TO	Count_Str;
		Str_To_Search		=	SUBSTR(line, Start_Pos);
		Str_Pos				=	FIND(Str_to_Search,'<a href="/speeches/');
		links				=	compress(scan(substr(Str_to_search,Str_Pos+9),1,'>'),'"');
		Link_Len			=	LENGTHN(STRIP(Links));
		full_link			=	cats('https://www.gov.za',links);
		OUTPUT;
		Start_Pos			=	Start_Pos	+	Str_Pos	+	Link_Len;
	END;

	DELETE;
run;
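For comparison, the same extraction can be sketched with the Perl regular expression functions (PRXPARSE, CALL PRXNEXT, PRXPOSN). This is an alternative technique, not what the program above does; the sample line, the pattern, and the data set name links_prx are assumptions for illustration.

```sas
/* Sketch: pull every /speeches/ href out of one line with PRX functions. */
/* The sample line and the regular expression are illustrative only.      */
data work.links_prx;
    length line $200 link $100 full_link $200;
    keep full_link;

    line = '<a href="/speeches/a">A</a><a href="/about">About</a><a href="/speeches/c">C</a>';

    /* Capture group 1 holds the href value up to the closing quote. */
    re = prxparse('/<a href="(\/speeches\/[^"]*)"/');

    start = 1;
    stop  = length(line);

    /* CALL PRXNEXT advances START past each match; POS is 0 when done. */
    call prxnext(re, start, stop, line, pos, len);
    do while (pos > 0);
        link      = prxposn(re, 1, line);
        full_link = cats('https://www.gov.za', link);
        output;
        call prxnext(re, start, stop, line, pos, len);
    end;
run;
```

This form needs no COUNT call, since the loop simply stops when CALL PRXNEXT finds no further match.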

3 REPLIES 3
kaziumair
Quartz | Level 8
Hi, thank you for your help.
jimbarbour
Meteorite | Level 14

You're welcome.  Thanks for posting an interesting application. I haven't had occasion to use Proc HTTP; this is a nice example.  

 

Jim

 
