kevin12
Fluorite | Level 6

Good afternoon. I have been tasked with writing a SAS program to search a website, and the links within it, for key words and phrases. I wrote the program, but it only searches the single web page and not the entire website. I would also like it to output the link where the key words are found. This is what I have so far:

 

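filename LAW url "http://delcode.delaware.gov/sessionlaws/ga148/index.shtml";
data LAW_achive;
*length chapter $200;
infile LAW length=len lrecl=32767;
input line $varying32767. len;
/* upper-case the line so every comparison below is case-insensitive */
uline= upcase(line);
/* "CHAPTER" must be upper case to match ULINE; lower-case "chapter" could never match */
/* note: AND binds tighter than OR, so this reads (CHAPTER and LAW) or SCHOOL or PUBLIC ... */
if find(uline,"CHAPTER") and find(uline,"LAW") or find(uline,"SCHOOL") or find(uline,"PUBLIC")
or find(uline,"BOARD") or find(uline,"EDUCATION") or find(uline,"AUTHORITY") or find(uline,"MEMBER")
then do;
/* take the text between the first pair of double quotes on the line */
chapter=scan(line,2,'"');
output;
end;
run;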

 

Could you help me clean up my program and help with the website search part? Please look at the link to see why it would be beneficial to search not only the web page but the entire website, using the first page as the starting point.

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

HoHo. That is really not easy. First get those URLs, then use CALL EXECUTE to run a macro on each of them.

 

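filename LAW url "http://delcode.delaware.gov/sessionlaws/ga148/index.shtml";
/* Step 1: read the index page. HAVE gets the visible text, URL gets the chapter links. */
data have(keep=xx) url(keep=url);
infile law length=len lrecl=32767;
input  x $varying32767. len;
retain flag;
/* start processing once the <body> tag is reached */
if x =: '<body>' then flag=1;
if flag then do;
 /* remove every HTML tag from the line */
 xx=prxchange('s/\<[^\<\>]+\>//',-1,x);
 if not prxmatch('/^\s+$/',xx) then output have;
 /* lines mentioning "CHAPTER <number>": take the first double-quoted value as the link */
 if prxmatch('/CHAPTER\s+\d+/i',x) then do;
  temp=scan(x,2,'"');
  url=cats('http://delcode.delaware.gov/sessionlaws/ga148/',temp);
  if not missing(temp) then output url;
 end;
end;
run;


/* Step 2: a macro that reads one chapter page into its own data set. */
%macro chapter(url=);
filename LAW url "&url";
/* data set name = second-to-last piece of the URL when split on / and . */
data %scan(&url,-2,%str(/.))(keep=xx) ;
infile law length=len lrecl=32767;
input  x $varying32767. len;
retain flag;
if x =: '<body>' then flag=1;
if flag then do;
 xx=prxchange('s/\<[^\<\>]+\>//',-1,x);
 if not prxmatch('/^\s+$/',xx) then output;
end;
run;
%mend;

/* Step 3: generate one macro call per URL found in step 1. */
data _null_;
 set url;
 call execute(cats('%chapter(url=',url,')'));
run;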
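
The code above downloads each chapter into its own data set, but it does not yet filter on the key words or report which link they came from, which the question also asked for. One possible way to do that is sketched below. It assumes the URL data set from the first step already exists; the macro name SEARCH_CHAPTER, the fileref PAGE, and the data sets _HIT and HITS are hypothetical names chosen for illustration, and the key word list is copied from the original question.

```sas
/* Sketch only: scan one chapter page and keep matching lines with their URL. */
/* SEARCH_CHAPTER, PAGE, _HIT and HITS are illustrative names, not from the thread. */
%macro search_chapter(url=);
filename PAGE url "&url";
data _hit;
 length url $ 2000 xx $ 20000;
 infile PAGE length=len lrecl=32767;
 input x $varying32767. len;
 /* strip HTML tags, as in the CHAPTER macro above */
 xx=prxchange('s/\<[^\<\>]+\>//',-1,x);
 /* keep only lines containing one of the key words, ignoring case */
 if prxmatch('/\b(LAW|SCHOOL|PUBLIC|BOARD|EDUCATION|AUTHORITY|MEMBER)\b/i',xx);
 /* record which page the line came from */
 url="&url";
 keep url xx;
run;
/* stack the matches from every page into one data set */
proc append base=hits data=_hit force;
run;
filename PAGE clear;
%mend;

data _null_;
 set url;
 call execute(cats('%search_chapter(url=',url,')'));
run;
```

HITS then holds one row per matching line together with the link it was found on. Because PROC APPEND keeps adding rows, HITS should be deleted before the job is rerun.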


Ksharp
Super User

I just realized you need to add the following to the code to avoid a truncation problem.

 

length xx url $ 20000;
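
The reply does not say where the statement goes. A reasonable assumption is that it belongs near the top of the first DATA step, before XX and URL are first assigned. Otherwise a character variable first created by a function call gets a default length (200 bytes for CATS in a DATA step), which is too short for long lines. A sketch of that placement, with the rest of the step unchanged:

```sas
filename LAW url "http://delcode.delaware.gov/sessionlaws/ga148/index.shtml";
data have(keep=xx) url(keep=url);
 /* assumed placement: declare the lengths before XX and URL are first assigned */
 length xx url $ 20000;
 infile law length=len lrecl=32767;
 input x $varying32767. len;
 retain flag;
 if x =: '<body>' then flag=1;
 if flag then do;
  xx=prxchange('s/\<[^\<\>]+\>//',-1,x);
  if not prxmatch('/^\s+$/',xx) then output have;
  if prxmatch('/CHAPTER\s+\d+/i',x) then do;
   temp=scan(x,2,'"');
   url=cats('http://delcode.delaware.gov/sessionlaws/ga148/',temp);
   if not missing(temp) then output url;
  end;
 end;
run;
```

The DATA step inside the CHAPTER macro creates only XX, so there a LENGTH statement for XX alone would presumably be enough.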
