☑ This topic is solved.
dcortell
Pyrite | Level 9

Hi experts. I'm using the following example:

filename src temp;
proc http
url="https://www.sas.com/en/whitepapers/artificial-intelligence-banking-risk-management-110277.br.html#formsuccess?utm_source=linkedin&utm_medium=paid-social&utm_campaign=rsk-gen-emea&utm_content=50931-lklgf-english"
out=src;
run;

The request produces the following log:

 

filename src temp;
proc http
method="get"
url="https://www.sas.com/en/whitepapers/artificial-intelligence-banking-risk-management-110277.br.html#formsuccess?utm_source=linkedin&utm_medium=paid-social&utm_campaign=rsk-gen-emea&utm_content=50931-lklgf-english"
WARNING: Apparent symbolic reference UTM_MEDIUM not resolved.
WARNING: Apparent symbolic reference UTM_CAMPAIGN not resolved.
WARNING: Apparent symbolic reference UTM_CONTENT not resolved.
out=src;
run;

NOTE: 404 Not Found
NOTE: PROCEDURE HTTP used (Total process time):
real time 0.11 seconds
cpu time

However, the URL is reachable and returns HTML that should be possible to scrape.

Any idea why the scrape is failing here?

 

Accepted Solution
dcortell
Pyrite | Level 9

In addition to Chris's test: if I cut the URL after the ".html" part and pass only that, the scrape works fine:

 

%let url='https://www.sas.com/en/whitepapers/artificial-intelligence-banking-risk-management-110277.br.html';

filename src temp;
proc http
url=&url
out=src;
run;

data _null_;
infile src;
input;
list;
run;

 

So I believe it makes sense to trim similar URLs after the ".html" part if that lets the scrape succeed. I'm still not sure why the full URL works for some people but not for me.
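
One possible explanation, offered as an assumption rather than something confirmed in this thread: everything after "#" in a URL is a fragment, which browsers keep on the client side and do not send to the server. In this URL the "?utm_..." part comes after the "#", so it is all fragment. Whether PROC HTTP sends the fragment as-is may depend on the SAS release. A minimal sketch of trimming it before the request (the names full, clean, and clean_url are illustrative):

```sas
/* Sketch: drop the fragment (everything from '#' onward) before calling PROC HTTP. */
data _null_;
  length full clean $ 500;
  /* Single quotes keep the macro processor from trying to resolve &utm_medium etc. */
  full = 'https://www.sas.com/en/whitepapers/artificial-intelligence-banking-risk-management-110277.br.html#formsuccess?utm_source=linkedin&utm_medium=paid-social&utm_campaign=rsk-gen-emea&utm_content=50931-lklgf-english';
  /* With '#' as the only delimiter, SCAN returns the text before the first '#'. */
  clean = scan(full, 1, '#');
  /* Store the trimmed URL in a macro variable (illustrative name). */
  call symputx('clean_url', clean);
run;

filename src temp;
proc http
  url="&clean_url"
  out=src;
run;
```

Double quotes are safe in the PROC HTTP step here only because the trimmed URL contains no ampersands. A URL with query parameters before the "#" would still need single quotes or macro quoting.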


9 REPLIES
LinusH
Tourmaline | Level 20

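Try using single quotes for the url option, unless you actually intend to use SAS macro variables. Inside double quotes, SAS treats each "&name" in the query string as a macro variable reference, which is what triggers the "Apparent symbolic reference" warnings.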
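
A short sketch of the two usual ways to keep the ampersands literal: single quotes, or %NRSTR when the URL has to live in a macro variable. The macro variable name page is illustrative, and this only addresses the warnings, not necessarily the 404.

```sas
filename src temp;

/* Single quotes: the macro processor does not scan the string, so no warnings. */
proc http
  url='https://www.sas.com/en/whitepapers/artificial-intelligence-banking-risk-management-110277.br.html#formsuccess?utm_source=linkedin&utm_medium=paid-social&utm_campaign=rsk-gen-emea&utm_content=50931-lklgf-english'
  out=src;
run;

/* Macro variable: %NRSTR masks the ampersands when the value is assigned. */
%let page = %nrstr(https://www.sas.com/en/whitepapers/artificial-intelligence-banking-risk-management-110277.br.html#formsuccess?utm_source=linkedin&utm_medium=paid-social&utm_campaign=rsk-gen-emea&utm_content=50931-lklgf-english);

proc http
  url="&page"
  out=src;
run;
```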

Data never sleeps
Tom
Super User

I suspect that the webserver that SAS is using to host that page is not playing well with PROC HTTP.

 

Since it is a SAS site, why not open a SAS Support ticket and let them debug their own site?

LinusH
Tourmaline | Level 20

It works for me from SAS Viya Learning Ed:

filename src temp;
proc http
url='https://www.sas.com/en/whitepapers/artificial-intelligence-banking-risk-management-110277.br.html#formsuccess?utm_source=linkedin&utm_medium=paid-social&utm_campaign=rsk-gen-emea&utm_content=50931-lklgf-english'
out=src;
run;
NOTE: PROCEDURE HTTP used (Total process time):
      real time           0.42 seconds
      cpu time            0.03 seconds
Data never sleeps
ChrisHemedinger
Community Manager

I suspect your environment is working through a proxy server and you might need to specify PROXY= options.

 

filename resp temp;
 
proc http
 url="https://www.sas.com/en/whitepapers/artificial-intelligence-banking-risk-management-110277.br.html"
 out=resp;
run;

data _null_;
 infile resp;
 input ;
 put _infile_ ;
run;
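
If a proxy is involved, the PROXY options mentioned above might be supplied roughly as follows. This is a sketch: the host proxy.example.com and port 8080 are placeholders, not values from this thread, and the status macro variables assume SAS 9.4M5 or later.

```sas
filename resp temp;

proc http
  url="https://www.sas.com/en/whitepapers/artificial-intelligence-banking-risk-management-110277.br.html"
  out=resp
  proxyhost="proxy.example.com"   /* hypothetical proxy host */
  proxyport=8080;                 /* hypothetical proxy port */
run;

/* PROC HTTP sets these automatic macro variables after each request (9.4M5+). */
%put NOTE: status=&SYS_PROCHTTP_STATUS_CODE phrase=&SYS_PROCHTTP_STATUS_PHRASE;
```

Writing the status code to the log makes it easier to compare runs across environments than relying on the NOTE line alone.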

 

But what are you trying to do here? This page is a gate to a whitepaper, but it won't get you the paper itself, because that's a download process with a PDF.

dcortell
Pyrite | Level 9

Hi Chris, I'm testing PROC HTTP under different conditions.

ChrisHemedinger
Community Manager

Here are some tips for checking that PROC HTTP is usable with your internet access in your SAS session.

dcortell
Pyrite | Level 9

Running the test code, I get the following error:

 

/* Tell SAS to parse the JSON response */
libname stream JSON fileref=resp;
NOTE: JSON data is only read once. To read the JSON again, reassign the JSON LIBNAME.
ERROR: Invalid JSON in input near line 1 column 1: Encountered an illegal character.
ERROR: Error in the LIBNAME statement.

title "JSON library structure";
proc datasets lib=stream;
ERROR: Libref STREAM is not assigned.
quit;
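
An "illegal character" at line 1 column 1 often means the response body was not JSON at all, for example an HTML error page. A hedged sketch of guarding the JSON LIBNAME with a status check follows. The endpoint https://httpbin.org/get is an assumption, since the thread does not show which URL the test code called, and %IF in open code plus the SYS_PROCHTTP_* macro variables assume SAS 9.4M5 or later.

```sas
filename resp temp;

proc http
  url="https://httpbin.org/get"   /* assumed JSON test endpoint, not from the thread */
  method="GET"
  out=resp;
run;

%if &SYS_PROCHTTP_STATUS_CODE = 200 %then %do;
  /* Only parse the response as JSON when the request succeeded. */
  libname stream JSON fileref=resp;
  title "JSON library structure";
  proc datasets lib=stream;
  quit;
%end;
%else %do;
  /* Otherwise dump the raw body to the log to see what actually came back. */
  %put WARNING: Request returned &SYS_PROCHTTP_STATUS_CODE &SYS_PROCHTTP_STATUS_PHRASE.. Skipping JSON parse.;
  data _null_;
    infile resp;
    input;
    put _infile_;
  run;
%end;
```

A 200 response can still carry non-JSON content (a proxy login page, for instance), so the raw dump in the %ELSE branch is also worth running by hand when the LIBNAME fails.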
dcortell
Pyrite | Level 9

EDIT: On a second run, the code works fine and no error is generated.

