I have a problem reading the URL. Please help.
Thanks!
FILENAME SOURCE URL "%STR(https://www.ted.com/talks/quick-list)" DEBUG;

DATA SOURCE2;
  FORMAT WEBPAGE $1000.;
  INFILE SOURCE LRECL=32767 DELIMITER=">";
  INPUT WEBPAGE $ @@;
RUN;
What problem do you have?
Please post your log.
Your code reads the URL just fine, but it doesn't do anything to parse it. For example, if you wanted the dates and titles of the presentations, the following works for all but one of them. The one it misses contains a non-ASCII character:
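FILENAME SOURCE URL "%STR(https://www.ted.com/talks/quick-list)" DEBUG;

DATA SOURCE2 (drop=junk);
  informat junk $80.;
  informat published $8.;
  informat title $100.;
  INFILE SOURCE LRECL=32767 DELIMITER=">";
  /* Jump past each <span class='meta' tag and read the date, then   */
  /* advance five lines and read a throwaway token and the title.    */
  INPUT @"<span class='meta'" published & ///// junk title &;
  /* Keep the text before the next tag and restore apostrophes.      */
  /* The entity is assumed to be &#39; (it was decoded in the post). */
  title=tranwrd(scan(title,1,'<'),'&#39;',"'");
RUN;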
Art, CEO, AnalystFinder.com
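To see what the title cleanup step does on its own, here is a minimal sketch that runs the same SCAN and TRANWRD calls on a made-up string. The sample text and the variable name raw are hypothetical, and the apostrophe entity is assumed to be &#39;; the real page may encode it differently.

```sas
data _null_;
  length raw title $100;
  /* Hypothetical fragment: a title followed by the start of a closing tag */
  raw = "Why we&#39;re here</a";
  /* SCAN keeps everything before the first '<';            */
  /* TRANWRD then swaps the entity for a literal apostrophe */
  title = tranwrd(scan(raw, 1, '<'), '&#39;', "'");
  put title=;
run;
```

The log should show title=Why we're here. If the page uses a different entity for the apostrophe, change the second argument of TRANWRD to match.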
Could it be that the site doesn't like SAS/URL as a user agent and responds with a 404?
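One way to test the user-agent idea is to request the page with PROC HTTP and set the header explicitly. This is only a sketch: the filerefs resp and hdrs and the User-Agent value are placeholders I chose, the HEADERS statement needs SAS 9.4M3 or later, and the status-code macro variable needs 9.4M5 or later.

```sas
/* Temporary files for the response body and the response headers */
filename resp temp;
filename hdrs temp;

proc http
  url="https://www.ted.com/talks/quick-list"
  method="GET"
  out=resp
  headerout=hdrs;
  /* Placeholder value; any browser-like string will do for the test */
  headers "User-Agent"="Mozilla/5.0";
run;

/* Available in SAS 9.4M5 and later */
%put NOTE: HTTP status = &SYS_PROCHTTP_STATUS_CODE;

/* Copy the response headers to the log for comparison */
data _null_;
  infile hdrs;
  input;
  put _infile_;
run;
```

If this returns 200 while the FILENAME URL version returns 404, the user agent is a likely cause. If both fail the same way, the problem is probably elsewhere, for example a proxy or firewall.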
A major difference between my log and your log:
Yours: GET /talks/quick-list HTTP/1.0
Mine: GET /talks/quick-list HTTP/1.1
Is it possible that you're behind a firewall that doesn't support HTTP/1.1 requests?
Art, CEO, AnalystFinder.com
If I change the URL to https://www.ted.com/talks, it works. But that is not the URL I want.
Hi @GeorgeSAS
The code @art297 posted works for me as well. See below:
NOTE: >>> GET /talks/quick-list HTTP/1.0
NOTE: >>> Host: www.ted.com:443
NOTE: >>> Accept: */*
NOTE: >>> Accept-Language: en
NOTE: >>> Accept-Charset: iso-8859-1,*,utf-8
NOTE: >>> User-Agent: SAS/URL
NOTE: >>>
NOTE: <<< HTTP/1.1 200 OK
NOTE: <<< Age: 0
NOTE: <<< Cache-Control: max-age=0, public, s-maxage=30
NOTE: <<< Content-Security-Policy-Report-Only: script-src 'unsafe-inline' 'unsafe-eval' https:; style-src 'unsafe-inline' 'self' https:; default-src 'self' https: data: blob:; report-uri https://error-collector.ted.com/?context=csp-report
NOTE: <<< Content-Type: text/html; charset=utf-8
NOTE: <<< Date: Sun, 07 May 2017 00:29:18 GMT
NOTE: <<< ETag: W/"bc10c663abfefcaba0aa45f32b8efbe2"
NOTE: <<< Server: nginx
NOTE: <<< Set-Cookie: _nu=1494116958.827; Expires=Fri, 06 May 2022 00:29:18 GMT; Path=/
NOTE: <<< Set-Cookie: _abby=7OCod3AeLqPqydy; Expires=Fri, 06 May 2022 00:29:18 GMT; Path=/; Domain=.ted.com
NOTE: <<< Status: 200 OK
NOTE: <<< X-Content-Type-Options: nosniff
2                                    The SAS System             10:29 Sunday, May 7, 2017
NOTE: <<< X-Served-By: e11; o11
NOTE: <<< X-XSS-Protection: 1; mode=block
NOTE: <<< Connection: Close
NOTE: <<<
NOTE: The infile SOURCE is:
      Filename=https://www.ted.com/talks/quick-list,
      Local Host Name=<user name>,
      Local Host IP addr=<ip address>,
      Service Hostname Name=www.ted.com,
      Service IP addr=<ip address>,Service Name=N/A,
      Service Portno=443,Lrecl=32767,Recfm=Variable
NOTE: 1847 records were read from the infile SOURCE.