web scraping LinkedIn site

Hi everyone,

I am trying to extract data from my own LinkedIn page using SAS 9.4.

I've read the various posts on this topic in this community and elsewhere online, for instance:

https://blogs.sas.com/content/sasdummy/2017/12/04/scrape-web-page-data/

I've used the PWENCODE procedure to encode my password into a text file, and I read it back into the macro variable &PASS.

So far, I've written this code:

filename recupFIC "C:\PASS to output file\test.xml";
proc http
	method="GET"
	url="https://www.linkedin.com/company/MYSITENUMBER/admin/analytics"
	out=recupFIC
	WEBAUTHDOMAIN="www.linkedin.com"
	webusername="my username here"
	webpassword="&PASS."
	;
run;

When I run this code, a window opens asking me to fill in the fields for a metadata server: Server Name, User ID and Password. I don't understand where I can find this information.

When I try to scrape a simple page (no authentication required), it works perfectly.

Do you have any idea?

 

Thank you very much in advance,

 


Accepted Solutions
Solution
‎03-26-2018 01:52 PM
Community Manager
Posts: 3,462

Re: web scraping LinkedIn site

Posted in reply to SophieSaas

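WEBAUTHDOMAIN is for an administered SAS mid-tier: it tells PROC HTTP to look up credentials in an authentication domain on a SAS metadata server, which is why you are prompted for a metadata server login. It's not an option you need here.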

 

WEBUSERNAME and WEBPASSWORD are for "Basic Auth" -- but LinkedIn does not use that mechanism.  Third-party applications must use the LinkedIn APIs and connect with OAuth2 -- a much more complex negotiation.  And I'm not sure that the LinkedIn APIs provide the data you want to get. Check their Developer site to see what's possible.
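
For contrast, here is a minimal sketch of what WEBUSERNAME and WEBPASSWORD look like against a server that really does use Basic Auth. The httpbin.org test endpoint and the demo_user/demo_pass credentials are my own assumptions for illustration and have nothing to do with LinkedIn. The AUTH_BASIC option and the SYS_PROCHTTP_STATUS_CODE macro variable may not be available on older SAS 9.4 maintenance releases, so check the PROC HTTP documentation for your version.

```sas
/* Assumption: httpbin.org is a public test service, used here only as  */
/* an example of a server that accepts Basic Auth. The user name and    */
/* password in the URL are the credentials that endpoint will accept.   */
filename resp temp;

proc http
    method="GET"
    url="https://httpbin.org/basic-auth/demo_user/demo_pass"
    webusername="demo_user"
    webpassword="demo_pass"
    auth_basic      /* send the credentials using Basic authentication */
    out=resp;
run;

/* Assumption: SAS 9.4M5 or later, where PROC HTTP sets this macro variable */
%put NOTE: HTTP status was &SYS_PROCHTTP_STATUS_CODE.;
```

A site like LinkedIn that signs users in through a login form and OAuth2 will not accept credentials sent this way, which is why the original code cannot work even without WEBAUTHDOMAIN.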

 

Web scraping is most likely against LinkedIn's data use policy.  While you might be just trying to experiment with your own profile, taking it further is probably against their rules.  If you just want to "practice" parsing your page, use your web browser to Save As HTML and then read that saved file in a SAS DATA step with an INFILE statement.
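
A minimal sketch of that practice approach, assuming the page was saved to a hypothetical path C:\temp\my_page.html. The data set names and the choice to look for the <title tag are only illustrative; substitute whatever marker you want to find.

```sas
/* Assumption: hypothetical path to a page saved from the browser */
filename saved "C:\temp\my_page.html";

data work.page_lines;
    /* LENGTH= stores the length of the current record in LEN */
    infile saved length=len lrecl=32767;
    /* read the whole record, whatever its length, into LINE */
    input line $varying32767. len;
    line = strip(line);
    line_no = _n_;   /* record number, handy for locating matches */
    if len > 0;      /* keep only non-empty records */
run;

/* Keep only the lines that contain the marker of interest */
data work.title_lines;
    set work.page_lines;
    if find(line, '<title', 'i') > 0;   /* 'i' = ignore case */
run;
```

One caveat: a SAS character variable holds at most 32,767 characters, so a page saved as a few very long lines (minified HTML) will not fit whole into LINE and would need a different reading strategy.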

 

Chris

All Replies
Occasional Contributor
Posts: 10

Re: web scraping LinkedIn site

Posted in reply to SophieSaas
Thank you Chris for your answer. As I was "just" trying to scrape my own page, I hadn't realized it could be an issue for LinkedIn.
Thank you!