Text mining and content categorization

web crawler

Learner
Posts: 1

Hi! I am trying to crawl the web using SAS Text Miner in Enterprise Miner (14.1). Under the properties for the Text Import node, the web crawler section where I enter the URL is grayed out. I can't for the life of me figure out why! Any ideas?

Frequent Contributor
Posts: 130

Re: web crawler

Are you using the cloud-based SAS OnDemand for Academics Enterprise Miner, or a local SAS Enterprise Miner without the add-on Text Miner licence? Either of those would explain it. You need a local or enterprise SAS Enterprise Miner installation with the add-on Text Miner licensed to support the URL web crawling method of text importing.

In my university some staff and PhD students have those types of SAS Enterprise Miner installations. When we want to teach text mining using web crawl text importing, we use one of those installations to import the text, export it to a single CSV file, and upload it to the SAS OnDemand cloud course folders using SAS On Demand Studio. If you are a student or academic and don't have a local or enterprise installation with the Text Miner add-on licence, maybe someone in the community who does could crawl your URL specification for you and send you the exported text file.

Be prepared for some transcoding problems when you do that. The local and enterprise session encoding is usually Windows wlatin1, while the OnDemand and URL source encoding is typically UTF-8, so sometimes fields have to be dropped because the transcoding hangs.
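
If the exported CSV does come out in a Windows encoding, one way to sidestep the transcoding step inside SAS is to convert the file to UTF-8 before uploading it. The following is only a rough sketch in Python, not anything SAS provides: it assumes the export is in Windows-1252 (which Python calls `cp1252`, and which corresponds to wlatin1), and the function names are made up for illustration. Bytes that are undefined in that encoding are replaced with the Unicode replacement character rather than stopping the conversion.

```python
def transcode_to_utf8(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Re-encode raw bytes as UTF-8.

    Assumption: the input really is in `source_encoding` (cp1252 by default).
    Bytes that are undefined in that encoding become U+FFFD instead of
    raising an error, so one bad field does not block the whole file.
    """
    text = raw.decode(source_encoding, errors="replace")
    return text.encode("utf-8")


def transcode_file(src_path: str, dst_path: str,
                   source_encoding: str = "cp1252") -> None:
    """Read a file in binary mode and write a UTF-8 copy (hypothetical helper)."""
    with open(src_path, "rb") as src:
        raw = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(transcode_to_utf8(raw, source_encoding))
```

For example, `transcode_file("crawl_export.csv", "crawl_export_utf8.csv")` would write a UTF-8 copy next to the original; both file names here are placeholders. It is worth checking the result for replacement characters before uploading, since those mark spots where the original bytes did not fit the assumed encoding.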
