Sentiment Analysis Workbench Corpus format

I've installed the various Sentiment Analysis tools (Studio, Server, and Workbench). I've already created my training corpus and built a statistical model in Studio, and I've uploaded the model to the server.

I am now creating a new project in Workbench.  There is a tab where I specify my corpus and upload it.  The upload fails every time with the error "Unable to upload file". 

The file I am uploading is a zipped folder of text files. Here are my guesses as to what may be happening:

1) The file is being uploaded to a folder that I (i.e. the web server or Workbench user) do not have permission to access. But which folder would that be?

2) Perhaps the zipped folder itself is not stored, and its contents are instead read and placed into the MySQL database?

3) The file format is incorrect. I also tried zipping only the text documents, without the enclosing folder, and that did not work either. Perhaps the format of the files themselves is not acceptable.

I have no clue how to proceed.  Any suggestions are appreciated.


Accepted Solution
08-19-2013 06:30 PM

Re: Sentiment Analysis Workbench Corpus format

I just solved my own question. Will it count if I mark this as the right answer?

I went into the directory where SAS SA Workbench is installed.  There is a "test_documents" folder with an example corpus.  It looks like the corpus needs to be a zipped folder of XML files.  Each document has the following format:

<doc>
<docid><![CDATA[filename .xml without extension]]></docid>
<title><![CDATA[subject title here]]></title>
<createtime><![CDATA[10/6/2008 10:00:00 AM]]></createtime>
<body><![CDATA[blah blah blah yadda yadda yadda text text text]]></body>
</doc>

What sucks is that the SAS sentiment tools don't appear to build the corpus for me (unless I am missing something?). Instead, I have the joy of converting all of my text files into XML files with this format.

As a test, I manually converted 5 of my .txt files to .xml files with the above XML structure, zipped them, and was able to upload them successfully.
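
For anyone facing the same conversion, here is a rough Python sketch of how it could be automated. It is not anything SAS ships. The element names come from the example document above; everything else is my assumption: the function names, the "corpus" folder name inside the zip, UTF-8 encoding, using the file name as the title, using the file's modification time as the createtime, and reading "10/6/2008" as month/day/year. Check the output against the files in "test_documents" before relying on it.

```python
import zipfile
from datetime import datetime
from pathlib import Path


def cdata(text: str) -> str:
    # A CDATA section cannot contain "]]>", so split it across two sections.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def format_time(dt: datetime) -> str:
    # Mimics the sample value "10/6/2008 10:00:00 AM".
    # Month/day order is an assumption; the sample is ambiguous.
    hour12 = dt.hour % 12 or 12
    suffix = "AM" if dt.hour < 12 else "PM"
    return (f"{dt.month}/{dt.day}/{dt.year} "
            f"{hour12}:{dt.minute:02d}:{dt.second:02d} {suffix}")


def to_doc_xml(docid: str, title: str, created: datetime, body: str) -> str:
    # Element names are taken from the example document in this post.
    return (
        "<doc>\n"
        f"<docid>{cdata(docid)}</docid>\n"
        f"<title>{cdata(title)}</title>\n"
        f"<createtime>{cdata(format_time(created))}</createtime>\n"
        f"<body>{cdata(body)}</body>\n"
        "</doc>\n"
    )


def build_corpus_zip(src_dir, zip_path, folder: str = "corpus") -> int:
    """Convert every .txt file in src_dir to XML and zip the results.

    Returns the number of documents written. The folder name inside the
    zip is a hypothetical choice, not something the Workbench documents.
    """
    count = 0
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for txt in sorted(Path(src_dir).glob("*.txt")):
            body = txt.read_text(encoding="utf-8", errors="replace")
            # Assumption: the file's modification time stands in for createtime.
            created = datetime.fromtimestamp(txt.stat().st_mtime)
            # Assumption: the file name (without extension) doubles as the title.
            xml = to_doc_xml(txt.stem, txt.stem, created, body)
            zf.writestr(f"{folder}/{txt.stem}.xml", xml)
            count += 1
    return count
```

Calling build_corpus_zip("my_texts", "corpus.zip") would write one .xml entry per .txt file in "my_texts" (a placeholder path). I have only confirmed by hand that files with this structure upload, so treat the zip layout as a starting point.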

☑ This topic is SOLVED.
