ChrisFromMaryland
Calcite | Level 5

Is there a way to score new data against an existing cluster set and to identify new clusters that are in that new data?

Perhaps an example would be best. Let's say there is an event coming up about the environment. I have a dataset of 10,000 online news articles from the last month that talk about the environment. Some are off-topic, like computer environments or school environments, and others are on-topic, like carbon emissions or sea water rising. So far, so good.

Now the event is held, a big international event. Lots of press coverage. My task is to see how the on-topic conversations changed in volume and how long that change lasted.

So let's say that after the event is held I download a new set of online news articles, say 20,000 records this time. I want to do two things. First, I want to score the new data against the rules that were built in the pre-event processing. Think of it as an apples-to-apples analysis: using the same rules, the conversations on carbon emissions grew by X percent and lasted Y days; the conversations about sea water rising grew by A percent and lasted B days; the conversations on computer environments and school environments did not change. Second, and this to me is the tough part, I want to uncover new topics (clusters) that may arise. So let's say that after the event a discussion about solar energy emerges that was not in the discussions prior to the event. (I know this sounds weird, but it happens to be true because I've already done the pre-event analysis.) How do I identify new clusters that did not exist in the existing clustering routine?

1 REPLY
FionaMcNeill
SAS Employee

Hi Chris -

This paper on custom entities may be of interest:  http://www.sas.com/en_us/whitepapers/discovering-what-you-want-107347.html

It's a way to include pre-defined entities into a discovery analysis with SAS Text Miner.

You may also be interested in the text profile node in SAS Text Miner, which is used to associate descriptive terms with different levels of a dependent (target) variable, including time.
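For the two-step goal in the question (score new articles against the existing clusters, then look for topics that fit none of them), the general idea can be sketched outside of SAS Text Miner. The following is only a minimal illustration in plain Python, not SAS Text Miner's method: the function names, the bag-of-words vectors, the toy articles, and the 0.3 similarity threshold are all assumptions made for the example.

```python
import math
from collections import Counter


def vectorize(text):
    """Lowercased bag-of-words counts (a stand-in for real text parsing)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(docs):
    """Summed term counts for a cluster; cosine ignores the overall scale."""
    total = Counter()
    for doc in docs:
        total.update(vectorize(doc))
    return total


def score(new_docs, centroids, threshold=0.3):
    """Assign each new doc to its closest existing cluster, or leave it
    unassigned when no cluster is similar enough (hypothetical threshold)."""
    assigned = {name: [] for name in centroids}
    unassigned = []
    for doc in new_docs:
        vec = vectorize(doc)
        best_name, best_sim = max(
            ((name, cosine(vec, c)) for name, c in centroids.items()),
            key=lambda pair: pair[1],
        )
        if best_sim >= threshold:
            assigned[best_name].append(doc)
        else:
            unassigned.append(doc)
    return assigned, unassigned


# Toy pre-event clusters (made-up articles, for illustration only).
pre_event = {
    "carbon emissions": ["carbon emissions rise in cities",
                         "new limits on carbon emissions"],
    "sea level": ["sea water rising along the coast",
                  "rising sea water threatens islands"],
}
centroids = {name: centroid(docs) for name, docs in pre_event.items()}

# Toy post-event articles, including a topic the old clusters never saw.
post_event = [
    "carbon emissions targets agreed",
    "sea water rising faster than expected",
    "solar energy subsidies announced",
    "solar energy panels get cheaper",
]
assigned, unassigned = score(post_event, centroids)

# Volume per existing cluster gives the apples-to-apples comparison.
volumes = {name: len(docs) for name, docs in assigned.items()}
```

The `volumes` counts are what you would compare with the pre-event numbers, and the `unassigned` articles are the candidates for new topics: clustering only that leftover set is one way to surface something like the solar energy discussion. The threshold would need tuning on real data.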

Hope this helps,
