billyfok30
Calcite | Level 5

Hi everyone 

 

I'm totally new to text mining and am working on my coursework.

I need some guidance on what Max SVD dimension and SVD Resolution are.

I understand that SVD Resolution can be set to low, medium, or high. How do these settings relate to each other?

 

After setting the SVD Resolution, I need to set the Max SVD dimension. How do I determine its value once I have chosen a resolution, e.g. low? And how do Max SVD dimension and SVD Resolution relate to each other?

 

Once I have chosen the settings, I click Run > Results. I need to focus on three areas: Clusters Freq, Cluster Freq by RMS, and Distance Between Clusters. I need some guidance on how to interpret these results.

 

Is there any node for comparing two different Text Cluster settings (low, 100 vs. high, 100)? My coursework requires me to compare both.

 

DougWielenga
SAS Employee

I need some guidance on what is Max SVD dimension and SVD Resolution. 

 

The short answer...

 

The number of automatic topics you request is the number of SVD dimensions used. Each SVD dimension corresponds to a topic. You can select low, medium, or high resolution for the number of dimensions. The maximum SVD dimension property sets how many dimensions are computed, and the resolution determines what share of those computed dimensions the clustering algorithm uses. Low, medium, and high resolutions correspond to 2/3, 5/6, and 6/6 (100%) of the computed dimensions, respectively.
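As a rough illustration of the fractions above, here is a minimal Python sketch that estimates how many dimensions reach the clustering step for a given maximum and resolution. The function name is hypothetical, and rounding down is an assumption; the node may round fractional results differently.

```python
from fractions import Fraction

# Fractions quoted in the reply: low = 2/3, medium = 5/6, high = 6/6.
RESOLUTION_FRACTION = {
    "low": Fraction(2, 3),
    "medium": Fraction(5, 6),
    "high": Fraction(1, 1),
}


def dimensions_used(max_svd_dimensions: int, resolution: str) -> int:
    """Estimate how many computed SVD dimensions the clustering step uses.

    Assumption: fractional results are rounded down.
    """
    fraction = RESOLUTION_FRACTION[resolution.lower()]
    return int(max_svd_dimensions * fraction)  # int() truncates toward zero


for resolution in ("low", "medium", "high"):
    print(resolution, dimensions_used(60, resolution))
```

With a maximum of 60, this gives 40, 50, and 60 dimensions for low, medium, and high, so comparing "low, 100" with "high, 100" amounts to comparing clustering on roughly two thirds of the computed dimensions with clustering on all of them.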

 

Some more details...

 

The SVD option creates orthogonal columns that characterize the terms data set in fewer dimensions than the document-by-term matrix. A higher number of SVD dimensions usually summarizes the data better but requires more computing resources. In addition, the higher the number, the higher the risk of fitting to noise. The default transform method is the SVD. The default number of dimensions that the SVD creates is 100.
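To make the idea of "fewer dimensions" concrete, here is a small sketch using NumPy on a made-up document-by-term matrix. It is not how the Text Cluster node is implemented (its term weighting and scaling are not shown here); it only illustrates a truncated SVD producing orthogonal columns. The matrix values and the choice of k = 2 are invented for the example.

```python
import numpy as np

# Made-up counts: 4 documents (rows) by 5 terms (columns).
doc_term = np.array([
    [2, 1, 0, 0, 0],
    [1, 2, 0, 0, 1],
    [0, 0, 3, 1, 0],
    [0, 0, 1, 2, 1],
], dtype=float)

k = 2  # number of SVD dimensions to keep (illustrative)

# Thin SVD: doc_term = U @ diag(s) @ Vt, singular values in descending order.
U, s, Vt = np.linalg.svd(doc_term, full_matrices=False)

# Each document is now described by k coordinates instead of 5 term counts.
doc_coords = U[:, :k] * s[:k]

# The retained columns of U are orthonormal, so their Gram matrix is the identity.
gram = U[:, :k].T @ U[:, :k]

print(doc_coords.shape)              # (4, 2)
print(np.allclose(gram, np.eye(k)))  # True
```

Clustering would then run on the rows of `doc_coords` rather than on the full term counts.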

 

When you begin your analysis, you will probably want to use the low resolution to reduce the computing resources required by the clustering algorithm. You can still evaluate additional SVD dimensions to determine whether more of them are needed for clustering. Once you have determined an adequate number of SVD dimensions for clustering, there is no need to compute more dimensions than you are going to use.

You can evaluate further SVD dimensions by using a scree plot. The scree plot is found in the Interactive Results and shows the proportional amount of variance explained by each additional SVD dimension.  When this plot starts to flatten out, this is a sign that the information contained in the later dimensions does not add much to the model.  
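For intuition about what a scree plot shows, here is a minimal sketch in plain Python. It assumes the common convention that the variance attributed to each dimension is proportional to its squared singular value; the Interactive Results view may compute or scale its plot differently. The singular values are made up for illustration.

```python
def variance_proportions(singular_values):
    """Share of total variance attributed to each SVD dimension."""
    squared = [s * s for s in singular_values]
    total = sum(squared)
    return [value / total for value in squared]


# Made-up singular values in descending order, for illustration only.
singular_values = [9.0, 6.0, 3.0, 1.0, 0.5, 0.4, 0.3]
proportions = variance_proportions(singular_values)

# Text-mode scree plot: one bar per dimension, scaled to 50 characters.
for index, share in enumerate(proportions, start=1):
    bar = "#" * round(share * 50)
    print(f"dim {index}: {share:6.3f} {bar}")
```

In this toy example the bars shrink quickly after the third dimension, which is the kind of flattening to look for.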

In projects we have done, we have not seen a need to go beyond about 50 SVD dimensions. We have run multiple Text Cluster nodes with different numbers of SVD dimensions. However, we make sure to keep the SVD resolution property set to High, which ensures that all of the SVD dimensions created are used. Consider trying different numbers of SVD dimensions ranging from 10 to 50, perhaps incrementing upward by 5 at a time, so you can see what is generated at each setting and draw some conclusions about how many to specify.
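The suggested sweep can be written down as a simple run plan. The sketch below only enumerates the settings to try, one per Text Cluster node; the dictionary keys are illustrative labels, not SAS property names or an API.

```python
# Candidate values for the maximum number of SVD dimensions: 10, 15, ..., 50.
candidate_dimensions = list(range(10, 51, 5))

# One planned run per candidate, all with High resolution so that
# every computed dimension is passed to the clustering step.
planned_runs = [
    {"max_svd_dimensions": k, "svd_resolution": "High"}
    for k in candidate_dimensions
]

for run in planned_runs:
    print(run)
```

That gives nine runs whose cluster results can be compared side by side.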

 

Hope this helps!

Doug
