nelson_lee
Fluorite | Level 6

Hi all,

I have the following dataset, and I want to find the most important peers (neighbourhood) for each person in the "From" column. I also want to calculate different types of centrality for those peers. How can I do this?

 

From   To   Transaction Count   Transaction Amount
A      B    10                  10000
A      C    5                   2000
B      A    2                   100000
.....


I tried using the Link Analysis node with the following settings for my dataset, but it pops up an error:

Dataset role set to "Transaction"
"From" set to "Referrer"
"To" set to "Target"
"Transaction Count" set to "Sequence"

Please help
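Outside of SAS Enterprise Miner, both parts of this question can be sketched in plain Python on the summarized table above. This is a minimal sketch, not what the Link Analysis node computes: it assumes "most important peer" means the peer with the largest Transaction Count (or Transaction Amount), and it uses plain unweighted in-/out-degree centrality normalized by n - 1. The function names `top_peers` and `degree_centrality` are hypothetical, and only the three rows shown above are used.

```python
from collections import defaultdict

# Summarized rows from the table above: (From, To, Transaction Count, Transaction Amount).
# Only the three rows shown are used; the elided "....." rows are unknown.
edges = [
    ("A", "B", 10, 10000),
    ("A", "C", 5, 2000),
    ("B", "A", 2, 100000),
]

def top_peers(edges, weight_index=2):
    """Rank each sender's peers by one column: 2 = Transaction Count, 3 = Transaction Amount."""
    peers = defaultdict(list)
    for row in edges:
        peers[row[0]].append((row[1], row[weight_index]))
    return {src: sorted(lst, key=lambda p: p[1], reverse=True)
            for src, lst in peers.items()}

def degree_centrality(edges):
    """Unweighted (out-degree, in-degree) centrality, normalized by n - 1 persons.

    Assumes one row per (From, To) pair, as in a summarized table.
    """
    nodes = {r[0] for r in edges} | {r[1] for r in edges}
    out_deg = defaultdict(int)
    in_deg = defaultdict(int)
    for src, dst, *_ in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    denom = max(len(nodes) - 1, 1)  # avoid dividing by zero for a one-person graph
    return {v: (out_deg[v] / denom, in_deg[v] / denom) for v in sorted(nodes)}

print(top_peers(edges))          # peers ranked by Transaction Count
print(top_peers(edges, 3))       # peers ranked by Transaction Amount
print(degree_centrality(edges))  # (out, in) centrality per person
```

Degree centrality is only the simplest measure; betweenness or closeness would need the full set of rows, not just the three shown here.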

 


DougWielenga
SAS Employee

The Link Analysis node in SAS Enterprise Miner can calculate centrality measures for transactional data (multiple rows per 'transaction', as identified by a transaction ID field) or observational data (each row corresponds to an entire 'transaction' or sequence), but you appear to have summarized data, which does not fit either input format. Do you have the data set that was used to create your summary data? If so, it should be fairly easy to specify the appropriate roles for the data.
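The difference between the two accepted layouts can be sketched in Python. This is an illustration only, assuming a transactional row is (transaction ID, sequence, item) and that the observational form is one ordered list of items per ID; the data and the function name `to_observational` are hypothetical and do not reflect SAS Enterprise Miner's own processing. A summarized table like the one in the question cannot be turned back into either form, because the individual rows and their order are gone.

```python
from collections import defaultdict

# Hypothetical transactional rows: (transaction_id, sequence, item).
# Several rows share one ID, and the sequence orders the items within that ID.
transactional = [
    (1, 2, "K"),
    (1, 1, "A"),
    (1, 3, "U"),
    (2, 1, "A"),
    (2, 2, "Y"),
]

def to_observational(rows):
    """Collapse transactional rows into one ordered item list per transaction ID."""
    grouped = defaultdict(list)
    for txn_id, seq, item in rows:
        grouped[txn_id].append((seq, item))
    # Sort each ID's (sequence, item) pairs so items come out in sequence order.
    return {txn_id: [item for _, item in sorted(pairs)]
            for txn_id, pairs in grouped.items()}

print(to_observational(transactional))
```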

 

It seems that you are treating each distinct value in the 'From' or 'To' field as a 'person' and summarizing the total number and total amount of transactions for each pair of individuals. If you have a data set containing all of the individual transactions and their amounts rather than the summarized values, you would specify

From ---> Input

To ---> Target

but you would also need to set

Session_ID ---> ID

Session_Sequence ---> Sequence

 

However, there is no role to specify for the transaction amount. The number of transactions is captured when SAS Enterprise Miner summarizes the data, because it counts the transactions as it processes them. You can find more about the Link Analysis node by opening SAS Enterprise Miner and clicking on

 

    Help --> Contents

 

and then navigating in the panel on the left to 

 

Node Reference

     Explore 

            Link Analysis Node

 

After clicking on Link Analysis Node in the panel on the left, you can select from several relevant links in the panel on the right including

 

   Input Data Requirements for the Link Analysis Node

 

   Link Analysis Node Properties

          Link Analysis Node Train Properties:  Centrality Measures Properties

 

   Link Analysis Node Examples
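The point that the transaction count comes from counting rows can be illustrated with a short Python sketch. The column layout follows the role example above (Session_ID, Session_Sequence, From, To), but the rows and the function name `pair_counts` are hypothetical; this is not the node's internal code.

```python
from collections import Counter

# Hypothetical raw transactions, one row each: (Session_ID, Session_Sequence, From, To).
# The transaction amount is left out because no Link Analysis role uses it.
raw = [
    (1, 1, "A", "B"),
    (1, 2, "B", "A"),
    (2, 1, "A", "B"),
    (2, 2, "A", "C"),
]

def pair_counts(rows):
    """Count how many transactions go from each From to each To."""
    return Counter((frm, to) for _sid, _seq, frm, to in rows)

for (frm, to), n in sorted(pair_counts(raw).items()):
    print(frm, to, n)
```

This is why a separate Transaction Count column is not needed as an input.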

 

I hope this helps!

Doug 

nelson_lee
Fluorite | Level 6

Thanks Doug,

It really helps, but I received the error below after changing my dataset and roles.

 

Now my dataset looks like this:

From   To   Txn_Amt   Txn_ID   Seq_ID
A      B    1000      1        1
A      B    10000     2        2
A      C    10000     3        3
B      C    10000     4        1
B      C    10000     5        2
B      D    10000     6        3

 

I set the dataset role to "Transaction" and the variable roles as follows:

"From" --> Input
"To" --> Target
"Txn_Amt" --> Input
"Txn_ID" --> ID
"Seq_ID" --> Sequence

 

I then connected the dataset to the Link Analysis node and ran it, but it pops up the error below:

Error: Minimum support level is either too high to detect any rules or too low that runs into out of memory issue.

 

My EM_TRAIN_MAXLEVELS is set to 100,000.

 

Thanks,

Nelson

DougWielenga
SAS Employee

There are two things that might be problematic with your transactional approach.  

1 - Several rows are typically associated with each transaction ID. You have a separate transaction ID for each row, which is fine if you want to treat each row as a completely separate transaction.

 

2 - I experimented with some mock data and found that I needed to use a variable name other than 'To'. After renaming the variable 'To' to 'Towards', the same data ran successfully.
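A quick way to check for the situation in point 1 is to count rows per transaction ID. This Python sketch uses the six rows from Nelson's table above (column order From, To, Txn_Amt, Txn_ID, Seq_ID); the helper name `rows_per_id` is hypothetical. Every Txn_ID appears exactly once, so each row would be treated as its own transaction.

```python
from collections import Counter

# Rows from Nelson's table: (From, To, Txn_Amt, Txn_ID, Seq_ID).
rows = [
    ("A", "B", 1000, 1, 1),
    ("A", "B", 10000, 2, 2),
    ("A", "C", 10000, 3, 3),
    ("B", "C", 10000, 4, 1),
    ("B", "C", 10000, 5, 2),
    ("B", "D", 10000, 6, 3),
]

def rows_per_id(rows, id_index=3):
    """Count how many rows share each transaction ID (Txn_ID is column 3)."""
    return Counter(row[id_index] for row in rows)

counts = rows_per_id(rows)
single_row_ids = sorted(i for i, c in counts.items() if c == 1)
print(single_row_ids)  # [1, 2, 3, 4, 5, 6]: every ID has only one row
```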

 

Also, you are including the variable Txn_Amt in your data, but it is not going to be used in Link Analysis. No frequency or weight variable is used: I get the same results either way. Here are the first few rows of my test data:

 

From  Towards  ID  Seq
A     U        1   1
U     K        1   2
K     U        1   3
U     M        1   4
A     U        5   1
U     K        5   2
K     U        5   3
U     O        5   4
O     O        5   5
A     U        10  1
U     U        10  2
O     O        10  3
A     Y        13  1
Y     K        13  2
K     Y        13  3
Y     Y        13  4

 

Notice that there are several rows per ID and that the sequence variable restarts at 1 for each new ID. Any set of ordinally equivalent sequence values (for example, 10, 20, 30 instead of 1, 2, 3) should give similar results.
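If the raw data has an ID but no sequence column, a sequence that restarts at 1 for each new ID can be generated as sketched below. This assumes the rows are already in the intended order within each ID; the function name `assign_sequence` is hypothetical, and the sample rows are the first six rows of the test data above with the Seq column left off.

```python
from collections import defaultdict

def assign_sequence(rows):
    """Number rows 1, 2, 3, ... within each ID, restarting at 1 for every new ID.

    Assumes rows are (From, Towards, ID) tuples already in the intended
    order within each ID. Returns (From, Towards, ID, Seq) tuples.
    """
    counters = defaultdict(int)  # last sequence number used for each ID
    result = []
    for frm, towards, txn_id in rows:
        counters[txn_id] += 1
        result.append((frm, towards, txn_id, counters[txn_id]))
    return result

# First six rows of the test data above, without the Seq column.
rows = [
    ("A", "U", 1), ("U", "K", 1), ("K", "U", 1), ("U", "M", 1),
    ("A", "U", 5), ("U", "K", 5),
]

for row in assign_sequence(rows):
    print(row)
```

A per-ID counter is used rather than a single running counter so that rows from different IDs can be interleaved without breaking the numbering.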

Hope this helps!

Doug

nelson_lee
Fluorite | Level 6

Great, thanks Doug,

Following your instructions, I got the result I wanted. Many thanks!

My dataset is now as follows, and there is no error:


From   Toward   Reference_No   Seq_ID
A      B        xxxxxx1        1
A      B        xxxxxx2        2
A      C        xxxxxx3        3
B      C        xxxxxx4        1
B      C        xxxxxx5        2
B      D        xxxxxx6        3
........

 

Thanks again,

Nelson
