Rahul_B
Obsidian | Level 7

Hello,

 

I need help detecting anomalies in my dataset:

 

data hdfs_kim.node_monitoring;
/* Dates are assumed to be day-month-year (DD-MM-YYYY) */
input date :ddmmyy10. node1 node2 node3 node4 node5;
format date ddmmyy10.;
datalines;
01-02-2020 0.45 0.44 0.78 0.32 0.99
02-02-2020 0.34 0.32 0.89 0.56 0.77
03-02-2020 0.89 0.65 0.76 043 0.81
04-02-2020 0.73 1.34 0.66 0.33 0.49
05-02-2020 0.23 0.44 0.54 0.66 0.66
06-02-2020 0.88 0.76 2.56 0.61 0.71
;
run;

 

Can anyone help me find anomalies in this dataset? So far I have used:

 

proc rpca data=hdfs_kim.node_monitoring
lambdaweight = 3.5
outsparse=hdfs_kim.sparsemat2;
id date;
run;

proc print data=hdfs_kim.sparsemat2;
run;

proc rpca data=hdfs_kim.node_monitoring
scale center;
id date;
anomalydetection;
savestate rstore=hdfs_kim.store;
run;

proc astore;
setoption rpca_projection_type 2;
score rstore=hdfs_kim.store data=hdfs_kim.node_monitoring out=hdfs_kim.scored;
run;

proc print data=hdfs_kim.scored;
run;

 

I was not able to interpret the RPCA results when I have node1 through node16. I am willing to try any other process or procedure, so any help is appreciated.

 

Thanks 

Rahul 

2 REPLIES
Zohreh
SAS Employee

Hi Rahul, 

 

In your scored output table "scored", the last column is labeled "outlier detection score".

A value of 1 in that column indicates that the scored observation is an outlier.

You can read more about the anomaly detection functionality of PROC RPCA here: SAS Help Center: The RPCA Procedure

 

Thanks,

-Zohreh 

Rahul_B
Obsidian | Level 7

Hi,

I've read the whole procedure documentation thoroughly. The scored dataset does flag anomalies, but with more than one variable (multiple dimensions) it doesn't correctly determine where the anomalies are.

 

Any suggestions?

 

-Thanks 

Rahul 
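
The scored table flags whole observations (dates), which may be why it does not show which node is unusual. One possible alternative, sketched here and not taken from the RPCA documentation, is to compute a robust z-score per node (median and MAD via PROC STDIZE) and flag individual date/node cells. The table names work.node_monitoring, work.node_z and work.cell_anomalies and the cutoff of 3.5 are assumptions for illustration only. Note that the value 043 in the posted data is read by SAS as 43, so it is kept exactly as posted.

```sas
/* Hypothetical local copy of the posted data (day-month-year dates assumed) */
data work.node_monitoring;
   input date :ddmmyy10. node1-node5;
   format date ddmmyy10.;
   datalines;
01-02-2020 0.45 0.44 0.78 0.32 0.99
02-02-2020 0.34 0.32 0.89 0.56 0.77
03-02-2020 0.89 0.65 0.76 043 0.81
04-02-2020 0.73 1.34 0.66 0.33 0.49
05-02-2020 0.23 0.44 0.54 0.66 0.66
06-02-2020 0.88 0.76 2.56 0.61 0.71
;
run;

/* Center each node at its median and scale it by its median absolute deviation */
proc stdize data=work.node_monitoring out=work.node_z method=mad;
   var node1-node5;
run;

/* Write one row per (date, node) cell whose robust z-score exceeds the cutoff */
data work.cell_anomalies;
   set work.node_z;
   array z{*} node1-node5;
   length node $32;
   do i = 1 to dim(z);
      if abs(z{i}) > 3.5 then do;   /* 3.5 is an assumed cutoff; tune it for your data */
         node = vname(z{i});        /* name of the node variable that was flagged */
         robust_z = z{i};
         output;
      end;
   end;
   keep date node robust_z;
run;

proc print data=work.cell_anomalies;
run;
```

For 16 nodes, the variable lists would become node1-node16. This treats each node independently, so it will not catch anomalies that only show up in the relationship between nodes; the outsparse table from PROC RPCA may be a better place to look for those.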
