Hello,
I need help detecting anomalies in my dataset:
data hdfs_kim.node_monitoring;
/* date is read as text because the day/month order is not stated */
input date :$10. node1 node2 node3 node4 node5;
datalines;
01-02-2020 0.45 0.44 0.78 0.32 0.99
02-02-2020 0.34 0.32 0.89 0.56 0.77
03-02-2020 0.89 0.65 0.76 0.43 0.81
04-02-2020 0.73 1.34 0.66 0.33 0.49
05-02-2020 0.23 0.44 0.54 0.66 0.66
06-02-2020 0.88 0.76 2.56 0.61 0.71
;
run;
Can you help me find anomalies in this dataset? I have used:
proc rpca data=hdfs_kim.node_monitoring
lambdaweight = 3.5
outsparse=hdfs_kim.sparsemat2;
id date;
run;
proc print data=hdfs_kim.sparsemat2;
run;
proc rpca data=hdfs_kim.node_monitoring
scale center;
id date;
anomalydetection;
savestate rstore=hdfs_kim.store;
run;
proc astore;
setoption rpca_projection_type 2;
score rstore=hdfs_kim.store data=hdfs_kim.node_monitoring out=hdfs_kim.scored;
run;
proc print data=hdfs_kim.scored;
run;
I was not able to interpret the RPCA results when I have node1 through node16. If there is another process or procedure that would help, I am willing to try it.
Thanks
Rahul
Hi Rahul,
In your scored output table "scored", the last column is labeled "outlier detection score".
A value of 1 in that column indicates that the scored observation is an outlier.
You can read more about the anomaly detection functionality of proc RPCA here: SAS Help Center: The RPCA Procedure
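To list only the flagged rows, a sketch along these lines might help. The actual variable name behind the "outlier detection score" label is not given in this thread, so the code looks it up from the label instead of hard-coding it. It assumes the label contains the word "outlier" and that hdfs_kim.scored is visible in dictionary.columns; if the lookup finds nothing, check the names with PROC CONTENTS.

```sas
/* Look up the variable whose label mentions "outlier".
   ASSUMPTION: the label text contains that word. */
proc sql noprint;
   select name into :flagvar trimmed
   from dictionary.columns
   where libname = 'HDFS_KIM'
     and upcase(memname) = 'SCORED'
     and upcase(label) contains 'OUTLIER';
quit;

/* Show which variable was picked up. */
%put NOTE: outlier flag variable is &flagvar;

/* Print only the observations flagged as outliers (value 1). */
proc print data=hdfs_kim.scored;
   where &flagvar = 1;
run;
```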
Thanks,
-Zohreh
Hi,
I've read the whole procedure documentation thoroughly. The scored dataset does flag anomalies, but when there is more than one variable (multiple dimensions), it does not correctly show where the anomalies are.
Any suggestions?
-Thanks
Rahul
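One possible way to see which node is responsible on which date is to look at the sparse matrix that the first PROC RPCA call already writes with outsparse=. RPCA splits the data into a low-rank part plus a sparse part, and large entries in the sparse part point to individual cells that do not fit the low-rank pattern. The sketch below is only a starting point. It assumes that hdfs_kim.sparsemat2 keeps the date column and the input column names node1-node5 (worth confirming with the PROC PRINT output), and the cutoff of 0.5 is an arbitrary illustration, not a recommended value.

```sas
/* ASSUMPTION: illustrative threshold only; tune it for your data. */
%let cutoff = 0.5;

/* Turn the sparse matrix into one row per large (date, node) cell.
   ASSUMPTION: the outsparse table has columns date and node1-node5. */
data work.cell_anomalies;
   set hdfs_kim.sparsemat2;
   array nodes {*} node1-node5;
   length node $32;
   do i = 1 to dim(nodes);
      if abs(nodes{i}) > &cutoff then do;
         node = vname(nodes{i});      /* name of the offending column */
         sparse_value = nodes{i};     /* size of the sparse entry */
         output;
      end;
   end;
   keep date node sparse_value;
run;

proc print data=work.cell_anomalies;
run;
```

For the 16-node case, the array range would become node1-node16.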