Hello,
I need help detecting anomalies in my dataset:
data hdfs_kim.node_monitoring;
input date :ddmmyy10. node1 node2 node3 node4 node5; /* assumes dates are dd-mm-yyyy */
format date ddmmyy10.;
datalines ;
01-02-2020 0.45 0.44 0.78 0.32 0.99
02-02-2020 0.34 0.32 0.89 0.56 0.77
03-02-2020 0.89 0.65 0.76 0.43 0.81
04-02-2020 0.73 1.34 0.66 0.33 0.49
05-02-2020 0.23 0.44 0.54 0.66 0.66
06-02-2020 0.88 0.76 2.56 0.61 0.71
;
run;
Can anyone help me find anomalies in this dataset? I have used:
proc rpca data=hdfs_kim.node_monitoring
lambdaweight = 3.5
outsparse=hdfs_kim.sparsemat2;
id date;
run;
proc print data=hdfs_kim.sparsemat2;
run;
proc rpca data=hdfs_kim.node_monitoring
scale center;
id date;
anomalydetection;
savestate rstore=hdfs_kim.store;
run;
proc astore;
setoption rpca_projection_type 2;
score rstore=hdfs_kim.store data=hdfs_kim.node_monitoring out=hdfs_kim.scored;
run;
proc print data=hdfs_kim.scored;
run;
I was not able to interpret the RPCA results when I have node1 through node16. If another process or procedure would help, I am willing to try it.
Thanks
Rahul
Hi Rahul,
In your output scoring table "scored", the last column is labeled "outlier detection score".
A value of 1 in that column indicates that the scored observation is an outlier.
You can read more about the anomaly detection functionality of PROC RPCA here: SAS Help Center: The RPCA Procedure
Thanks,
-Zohreh
Hi,
I've read the whole procedure documentation thoroughly. The scored dataset does flag anomalies, but when there is more than one variable (multiple dimensions) it does not correctly identify where the anomalies are.
Any suggestions?
-Thanks
Rahul
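
One possible way to see which node is behind each flagged date, offered as a sketch rather than a confirmed answer: the first PROC RPCA call above already writes the sparse component to hdfs_kim.sparsemat2, and cells of that table that are clearly non-zero are the ones RPCA treated as deviations from the low-rank pattern. The code below assumes the OUTSPARSE= table keeps date and node1-node5 under their input names. The tolerance macro variable tol and the output table work.anomaly_cells are hypothetical names chosen for illustration.

```sas
/* Sketch: list the (date, node) cells that RPCA placed in the sparse component. */
/* Assumption: hdfs_kim.sparsemat2 holds date and node1-node5 under their input names. */
%let tol = 1e-6;   /* hypothetical tolerance; tune it for your data */

data work.anomaly_cells;
   set hdfs_kim.sparsemat2;
   array nodes {*} node1-node5;
   length node $32;
   /* Emit one output row per node whose sparse value is clearly non-zero */
   do i = 1 to dim(nodes);
      if abs(nodes{i}) > &tol then do;
         node = vname(nodes{i});
         sparse_value = nodes{i};
         output;
      end;
   end;
   keep date node sparse_value;
run;

proc print data=work.anomaly_cells;
run;
```

For node1 through node16, only the array variable list would need to change to node1-node16. How many cells end up non-zero depends on LAMBDAWEIGHT=, so it may be worth rerunning with a few values before trusting the flags.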