Code to spot anomalies between two variables

Folks,

Could anyone suggest some code to help spot anomalies between two variables? I've generated a number of random observations where, in some cases, there are large differences between the two values. See the example dataset below.

As the data show, these differences could be attributed to human error: an individual keying in an extra 1 or 0, or leaving out a digit. Can anyone think of some code that would compare the two numbers and flag differences that look like keying errors?

I would be interested in any ideas.

data example;
/* treat both commas and tabs as delimiters, since the rows below mix the two */
infile datalines dlm='2C09'x;
   input var1-var2;
   datalines;
12352004,	2352004
12350622,	2350622
10791626,	791626
13112730,	3112730
18028284,	8028284
1999992,		199992
2194664,		21946
3095470,		30954
1076751,		10767
1478045,		14780
962000,		9620
423213,		43213
424649,		4246
500002,		50002
66589,		6589037
17178,		1717800
15000,		150000
12480,		112480
82818,		182818
16304,		116304
19914,		119914
11060,		110060
13568,		110040
323,			323
26738,		26738
32480,		32480
37253,		37253
2500,		2500
3020,		3020
6197,		6197
6986,		6986
1320,		1320
28277,	28277
;
run;

Re: Code to spot anomalies between two variables

Posted in reply to Sean_OConnor

You could use the COMPGED function to see how similar they are (convert to text, then compare):

http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002206133.htm

However, it is going to be difficult to settle on a set of guiding rules, as 123456789 compared to 234567891 is more or less the same string of digits but a very different number.
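
A minimal sketch of that idea, run against the example dataset from the original post. The variable names c1, c2, ged, lev and possible_keying_error are made up for illustration, and the cutoff of two edits is only an assumption that would need tuning. COMPLEV is not mentioned above; it is included as a companion to COMPGED because it returns a plain count of single-character edits, which may be easier to set a threshold on than the weighted cost COMPGED returns.

```sas
data compged_check;
   set example;
   length c1 c2 $20;

   /* convert the numbers to text; assumes whole-number values as in the example */
   c1 = strip(put(var1, best32.));
   c2 = strip(put(var2, best32.));

   ged = compged(c1, c2);   /* generalized edit distance (weighted cost) */
   lev = complev(c1, c2);   /* Levenshtein distance (number of single-character edits) */

   /* hypothetical rule: values differ, but only by one or two digit edits */
   possible_keying_error = (var1 ne var2 and lev <= 2);
run;

proc print data=compged_check;
   where possible_keying_error;
run;
```

A low distance only says the digit strings are close. As noted above, it says nothing about how far apart the numbers are, so the flag is best treated as a shortlist for manual review.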


Re: Code to spot anomalies between two variables

Posted in reply to Sean_OConnor

I agree with reading them as character.  One possible approach:  ignore position.  Break up each string into a set of 10 counts:  how many 0's, how many 1's, etc.  Then examine the differences in those counts.
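
One way this could look, as a rough sketch only. The array names n1 and n2, the output variables n1_0-n1_9 and n2_0-n2_9, and the summary variable count_diff are invented for illustration; the example dataset is the one from the original post.

```sas
data digit_counts;
   set example;
   length c1 c2 $20;

   /* one counter per digit 0-9 for each variable */
   array n1[0:9] n1_0-n1_9;
   array n2[0:9] n2_0-n2_9;

   /* read the numbers as character; assumes whole-number values */
   c1 = strip(put(var1, best32.));
   c2 = strip(put(var2, best32.));

   count_diff = 0;
   do d = 0 to 9;
      n1[d] = countc(c1, put(d, 1.));   /* how many times digit d occurs in var1 */
      n2[d] = countc(c2, put(d, 1.));   /* how many times digit d occurs in var2 */
      count_diff = count_diff + abs(n1[d] - n2[d]);
   end;
   drop d;
run;
```

Under this scheme, a count_diff of 0 with var1 not equal to var2 would point to the same digits in a different order, and a count_diff of 1 would point to a single digit added or dropped. Larger values are harder to interpret, so any cutoff is a judgment call.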
