## Too many numbers in one variable

Solved
Occasional Contributor
Posts: 13

# Too many numbers in one variable

Hi,

Thanks in advance for your help as I really don't know how to make it work.

A quick overview and an exemple of my problem.
I have to make some automatic treatment of a database (around 800 000 lines and 100 variabel).

User have been putting one number too much in certain variable sometimes, please see this exemple

Variabel A

Value A:1525

Variabel B = 127

Variabel C (then I get for an other place and use as control) is A + B = 249 (the difference of value is due that A should be 125 not 1525).

How do you think I should with a macro that can try to take away one number that is too much (in this example it will take away the 5 of A in order that A+B=C). I have also the problem that it can be a number that have been forget by the user in the middle of a variabel

Accepted Solutions
Solution
‎06-28-2016 02:33 AM
Super User
Posts: 10,227

## Re: Too many numbers in one variable

This is an idea for taking away a digit:

``````data have;
input a b c;
cards;
1525 124 249
;
run;

%let maxlen=20;
data want;
set have;
length string1 string2 \$&maxlen;
marker = 0;
if a + b ne c
then do;
string1 = strip(put(a,&maxlen..));
i = 1;
do while (not marker and i <= length(string1));
string2 = substr(string1,1,i)!!substr(string1,i+2);
if input(string2,&maxlen..) + b = c
then do;
marker = 1;
i = &maxlen + 1;
a = input(string2,&maxlen..);
end;
else i + 1;
end;
if marker = 0
then do;
string1 = strip(put(b,&maxlen..));
i = 1;
do while (not marker and i <= length(string1));
string2 = substr(string1,1,i)!!substr(string1,i+2);
if a + input(string2,&maxlen..) = c
then do;
marker = 1;
i = &maxlen + 1;
b = input(string2,&maxlen..);
end;
else i + 1;
end;
end;
end;
drop string1 string2 i marker;
run;
``````

Adding a digit is completely out of the question, as you will get a tankerload of false solutions.

Even taking away an arbitrary digit is dangerous if both A and B have been entered incorrectly.

My proposed solution: if you detect a difference to your checksum, throw the data back and have it re-entered.

Best solution: implement a checking routine in the program that handles the input, so it only accepts correct inputs.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code

All Replies
Super User
Posts: 10,227

## Re: Too many numbers in one variable

Can you safely assume that B is always correct, or do you have a rule that lets you recognize which one (of A and B) is?

Without that, you will never know if A and B are wrong, and therefore can't infer one from the other.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Occasional Contributor
Posts: 13

## Re: Too many numbers in one variable

No you can¨t assume if B is always correct or which one is.

Of course not possible to know if A and B are wrong but the idea is to reduce the amount of fail by taking away or adding one number in B or A (not both) in order to get A+B =C

Any idea?

Occasional Contributor
Posts: 18

## Re: Too many numbers in one variable

Sounds like a data cleaning project.

Does A and B have to fall within a certain value? If so, create a flag using "if-then" statements. But then the problem occurs that you cannot assume that B is correct either without looking at the original data. For example: A could be 1525 and B could be 125 when in fact they both could be 152.

So I would suggest setting up flags and then looking at the original data to determine what the value for A and B should be.

Solution
‎06-28-2016 02:33 AM
Super User
Posts: 10,227

## Re: Too many numbers in one variable

This is an idea for taking away a digit:

``````data have;
input a b c;
cards;
1525 124 249
;
run;

%let maxlen=20;
data want;
set have;
length string1 string2 \$&maxlen;
marker = 0;
if a + b ne c
then do;
string1 = strip(put(a,&maxlen..));
i = 1;
do while (not marker and i <= length(string1));
string2 = substr(string1,1,i)!!substr(string1,i+2);
if input(string2,&maxlen..) + b = c
then do;
marker = 1;
i = &maxlen + 1;
a = input(string2,&maxlen..);
end;
else i + 1;
end;
if marker = 0
then do;
string1 = strip(put(b,&maxlen..));
i = 1;
do while (not marker and i <= length(string1));
string2 = substr(string1,1,i)!!substr(string1,i+2);
if a + input(string2,&maxlen..) = c
then do;
marker = 1;
i = &maxlen + 1;
b = input(string2,&maxlen..);
end;
else i + 1;
end;
end;
end;
drop string1 string2 i marker;
run;
``````

Adding a digit is completely out of the question, as you will get a tankerload of false solutions.

Even taking away an arbitrary digit is dangerous if both A and B have been entered incorrectly.

My proposed solution: if you detect a difference to your checksum, throw the data back and have it re-entered.

Best solution: implement a checking routine in the program that handles the input, so it only accepts correct inputs.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
🔒 This topic is solved and locked.