Too many numbers in one variable

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 13
Accepted Solution

Too many numbers in one variable

Hi,

Thanks in advance for your help as I really don't know how to make it work.

A quick overview and an exemple of my problem.
I have to make some automatic treatment of a database (around 800 000 lines and 100 variabel).

User have been putting one number too much in certain variable sometimes, please see this exemple


Variabel A

Value A:1525

Variabel B = 127

Variabel C (then I get for an other place and use as control) is A + B = 249 (the difference of value is due that A should be 125 not 1525).

How do you think I should with a macro that can try to take away one number that is too much (in this example it will take away the 5 of A in order that A+B=C). I have also the problem that it can be a number that have been forget by the user in the middle of a variabel


BIG Thanks in advance


Accepted Solutions
Solution
‎06-28-2016 02:33 AM
Super User
Posts: 7,475

Re: Too many numbers in one variable

This is an idea for taking away a digit:

data have;
input a b c;
cards;
1525 124 249
;
run;

%let maxlen=20;
data want;
set have;
length string1 string2 $&maxlen;
marker = 0;
if a + b ne c
then do;
  string1 = strip(put(a,&maxlen..));
  i = 1;
  do while (not marker and i <= length(string1));
    string2 = substr(string1,1,i)!!substr(string1,i+2);
    if input(string2,&maxlen..) + b = c
    then do;
      marker = 1;
      i = &maxlen + 1;
      a = input(string2,&maxlen..);
    end;
    else i + 1;
  end;
  if marker = 0
  then do;
    string1 = strip(put(b,&maxlen..));
    i = 1;
    do while (not marker and i <= length(string1));
      string2 = substr(string1,1,i)!!substr(string1,i+2);
      if a + input(string2,&maxlen..) = c
      then do;
        marker = 1;
        i = &maxlen + 1;
        b = input(string2,&maxlen..);
      end;
      else i + 1;
    end;
  end;
end;
drop string1 string2 i marker;
run;

Adding a digit is completely out of the question, as you will get a tankerload of false solutions.

Even taking away an arbitrary digit is dangerous if both A and B have been entered incorrectly.

My proposed solution: if you detect a difference to your checksum, throw the data back and have it re-entered.

Best solution: implement a checking routine in the program that handles the input, so it only accepts correct inputs.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers

View solution in original post


All Replies
Super User
Posts: 7,475

Re: Too many numbers in one variable

Can you safely assume that B is always correct, or do you have a rule that lets you recognize which one (of A and B) is?

Without that, you will never know if A and B are wrong, and therefore can't infer one from the other.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Occasional Contributor
Posts: 13

Re: Too many numbers in one variable

No you can¨t assume if B is always correct or which one is.

Of course not possible to know if A and B are wrong but the idea is to reduce the amount of fail by taking away or adding one number in B or A (not both) in order to get A+B =C

 

Any idea?

Occasional Contributor
Posts: 18

Re: Too many numbers in one variable

Sounds like a data cleaning project.

 

Does A and B have to fall within a certain value? If so, create a flag using "if-then" statements. But then the problem occurs that you cannot assume that B is correct either without looking at the original data. For example: A could be 1525 and B could be 125 when in fact they both could be 152.

 

So I would suggest setting up flags and then looking at the original data to determine what the value for A and B should be.

Solution
‎06-28-2016 02:33 AM
Super User
Posts: 7,475

Re: Too many numbers in one variable

This is an idea for taking away a digit:

data have;
input a b c;
cards;
1525 124 249
;
run;

%let maxlen=20;
data want;
set have;
length string1 string2 $&maxlen;
marker = 0;
if a + b ne c
then do;
  string1 = strip(put(a,&maxlen..));
  i = 1;
  do while (not marker and i <= length(string1));
    string2 = substr(string1,1,i)!!substr(string1,i+2);
    if input(string2,&maxlen..) + b = c
    then do;
      marker = 1;
      i = &maxlen + 1;
      a = input(string2,&maxlen..);
    end;
    else i + 1;
  end;
  if marker = 0
  then do;
    string1 = strip(put(b,&maxlen..));
    i = 1;
    do while (not marker and i <= length(string1));
      string2 = substr(string1,1,i)!!substr(string1,i+2);
      if a + input(string2,&maxlen..) = c
      then do;
        marker = 1;
        i = &maxlen + 1;
        b = input(string2,&maxlen..);
      end;
      else i + 1;
    end;
  end;
end;
drop string1 string2 i marker;
run;

Adding a digit is completely out of the question, as you will get a tankerload of false solutions.

Even taking away an arbitrary digit is dangerous if both A and B have been entered incorrectly.

My proposed solution: if you detect a difference to your checksum, throw the data back and have it re-entered.

Best solution: implement a checking routine in the program that handles the input, so it only accepts correct inputs.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 301 views
  • 0 likes
  • 3 in conversation