DATA Step, Macro, Functions and more

Comparing fields

Reply
Frequent Contributor
Frequent Contributor
Posts: 83

Comparing fields

Hello all,

 

I'm looking for some insight/direction on the following:

 

I have 5 columns of data, A-E. I want to compare column E with columns A-D and count where they are not equal for all rows. However, columns A-D may have the same record in one or more columns, or might contain a missing record that I don't want to count. 

 

I've started with a string of proc sql statements as follows:

 

proc sql;
create table one as
select count (*)
from combined
where A = E and A not in (' ');

create table two as
select count(*)
from (select * from combined where A <> B and B not in (' '))
where B = E;

create table three as
select count(*)
from (select * from combined where A <> C and C not in (' '))
where C = E;

create table four as
select count(*)
from (select * from combined where A <> D and D not in (' '))
where D = E;

quit;

However, when I QA some of these results I see that I'm counting records that I shouldnt across some of the latter tables. I've played around with adding some additional exclusion criteria but I can't seem to get it right. I think their is an easier way to compare the data in these columns. 

 

Appreciate any thoughts. 

 

Happy New Year!

Super User
Posts: 10,483

Re: Comparing fields

It would help us see what you are attempting if you provide some example input data, which should demonstrate all of the types of cases you might expect, no match, one match, 2 matches , 3 matches, 4 matches And what the outcome for that example data should be.

 

Also if this is to generate a report of some sort what the final report might look like as the counting may well be best done in one of the report procedures.

Frequent Contributor
Frequent Contributor
Posts: 83

Re: Comparing fields

Here is some example data.

 

A B C D E
123   123 123 123
456 111 456   456
685 789 789 789 789
    254 987 987

 

The first sql statement would show a count in table one but not for any of the other tables (because B is missing and C,D = E).

 

The second sql statement would show a count for table one and table two but not three and four (because C=A and D is missing).

 

The third sql statement would show a count in table one and two but not three and four (because C,D = E).

 

The fourth sql statement would show a count in table three and four but not one or two (because A,B are null).

 

I'm trying to do some analysis on a much larger data set. At this point I'm just looking at totals for these conditions.

 

Hope this helps.

 

 

Super User
Posts: 10,483

Re: Comparing fields

That's a fair example of an input set. I note there are no rows without at least one match for E.

 

What is the desired output in a similar tabular form.

 

I suspect that a datastep may be more efficient than your current sql approach as all of the individual comparisons could be done in one pass. But what that desired output might be ...

 

Also generally having different cases in different datasets makes analysis more complicated.

Respected Advisor
Posts: 4,644

Re: Comparing fields

Using the RANGE function which accepts missing values but ignores them might do the trick, both in the data step or SQL version of the solution:

 

data combined;
input a b c d e;
datalines;
123	. 	123	123	123
456	111	456	. 	456
685	789	789	789	789
. 	. 	254	987	987
;

data want;
array ne{4} ne_a ne_b ne_c ne_d;
array col{4} a b c d;
set combined end=done;
do i = 1 to 4;
    if range(col{i},e) ne 0 then ne{i} + 1;
    end;
if done then output;
keep ne_:;
run;

proc print data=want noobs; run;

proc sql;
select 
    sum(range(a,e) ne 0) as ne_a,
    sum(range(b,e) ne 0) as ne_b,
    sum(range(c,e) ne 0) as ne_c,
    sum(range(d,e) ne 0) as ne_d
from combined;
quit;
PG
Super User
Posts: 5,256

Re: Comparing fields

Since you are not satisfied with your current SQL, and don't really understand your desired outcome, it would be great if you could post the corresponding desired output/report.
Data never sleeps
Super User
Posts: 17,784

Re: Comparing fields


Ody wrote:
 

I have 5 columns of data, A-E. I want to compare column E with columns A-D and count where they are not equal for all rows. However, columns A-D may have the same record in one or more columns, or might contain a missing record that I don't want to count. 

 

 


One quick way to compare columns of data to check if all match is to see if the min/max of the rows are the same. 

In general I think that array processing within a data step is the way you'll want to go.  The Min/Max function ignore missing values, but if you want some other criteria you'll have to expand on your question.

 

data want;

set have;

array vals(*) a b c d e;

retain count;

if min(of vals(*)) ne max(of vals(*)) then do;

   flag=1;

   count+1;

end;

else flag=0;

 

if flag=1 then do;

*go through some process to identify the issues;

end;

 

run;

 

Frequent Contributor
Frequent Contributor
Posts: 83

Re: Comparing fields

Thanks for the feedback and suggestions. 

 

I'll have to give this some more thought and maybe come up with a better approach.

Frequent Contributor
Frequent Contributor
Posts: 83

Re: Comparing fields

I was able to complete the comparison I wanted to do by creating separate tables for the individual number fields and stacking them vertically to compare (vs the horizontal approach I was trying to do earlier).

Just fyi, thanks again for the feedback.
Ask a Question
Discussion stats
  • 8 replies
  • 275 views
  • 0 likes
  • 5 in conversation