BookmarkSubscribeRSS Feed
Ody
Quartz | Level 8 Ody
Quartz | Level 8

Hello all,

 

I'm looking for some insight/direction on the following:

 

I have 5 columns of data, A-E. I want to compare column E with columns A-D and count where they are not equal for all rows. However, columns A-D may have the same record in one or more columns, or might contain a missing record that I don't want to count. 

 

I've started with a string of proc sql statements as follows:

 

proc sql;
create table one as
select count (*)
from combined
where A = E and A not in (' ');

create table two as
select count(*)
from (select * from combined where A <> B and B not in (' '))
where B = E;

create table three as
select count(*)
from (select * from combined where A <> C and C not in (' '))
where C = E;

create table four as
select count(*)
from (select * from combined where A <> D and D not in (' '))
where D = E;

quit;

However, when I QA some of these results I see that I'm counting records that I shouldnt across some of the latter tables. I've played around with adding some additional exclusion criteria but I can't seem to get it right. I think their is an easier way to compare the data in these columns. 

 

Appreciate any thoughts. 

 

Happy New Year!

8 REPLIES 8
ballardw
Super User

It would help us see what you are attempting if you provide some example input data, which should demonstrate all of the types of cases you might expect, no match, one match, 2 matches , 3 matches, 4 matches And what the outcome for that example data should be.

 

Also if this is to generate a report of some sort what the final report might look like as the counting may well be best done in one of the report procedures.

Ody
Quartz | Level 8 Ody
Quartz | Level 8

Here is some example data.

 

A B C D E
123   123 123 123
456 111 456   456
685 789 789 789 789
    254 987 987

 

The first sql statement would show a count in table one but not for any of the other tables (because B is missing and C,D = E).

 

The second sql statement would show a count for table one and table two but not three and four (because C=A and D is missing).

 

The third sql statement would show a count in table one and two but not three and four (because C,D = E).

 

The fourth sql statement would show a count in table three and four but not one or two (because A,B are null).

 

I'm trying to do some analysis on a much larger data set. At this point I'm just looking at totals for these conditions.

 

Hope this helps.

 

 

ballardw
Super User

That's a fair example of an input set. I note there are no rows without at least one match for E.

 

What is the desired output in a similar tabular form.

 

I suspect that a datastep may be more efficient than your current sql approach as all of the individual comparisons could be done in one pass. But what that desired output might be ...

 

Also generally having different cases in different datasets makes analysis more complicated.

PGStats
Opal | Level 21

Using the RANGE function which accepts missing values but ignores them might do the trick, both in the data step or SQL version of the solution:

 

data combined;
input a b c d e;
datalines;
123	. 	123	123	123
456	111	456	. 	456
685	789	789	789	789
. 	. 	254	987	987
;

data want;
array ne{4} ne_a ne_b ne_c ne_d;
array col{4} a b c d;
set combined end=done;
do i = 1 to 4;
    if range(col{i},e) ne 0 then ne{i} + 1;
    end;
if done then output;
keep ne_:;
run;

proc print data=want noobs; run;

proc sql;
select 
    sum(range(a,e) ne 0) as ne_a,
    sum(range(b,e) ne 0) as ne_b,
    sum(range(c,e) ne 0) as ne_c,
    sum(range(d,e) ne 0) as ne_d
from combined;
quit;
PG
LinusH
Tourmaline | Level 20
Since you are not satisfied with your current SQL, and don't really understand your desired outcome, it would be great if you could post the corresponding desired output/report.
Data never sleeps
Reeza
Super User

@Ody wrote:
 

I have 5 columns of data, A-E. I want to compare column E with columns A-D and count where they are not equal for all rows. However, columns A-D may have the same record in one or more columns, or might contain a missing record that I don't want to count. 

 

 


One quick way to compare columns of data to check if all match is to see if the min/max of the rows are the same. 

In general I think that array processing within a data step is the way you'll want to go.  The Min/Max function ignore missing values, but if you want some other criteria you'll have to expand on your question.

 

data want;

set have;

array vals(*) a b c d e;

retain count;

if min(of vals(*)) ne max(of vals(*)) then do;

   flag=1;

   count+1;

end;

else flag=0;

 

if flag=1 then do;

*go through some process to identify the issues;

end;

 

run;

 

Ody
Quartz | Level 8 Ody
Quartz | Level 8

Thanks for the feedback and suggestions. 

 

I'll have to give this some more thought and maybe come up with a better approach.

Ody
Quartz | Level 8 Ody
Quartz | Level 8
I was able to complete the comparison I wanted to do by creating separate tables for the individual number fields and stacking them vertically to compare (vs the horizontal approach I was trying to do earlier).

Just fyi, thanks again for the feedback.

sas-innovate-2024.png

Today is the last day to save with the early bird rate! Register today for just $695 - $100 off the standard rate.

 

Plus, pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1005 views
  • 0 likes
  • 5 in conversation