Compare variables of same dataset

Reply
New Contributor
Posts: 4

Compare variables of same dataset

I have a dataset with 3 variables - V1, V2 and V3.

 

V1    V2     V3

1      1     A
2      A     C
3      2     M
B      D     3

 

I want to compare the variables and find if the values in varibale V1 is present in either of the variable V2 or V3 and generate the output accordingly.  

 

So in the above example, 1, 2 and 3 values of variable V1 are present in either V2 or in V3 variables. Ouput should say "B" is not present in V2 and V3.

 

Any help would be greatly appreciated.

Super User
Posts: 19,770

Re: Compare variables of same dataset

Transposing your data and using PROC FREQ is the best choice IMO, output would be like below in listing and you get a dataset you can filter. 

 

The FREQ Procedure

 
Frequency
Table of variable_name by variable_value
variable_name variable_value
1 2 3 A B C D M Total
V1
1
1
1
0
1
0
0
0
4
V2
1
1
0
1
0
0
1
0
4
V3
0
0
1
1
0
1
0
1
4
Total
2
2
2
2
1
1
1
1
12

 

data want;
    input V1 $    V2 $    V3 $;
    cards;
1      1     A
2      A     C
3      2     M
B      D     3
;
run;

data long_format;
    set want;
    array v(*) v1-v3;

    do i=1 to dim(v);
        variable_name=vname(v(i));
        variable_value=v(i);
        output;
    end;
    keep variable:;
run;

proc freq data=long_format;
    table variable_name*variable_value / nopercent norow nocol out=freqs_out;
run;
PROC Star
Posts: 283

Re: Compare variables of same dataset

[ Edited ]

Hello, I think put comments to log should meet your requirement:

/*your sample*/

data have;
input (v1-v3) ($);
datalines;
1 1 A
2 A C
3 2 M
B D 3
;

 

/*Solution*/

data _null_;
if (_n_ = 1) then do;
if 0 then set have;
declare hash myhash(dataset: "have(keep=v2");
rc = myhash.definekey('v2');
myhash.definedone();
declare hash myhash2(dataset: "have(keep=v3");
rc = myhash2.definekey('v3');
myhash2.definedone();
end;
set have(keep=v1);
if myhash.check(key:v1) ne 0 then put v1= 'is not in Variable 2';
if myhash2.check(key:v1) ne 0 then put v1= 'is not in Variable 3';
if myhash.check(key:v1) = 0 then put v1= 'is in Variable 2' ;
if myhash2.check(key:v1) = 0 then put v1= 'is in Variable 3' ;
if myhash.check(key:v1) ne 0 and myhash2.check(key:v1) ne 0 then put v1= 'is not in Variable 2 and variable 3';
run;

 

Regards.

Naveen Srinivasan

PROC Star
Posts: 7,468

Re: Compare variables of same dataset

data have;
input (v1-v3) ($);
datalines;
1 1 A
2 A C
3 2 M
B D 3
;

data want;
  set have;
  length comment $40.;
  array fields(2) $ v2 v3;
  if v1 not in fields then 
   comment= catt('"',v1,'" is not present in V2 and V3');
run;

Art, CEO, AnalystFinder.com

 

Super User
Posts: 5,498

Re: Compare variables of same dataset

This should produce the report of all V1 values that do not appear in V2 or V3:

 

proc sql;

select V1 from have where

V1 not in (select distinct V2 from have) and

V1 not in (select distinct V3 from have);

quit;

Occasional Contributor
Posts: 9

Re: Compare variables of same dataset

Posted in reply to Astounding

The same thing I was thinking. Just adding additional "distinct" and "order by" to clean up the output

 

proc sql;

  select distinct V1

  from have

  where V1 not in (select distinct V2 from have) and V1 not in (select distinct V3 from have)

  order by V1

;

quit;

Respected Advisor
Posts: 4,173

Re: Compare variables of same dataset

Another version using a hash lookup.

data have;
  input (v1-v3) ($);
datalines;
1 1 A
2 A C
3 2 M
B D 3
;
run;

proc sql;
  create view vHave as
    select v2 as v1 from have
    outer union corr
    select v3 as v1 from have
  ;
quit;

data want;
  set have end=last;
  if _n_=1 then
    do;
      dcl hash h1(dataset:'vHave', multidata:'n');
      h1.defineKey('v1');
      h1.defineDone();
    end;
  if h1.check() ne 0 then match='no ';
  else match='yes';
run;
Ask a Question
Discussion stats
  • 6 replies
  • 522 views
  • 0 likes
  • 7 in conversation