DATA Step, Macro, Functions and more

Checking if value of one row in one dataset exists in one row of other dataset by ID

Accepted Solution Solved
Reply
Contributor
Posts: 44
Accepted Solution

Checking if value of one row in one dataset exists in one row of other dataset by ID

Hi,

I am trying to see if values from one row of a certain ID group exists in a row of another dataset by the same ID group. the IDs are unique.

I have two tables:

Table 1: Estimates

ID ans1 ans2 ans3 ans4
A1 8 15 40 30
A2 9 10 11 46
A3 38 39 40 42
A4 5 25 35 41

Table 2: Reference

 

ID col1 col2 col3 col4
A1 5 15 20 40
A2 11 12 13 37
A3 38 39 40 42
A4 5 25 35 41

 

 

I would like to get the following table (0 marks the point where the values match) eg: the code should check for all variables (where ID=A1) in table1: estimates match against all variables in table 2 (ID=A1): reference. If any value in table 1: estimates match with the same ID from table 2: then the matched variable, should be marked with 0.

Table: want mentioned below: (the 0 means that the values match)

ID match1 match2 match3 match4
A1 1 0 0 1
A2 1 1 0 1
A3 0 0 0 0
A4 0 0 0 0

 

 

I have adoped this code from a similar code written by Kshap earlier, but could not change it to suit my needs.


data reference;
input id $ col1 col2 col3 col4;
datalines;
A1 5 15 20 40
A2 11 12 13 37
A3 38 39 40 42
A4 5 25 35 41 
;

data estimates;
input id $ ans1 ans2 ans3 ans4;
datalines;
A1 8 15 40 30
A2 9 10 11 46
A3 38 39 40 42
A4 5 25 35 41 
;


data hash(keep=id val);
 set reference;
 array c{*} col: ;
 do i=1 to dim(c);
  val=c{i};if not missing(val) then output;
 end;
run;
proc sql ;
select distinct id  into : list separated by ' ' from reference;
quit;
data want (drop=i j k rc val id);
 if _n_ eq 1 then do;
  if 0 then set hash;
  declare hash ha(dataset:'hash');
   ha.definekey('val');
   ha.definedata('id');
   ha.definedone();
end;
set estimates;
by id;
array g{*} &list.;
array v{*} ans: ;
do k=1 to dim(g);
 g{k}=0;
end;
do i=1 to dim(v);
 val=v{i};
rc=ha.find();
 if rc=0 then do;
  do j=1 to dim(g);
	if id=vname(g{j}) then g{j}=1;
  end;
 end;
end;
run;





Please help me.

 

Many thanks.

regards,

Sebastian


Accepted Solutions
Solution
‎07-21-2016 07:58 AM
Regular Contributor
Posts: 234

Re: Checking if value of one row in one dataset exists in one row of other dataset by ID

Posted in reply to sebster24

Hello,

 

 

%macro match;

    data want;
        merge reference estimates;
        by id;
        %do i=1 %to 4;
            match&i.=(col&i. ne ans&i.);
        %end;

    run;

%mend;

%match;

View solution in original post


All Replies
Solution
‎07-21-2016 07:58 AM
Regular Contributor
Posts: 234

Re: Checking if value of one row in one dataset exists in one row of other dataset by ID

Posted in reply to sebster24

Hello,

 

 

%macro match;

    data want;
        merge reference estimates;
        by id;
        %do i=1 %to 4;
            match&i.=(col&i. ne ans&i.);
        %end;

    run;

%mend;

%match;
Frequent Contributor
Posts: 95

Re: Checking if value of one row in one dataset exists in one row of other dataset by ID

expanding  gamotte's code just a little bit to test all 4 ans:

 

 %macro match;  

   data want;  

     merge reference estimates;  

     by id;    

    %do i=1 %to 4;  

        test=0; match%i.=1;   

         %do x=1 %to 4;      

         test+(col&i. ne ans&x.);   

        %end;                

      if test lt 4 then match&i.=0;    

    %end; 

  run;

 %mend; 

%match;

Contributor
Posts: 44

Re: Checking if value of one row in one dataset exists in one row of other dataset by ID

Hello Gamotte,

Thank you very much for your quick response Smiley Happy

The solution that you provided did not provide the exact answer as i wanted.

However, I have adopted your methodology and made a slight iteration myself:

For anyone reading through the forum, code as follows:

 

data want (drop= i j);
	merge reference estimates;
	by id;
	array ia{*}  ans:;
	array ja{*}  col:;
	array match{4} match1 match2 match3 match4;
	do i=1 to dim(ia);
		do j=1 to dim(ja);
			if  ja{j}= ia{i} then do;
				match{i} = 0;
			end;
		end;
	end;
run;

Once again, thank you.

Super User
Super User
Posts: 7,977

Re: Checking if value of one row in one dataset exists in one row of other dataset by ID

[ Edited ]

Why do you need the macro code?  Base SAS can handle this just fine:

data want;
  merge reference estimates;
  by id;
  array match{4} col{4} ans{4};
  do i=1 to 4;
    match{i}=(col{i} ne ans{i});
  end;
run;

I can't check the code as in meeting, but that should be it.

 

Just to add, if you had your data in a strcuture which is more beneficial to the programming side of things (you can transpose for output) e.g.:

SEQ     ANS

1          abc

2          def

...

Same for the other datasets, then merge by SEQ as well, then your code becomes even more simple and easy to maintain:

data want;
  merge reference estimates;
  by id seq;
  match=(col ne ans);
run;
Super User
Posts: 5,516

Re: Checking if value of one row in one dataset exists in one row of other dataset by ID

Posted in reply to sebster24

I don't have the tine to spell it out right now, but the program would be short:

 

(1) merge the data sets by ID

(2) set up arrays for the two sets of variables

(3) move through the ANS array, using the WHICHN function against the MATCH array

 

All told, maybe 10 lines of code.

Super User
Posts: 5,516

Re: Checking if value of one row in one dataset exists in one row of other dataset by ID

Posted in reply to Astounding

OK, very similar to your own answer but this will assign both 0's and 1's:

 

data want;

merge estimates reference;

by id;

array ans {4};

array col {4};

array match {4};

do _n_=1 to 4;

   match{_n_} = (whichn(ans{_n_}, of col{*}) = 0); 

end;

run;

Super User
Posts: 10,041

Re: Checking if value of one row in one dataset exists in one row of other dataset by ID

Posted in reply to sebster24
It is IML thing .


data reference;
input id $ col1 col2 col3 col4;
datalines;
A1 5 15 20 40
A2 11 12 13 37
A3 38 39 40 42
A4 5 25 35 41 
;

data estimates;
input id $ ans1 ans2 ans3 ans4;
datalines;
A1 8 15 40 30
A2 9 10 11 46
A3 38 39 40 42
A4 5 25 35 41 
;
proc iml;
use reference;
read all var _num_ into x[r=id];
close;
use estimates;
read all var _num_ into y;
close;
z=(x^=y);
create want from z[r=id c=('match1':'match'+char(ncol(x)))];
append from z[r=id];
close;
quit;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 1222 views
  • 3 likes
  • 6 in conversation