Hi,
I am trying to see if values from one row of a certain ID group exists in a row of another dataset by the same ID group. the IDs are unique.
I have two tables:
Table 1: Estimates
ID | ans1 | ans2 | ans3 | ans4 |
A1 | 8 | 15 | 40 | 30 |
A2 | 9 | 10 | 11 | 46 |
A3 | 38 | 39 | 40 | 42 |
A4 | 5 | 25 | 35 | 41 |
Table 2: Reference
ID | col1 | col2 | col3 | col4 |
A1 | 5 | 15 | 20 | 40 |
A2 | 11 | 12 | 13 | 37 |
A3 | 38 | 39 | 40 | 42 |
A4 | 5 | 25 | 35 | 41 |
I would like to get the following table (0 marks the point where the values match) eg: the code should check for all variables (where ID=A1) in table1: estimates match against all variables in table 2 (ID=A1): reference. If any value in table 1: estimates match with the same ID from table 2: then the matched variable, should be marked with 0.
Table: want mentioned below: (the 0 means that the values match)
ID | match1 | match2 | match3 | match4 |
A1 | 1 | 0 | 0 | 1 |
A2 | 1 | 1 | 0 | 1 |
A3 | 0 | 0 | 0 | 0 |
A4 | 0 | 0 | 0 | 0 |
I have adoped this code from a similar code written by Kshap earlier, but could not change it to suit my needs.
data reference;
input id $ col1 col2 col3 col4;
datalines;
A1 5 15 20 40
A2 11 12 13 37
A3 38 39 40 42
A4 5 25 35 41
;
data estimates;
input id $ ans1 ans2 ans3 ans4;
datalines;
A1 8 15 40 30
A2 9 10 11 46
A3 38 39 40 42
A4 5 25 35 41
;
data hash(keep=id val);
set reference;
array c{*} col: ;
do i=1 to dim(c);
val=c{i};if not missing(val) then output;
end;
run;
proc sql ;
select distinct id into : list separated by ' ' from reference;
quit;
data want (drop=i j k rc val id);
if _n_ eq 1 then do;
if 0 then set hash;
declare hash ha(dataset:'hash');
ha.definekey('val');
ha.definedata('id');
ha.definedone();
end;
set estimates;
by id;
array g{*} &list.;
array v{*} ans: ;
do k=1 to dim(g);
g{k}=0;
end;
do i=1 to dim(v);
val=v{i};
rc=ha.find();
if rc=0 then do;
do j=1 to dim(g);
if id=vname(g{j}) then g{j}=1;
end;
end;
end;
run;
Please help me.
Many thanks.
regards,
Sebastian
Hello,
%macro match;
data want;
merge reference estimates;
by id;
%do i=1 %to 4;
match&i.=(col&i. ne ans&i.);
%end;
run;
%mend;
%match;
Hello,
%macro match;
data want;
merge reference estimates;
by id;
%do i=1 %to 4;
match&i.=(col&i. ne ans&i.);
%end;
run;
%mend;
%match;
expanding gamotte's code just a little bit to test all 4 ans:
%macro match;
data want;
merge reference estimates;
by id;
%do i=1 %to 4;
test=0; match%i.=1;
%do x=1 %to 4;
test+(col&i. ne ans&x.);
%end;
if test lt 4 then match&i.=0;
%end;
run;
%mend;
%match;
Hello Gamotte,
Thank you very much for your quick response 🙂
The solution that you provided did not provide the exact answer as i wanted.
However, I have adopted your methodology and made a slight iteration myself:
For anyone reading through the forum, code as follows:
data want (drop= i j);
merge reference estimates;
by id;
array ia{*} ans:;
array ja{*} col:;
array match{4} match1 match2 match3 match4;
do i=1 to dim(ia);
do j=1 to dim(ja);
if ja{j}= ia{i} then do;
match{i} = 0;
end;
end;
end;
run;
Once again, thank you.
Why do you need the macro code? Base SAS can handle this just fine:
data want; merge reference estimates; by id; array match{4} col{4} ans{4}; do i=1 to 4; match{i}=(col{i} ne ans{i}); end; run;
I can't check the code as in meeting, but that should be it.
Just to add, if you had your data in a strcuture which is more beneficial to the programming side of things (you can transpose for output) e.g.:
SEQ ANS
1 abc
2 def
...
Same for the other datasets, then merge by SEQ as well, then your code becomes even more simple and easy to maintain:
data want; merge reference estimates; by id seq; match=(col ne ans); run;
I don't have the tine to spell it out right now, but the program would be short:
(1) merge the data sets by ID
(2) set up arrays for the two sets of variables
(3) move through the ANS array, using the WHICHN function against the MATCH array
All told, maybe 10 lines of code.
OK, very similar to your own answer but this will assign both 0's and 1's:
data want;
merge estimates reference;
by id;
array ans {4};
array col {4};
array match {4};
do _n_=1 to 4;
match{_n_} = (whichn(ans{_n_}, of col{*}) = 0);
end;
run;
It is IML thing . data reference; input id $ col1 col2 col3 col4; datalines; A1 5 15 20 40 A2 11 12 13 37 A3 38 39 40 42 A4 5 25 35 41 ; data estimates; input id $ ans1 ans2 ans3 ans4; datalines; A1 8 15 40 30 A2 9 10 11 46 A3 38 39 40 42 A4 5 25 35 41 ; proc iml; use reference; read all var _num_ into x[r=id]; close; use estimates; read all var _num_ into y; close; z=(x^=y); create want from z[r=id c=('match1':'match'+char(ncol(x)))]; append from z[r=id]; close; quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.