Hi SAS users,
I have two dataset which have one variable in common (ID). Now, I would like to simply compare this variable between these two datasets; and make sure how many observations of the common variable (ID) from the smaller dataset exist in the bigger dataset.
I used proc compare but I am not sure if its the best method. Please advise.
Thanks
It works. It depends on what you want as output and what meets your requirement.
So what are your requirements besides what's in both? What do you plan to do with this information afterwards.
Personally I find proc compare a bit useless except for quick comparisons and generally write my own code for comparisons.
what I want is just a simple comparison based on how many variable ID is matched in both datasets. No specific requirement.
Thanks
DATA1:
ID A B
8 . .
9 . .
DATA2:
ID C D
1 5 6
2 7 8
3 9 3
4 6 6
proc compare base = data1 compare=data2;
var ID;
with ID;
run;
Why after running the program, in the observation summary it says:
Number of Observations in Common: 2 ?!!
Although, we don't have any common observation in varibale ID. Right?
Thank you
Hi @almmotamedi,
If you want to use PROC COMPARE for this purpose (and not a data step with MERGE statement or PROC SQL), you should use the ID statement instead of the WITH statement.
Thank you, but how using ID statement gives me observation summary? (I tried and it only gives variable summary!). Please help
Not quite right, but enough to get you started.
DATA data1;
input ID A B;
cards;
8 . .
9 . .
;
data DATA2;
input ID C D;
cards;
1 5 6
2 7 8
3 9 3
4 6 6
;
proc sql;
create table want as
select coalesce(a.id,b.id) as ID, a.ID as ID1, b.ID as ID2,
case when a.ID=. then 'Data1'
when b.ID=. then 'Data2'
else 'CHECKME' end as source
from data1 as a
full join data2 as b
on a.id=b.id;
quit;
proc print data=want;
run;
Thank you so much, your answered worked for the sample datasets. But when I applied the code for the real datasets, I received the error below:
ERROR: Expression using equals (=) has components that are of different data types.
for on a.id=b.id;
I had checked the data types and both variables are character with the same length !
Could you please advise?
Show us the proc contents results for the two sets that are involved in the error.
I checked the results and everything is the same except in one dataset the variable has "format" and "informat", but the other does not.
The error message from PROC SQL does not refer to the ON clause, but to the WHEN conditions where character ID's would not match the type of numeric missing values. Just replace the latter with character missings ' ' or use neutral conditions like
when a.ID is null
or
when missing(a.ID)
@almmotamedi wrote:
Thank you, but how using ID statement gives me observation summary? (I tried and it only gives variable summary!). Please help
This would be the case if you omitted the VAR statement.
My suggestion, however, was to "use the ID statement instead of the WITH statement."
proc compare base=data1 compare=data2;
var ID;
id ID;
run;
(But my first choice for this task would not be PROC COMPARE.)
So, could you please advise how you would do it, this task seems to be very easy but I am still struggling to find an answer!
My approach would depend on the characteristics of the two datasets and other circumstances:
[Edit: inserted missing word in item 7]
proc sql; select id from small intersect select id from big; quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.