04-25-2016 02:08 PM
Hi SAS users,
I have two dataset which have one variable in common (ID). Now, I would like to simply compare this variable between these two datasets; and make sure how many observations of the common variable (ID) from the smaller dataset exist in the bigger dataset.
I used proc compare but I am not sure if its the best method. Please advise.
04-25-2016 02:11 PM
It works. It depends on what you want as output and what meets your requirement.
So what are your requirements besides what's in both? What do you plan to do with this information afterwards.
Personally I find proc compare a bit useless except for quick comparisons and generally write my own code for comparisons.
04-25-2016 02:18 PM - edited 04-25-2016 02:28 PM
what I want is just a simple comparison based on how many variable ID is matched in both datasets. No specific requirement.
04-25-2016 02:25 PM - edited 04-25-2016 02:26 PM
ID A B
8 . .
9 . .
ID C D
1 5 6
2 7 8
3 9 3
4 6 6
proc compare base = data1 compare=data2;
Why after running the program, in the observation summary it says:
Number of Observations in Common: 2 ?!!
Although, we don't have any common observation in varibale ID. Right?
04-25-2016 02:46 PM
If you want to use PROC COMPARE for this purpose (and not a data step with MERGE statement or PROC SQL), you should use the ID statement instead of the WITH statement.
04-25-2016 03:21 PM
Not quite right, but enough to get you started.
DATA data1; input ID A B; cards; 8 . . 9 . . ; data DATA2; input ID C D; cards; 1 5 6 2 7 8 3 9 3 4 6 6 ; proc sql; create table want as select coalesce(a.id,b.id) as ID, a.ID as ID1, b.ID as ID2, case when a.ID=. then 'Data1' when b.ID=. then 'Data2' else 'CHECKME' end as source from data1 as a full join data2 as b on a.id=b.id; quit; proc print data=want; run;
04-25-2016 04:13 PM - edited 04-25-2016 04:13 PM
Thank you so much, your answered worked for the sample datasets. But when I applied the code for the real datasets, I received the error below:
ERROR: Expression using equals (=) has components that are of different data types.
for on a.id=b.id;
I had checked the data types and both variables are character with the same length !
Could you please advise?
04-25-2016 04:51 PM
The error message from PROC SQL does not refer to the ON clause, but to the WHEN conditions where character ID's would not match the type of numeric missing values. Just replace the latter with character missings ' ' or use neutral conditions like
when a.ID is null
04-25-2016 04:38 PM
Thank you, but how using ID statement gives me observation summary? (I tried and it only gives variable summary!). Please help
This would be the case if you omitted the VAR statement.
My suggestion, however, was to "use the ID statement instead of the WITH statement."
proc compare base=data1 compare=data2; var ID; id ID; run;
(But my first choice for this task would not be PROC COMPARE.)
04-25-2016 05:20 PM - edited 04-26-2016 05:18 AM
My approach would depend on the characteristics of the two datasets and other circumstances:
[Edit: inserted missing word in item 7]