BookmarkSubscribeRSS Feed
almmotamedi
Obsidian | Level 7

Hi SAS users,

 

I have two dataset which have one variable in common (ID). Now, I would like to simply compare this variable between these two datasets; and make sure how many observations of the common variable (ID) from the smaller dataset exist in the bigger dataset.

 

I used proc compare but I am not sure if its the best method. Please advise.

 

Thanks

14 REPLIES 14
Reeza
Super User

It works. It depends on what you want as output and what meets your requirement. 

 

So what are your requirements besides what's in both? What do you plan to do with this information afterwards.

 

Personally I find proc compare a bit useless except for quick comparisons and generally write my own code for comparisons.

almmotamedi
Obsidian | Level 7

what I want is just a simple comparison based on how many variable ID is matched in both datasets. No specific requirement.

 

Thanks

almmotamedi
Obsidian | Level 7

DATA1:

 

ID  A  B

8    .   .

9    .   .

 

 

DATA2:

 

ID C  D

1   5  6

2   7  8

3   9  3

4   6  6

 

proc compare base = data1 compare=data2;

var ID;

with ID;

run;

 

 

Why after running the program, in the observation summary it says:

Number of Observations in Common: 2  ?!!

 

Although, we don't have any common observation in varibale ID. Right?

 

 

Thank you

 

 

 

FreelanceReinh
Jade | Level 19

Hi @almmotamedi,

 

If you want to use PROC COMPARE for this purpose (and not a data step with MERGE statement or PROC SQL), you should use the ID statement instead of the WITH statement.

almmotamedi
Obsidian | Level 7

Thank you, but how using ID statement gives me observation summary? (I tried and it only gives variable summary!). Please help

Reeza
Super User

Not quite right, but enough to get you started.

 

DATA data1;
 
input ID  A  B;
cards;
8    .   .
9    .   .
;
 
data DATA2;
 
input ID C  D;
cards;
1   5  6
2   7  8
3   9  3
4   6  6
;

proc sql;
create table want as
select coalesce(a.id,b.id) as ID, a.ID as ID1, b.ID as ID2,
	case when a.ID=. then 'Data1'
	     when b.ID=. then 'Data2'
		 else 'CHECKME' end as source
from data1 as a
full join data2 as b
on a.id=b.id;
quit;

proc print data=want;
run;
almmotamedi
Obsidian | Level 7

Thank you so much, your answered worked for the sample datasets. But when I applied the code for the real datasets, I received the error below:

 

ERROR: Expression using equals (=) has components that are of different data types.

 

for on a.id=b.id;

 

I had checked the data types and both variables are character with the same length !

 

Could you please advise?

ballardw
Super User

Show us the proc contents results for the two sets that are involved in the error.

almmotamedi
Obsidian | Level 7

I checked the results and everything is the same except in one dataset the variable has "format" and "informat", but the other does not. 

 

1.jpg

 

2.jpg

 

FreelanceReinh
Jade | Level 19

The error message from PROC SQL does not refer to the ON clause, but to the WHEN conditions where character ID's would not match the type of numeric missing values. Just replace the latter with character missings ' ' or use neutral conditions like

when a.ID is null

or

when missing(a.ID)
FreelanceReinh
Jade | Level 19

@almmotamedi wrote:

Thank you, but how using ID statement gives me observation summary? (I tried and it only gives variable summary!). Please help


This would be the case if you omitted the VAR statement.

 

My suggestion, however, was to "use the ID statement instead of the WITH statement."

proc compare base=data1 compare=data2;
var ID;
id ID;
run;

(But my first choice for this task would not be PROC COMPARE.)

 

almmotamedi
Obsidian | Level 7

So, could you please advise how you would do it, this task seems to be very easy but I am still struggling to find an answer!

FreelanceReinh
Jade | Level 19

My approach would depend on the characteristics of the two datasets and other circumstances:

 

  1. Are the datasets sorted (or indexed) by ID?
  2. Are there (possibly) duplicate ID values in one or in both datasets?
  3. If so, how should these be counted if they match?
  4. Are there missing values of ID?
  5. How large are the datasets?
  6. Do they have other variables in common except ID?
  7. Is it a quick-and-dirty count for your own purposes or part of a production-level program?
  8. Shall the number of matching ID values be written to the log, the output window, a dataset, a macro variable, ...?

 

[Edit: inserted missing word in item 7]

Ksharp
Super User
Does SQL do this job ?
proc sql;
select id from small
intersect
select id from big;
quit;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 3291 views
  • 4 likes
  • 5 in conversation