BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
lmyers2
Obsidian | Level 7

Hello,

 

I'm trying to get a count of the distinct patients existing in two datasets. If I use proc sql and the patient numbers are the same in both datasets, they overwrite each other. Is there a proc sql query that scans 2 datasets and produces a number (in this case 3)?  Below are sample data including an example of output I'm looking for.

 

Current data 1

MRN

1

2

3

10

 

Current data 2

MRN

1

2

3

4

5

 

MRN_both

1

2

3

 

Best

Laura

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
You could look at PROC COMPARE but it's a bit of an overkill.

You could also do a SQL query.

proc sql;
create table want_sql as
select distinct t2.ID from current_data1 as t1 where t1.id in (select distinct t2.ID from current_data2 t2);
quit;

You could also do a data step merge.

data want_data;
merge current_data1 (in=t1) and current_data2 (in=t2);
by ID;
if t1 and t2;
run;

The Data step approach will not work if you have duplicate IDs in either tables but the SQL one will. You could de-dup either solution after though.

View solution in original post

3 REPLIES 3
mkeintz
PROC Star

@lmyers2 wrote:(emphasis mine)


Hello,

 

I'm trying to get a count of the distinct patients existing in two datasets. If I use proc sql and the patient numbers are the same in both datasets, they overwrite each other. Is there a proc sql query that scans 2 datasets and produces a number (in this case 3)?  Below are sample data including an example of output I'm looking for.

 

Current data 1

MRN

1

2

3

10

 

Current data 2

MRN

1

2

3

4

5

 

MRN_both

1

2

3

 

Best

Laura


Why is the fact that they "overwrite each other" a problem.  This means that you will get 1 observation per common MRN, which in turns means you will have only 3 observations - the number you want

 

proc sql noprint;
  create table _null_ as  select a.mrn from 
    data1 as a 
    join 
    data2 as b 
    on a.mrn=b.mrn;
quit;
%put &=sqlobs;

This creates table _NULL_ which is not an actual physical data set, but it does cause SQL to behave as if it were.  So there will be a count of qualifying matches in macrovar SQLOBS.  If you can an actual table, change _NULL_ to a dataset or table name.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Reeza
Super User
You could look at PROC COMPARE but it's a bit of an overkill.

You could also do a SQL query.

proc sql;
create table want_sql as
select distinct t2.ID from current_data1 as t1 where t1.id in (select distinct t2.ID from current_data2 t2);
quit;

You could also do a data step merge.

data want_data;
merge current_data1 (in=t1) and current_data2 (in=t2);
by ID;
if t1 and t2;
run;

The Data step approach will not work if you have duplicate IDs in either tables but the SQL one will. You could de-dup either solution after though.
Ksharp
Super User
data data1;
input MRN;
cards;
1
2
3
10
;
 

data data2;
input MRN;
cards;
1
2
3
4
5
;

proc sql;
create table want as
select mrn from data1
intersect
select mrn from data2;
quit;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 1977 views
  • 0 likes
  • 4 in conversation