Removing a large number of observations based on a series of patient ID's

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 15
Accepted Solution

Removing a large number of observations based on a series of patient ID's

Hello Everyone, 

 

I have a large list of patient records (over 100k). After some initial analysis I have come up with a list of 1720 patient ID's that need to be removed from the list. Any suggestions as to how to go about this without writing the following 1720 times? 

 

if patient_id = (patient_id) then DELETE; 

 

Any thoughts would be much appreciated. 

 

Thanks so much

 

 


Accepted Solutions
Solution
‎07-17-2017 09:56 PM
PROC Star
Posts: 7,433

Re: Removing a large number of observations based on a series of patient ID's

Merge the two files, but only keep those IDs which aren't in both files.

 

Art, CEO, AnalystFinder.com

 

View solution in original post


All Replies
Solution
‎07-17-2017 09:56 PM
PROC Star
Posts: 7,433

Re: Removing a large number of observations based on a series of patient ID's

Merge the two files, but only keep those IDs which aren't in both files.

 

Art, CEO, AnalystFinder.com

 

Occasional Contributor
Posts: 15

Re: Removing a large number of observations based on a series of patient ID's

This worked like a charm. 

 

Thanks so much! 

 

For anyone looking to use this solution, you can find instructions on how to merge tables while only keeping the records that don't match here:

 

http://support.sas.com/kb/40/414.html

 

Thanks again. 

 

Rightcoast.

Trusted Advisor
Posts: 1,789

Re: Removing a large number of observations based on a series of patient ID's

[ Edited ]

Art has the best idea, if the patient IDs to be removed are somehow indicated in a SAS dataset. But you didn't say that the information was in the dataset, you said you have a list ... somehow you have to either type the patient IDs as follows:

 

if patient_id in (2300091,2300392,2407702, ...) then delete; 
/* and don't make any typographical errors */

 

 

Or otherwise find a way to indicate these 1720 patient IDs

Occasional Contributor
Posts: 15

Re: Removing a large number of observations based on a series of patient ID's

Luckily I had the patient ID's in both tables. I'll be more specific next time, but thanks so much. I'm sure this will come in handy down the line.
Super User
Posts: 19,132

Re: Removing a large number of observations based on a series of patient ID's

[ Edited ]

Use SQL NOT IN 

Assuming your full dataset is A and your partial dataset is B,

 

proc sql;
create table want as
select *
from table a
where id not in (select id from b);
quit;

righcoastmike wrote:

Hello Everyone, 

 

I have a large list of patient records (over 100k). After some initial analysis I have come up with a list of 1720 patient ID's that need to be removed from the list. Any suggestions as to how to go about this without writing the following 1720 times? 

 

if patient_id = (patient_id) then DELETE; 

 

Any thoughts would be much appreciated. 

 

Thanks so much

 

 



 

Occasional Contributor
Posts: 15

Re: Removing a large number of observations based on a series of patient ID's

Thankfully I didn't need to go full Proc SQL on this one. Still though, really helpful for future challenges. Thanks so much!
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 136 views
  • 0 likes
  • 4 in conversation