Removing a large number of observations based on a series of patient ID's

Accepted Solution Solved
Reply
New Contributor
Posts: 4
Accepted Solution

Removing a large number of observations based on a series of patient ID's

Hello Everyone, 

 

I have a large list of patient records (over 100k). After some initial analysis I have come up with a list of 1720 patient ID's that need to be removed from the list. Any suggestions as to how to go about this without writing the following 1720 times? 

 

if patient_id = (patient_id) then DELETE; 

 

Any thoughts would be much appreciated. 

 

Thanks so much

 

 


Accepted Solutions
Solution
Monday
Esteemed Advisor
Posts: 7,288

Re: Removing a large number of observations based on a series of patient ID's

Merge the two files, but only keep those IDs which aren't in both files.

 

Art, CEO, AnalystFinder.com

 

View solution in original post


All Replies
Solution
Monday
Esteemed Advisor
Posts: 7,288

Re: Removing a large number of observations based on a series of patient ID's

Merge the two files, but only keep those IDs which aren't in both files.

 

Art, CEO, AnalystFinder.com

 

New Contributor
Posts: 4

Re: Removing a large number of observations based on a series of patient ID's

This worked like a charm. 

 

Thanks so much! 

 

For anyone looking to use this solution, you can find instructions on how to merge tables while only keeping the records that don't match here:

 

http://support.sas.com/kb/40/414.html

 

Thanks again. 

 

Rightcoast.

Trusted Advisor
Posts: 1,495

Re: Removing a large number of observations based on a series of patient ID's

[ Edited ]

Art has the best idea, if the patient IDs to be removed are somehow indicated in a SAS dataset. But you didn't say that the information was in the dataset, you said you have a list ... somehow you have to either type the patient IDs as follows:

 

if patient_id in (2300091,2300392,2407702, ...) then delete; 
/* and don't make any typographical errors */

 

 

Or otherwise find a way to indicate these 1720 patient IDs

New Contributor
Posts: 4

Re: Removing a large number of observations based on a series of patient ID's

Luckily I had the patient ID's in both tables. I'll be more specific next time, but thanks so much. I'm sure this will come in handy down the line.
Grand Advisor
Posts: 17,316

Re: Removing a large number of observations based on a series of patient ID's

[ Edited ]

Use SQL NOT IN 

Assuming your full dataset is A and your partial dataset is B,

 

proc sql;
create table want as
select *
from table a
where id not in (select id from b);
quit;

righcoastmike wrote:

Hello Everyone, 

 

I have a large list of patient records (over 100k). After some initial analysis I have come up with a list of 1720 patient ID's that need to be removed from the list. Any suggestions as to how to go about this without writing the following 1720 times? 

 

if patient_id = (patient_id) then DELETE; 

 

Any thoughts would be much appreciated. 

 

Thanks so much

 

 



 

New Contributor
Posts: 4

Re: Removing a large number of observations based on a series of patient ID's

Thankfully I didn't need to go full Proc SQL on this one. Still though, really helpful for future challenges. Thanks so much!
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 101 views
  • 0 likes
  • 4 in conversation