Removing a large number of observations based on a series of patient ID's

Accepted Solution Solved
Reply
Contributor
Posts: 46
Accepted Solution

Removing a large number of observations based on a series of patient ID's

Hello Everyone, 

 

I have a large list of patient records (over 100k). After some initial analysis I have come up with a list of 1720 patient ID's that need to be removed from the list. Any suggestions as to how to go about this without writing the following 1720 times? 

 

if patient_id = (patient_id) then DELETE; 

 

Any thoughts would be much appreciated. 

 

Thanks so much

 

 


Accepted Solutions
Solution
‎07-17-2017 09:56 PM
PROC Star
Posts: 8,166

Re: Removing a large number of observations based on a series of patient ID's

Posted in reply to righcoastmike

Merge the two files, but only keep those IDs which aren't in both files.

 

Art, CEO, AnalystFinder.com

 

View solution in original post


All Replies
Solution
‎07-17-2017 09:56 PM
PROC Star
Posts: 8,166

Re: Removing a large number of observations based on a series of patient ID's

Posted in reply to righcoastmike

Merge the two files, but only keep those IDs which aren't in both files.

 

Art, CEO, AnalystFinder.com

 

Contributor
Posts: 46

Re: Removing a large number of observations based on a series of patient ID's

This worked like a charm. 

 

Thanks so much! 

 

For anyone looking to use this solution, you can find instructions on how to merge tables while only keeping the records that don't match here:

 

http://support.sas.com/kb/40/414.html

 

Thanks again. 

 

Rightcoast.

Respected Advisor
Posts: 3,059

Re: Removing a large number of observations based on a series of patient ID's

[ Edited ]
Posted in reply to righcoastmike

Art has the best idea, if the patient IDs to be removed are somehow indicated in a SAS dataset. But you didn't say that the information was in the dataset, you said you have a list ... somehow you have to either type the patient IDs as follows:

 

if patient_id in (2300091,2300392,2407702, ...) then delete; 
/* and don't make any typographical errors */

 

 

Or otherwise find a way to indicate these 1720 patient IDs

--
Paige Miller
Contributor
Posts: 46

Re: Removing a large number of observations based on a series of patient ID's

Posted in reply to PaigeMiller
Luckily I had the patient ID's in both tables. I'll be more specific next time, but thanks so much. I'm sure this will come in handy down the line.
Super User
Posts: 23,773

Re: Removing a large number of observations based on a series of patient ID's

[ Edited ]
Posted in reply to righcoastmike

Use SQL NOT IN 

Assuming your full dataset is A and your partial dataset is B,

 

proc sql;
create table want as
select *
from table a
where id not in (select id from b);
quit;

righcoastmike wrote:

Hello Everyone, 

 

I have a large list of patient records (over 100k). After some initial analysis I have come up with a list of 1720 patient ID's that need to be removed from the list. Any suggestions as to how to go about this without writing the following 1720 times? 

 

if patient_id = (patient_id) then DELETE; 

 

Any thoughts would be much appreciated. 

 

Thanks so much

 

 



 

Contributor
Posts: 46

Re: Removing a large number of observations based on a series of patient ID's

Thankfully I didn't need to go full Proc SQL on this one. Still though, really helpful for future challenges. Thanks so much!
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 184 views
  • 0 likes
  • 4 in conversation