DATA Step, Macro, Functions and more

how to remove error records

Reply
N/A
Posts: 0

how to remove error records

Hi,

I have quite a big table with lots of fields. Wanted to filter the records looking for a error condition.

Condition is there are member id's and ssn, one member can have only one ssn and the reverse is ony one ssn must be present for one member.

But the table I have contains a same SSN for multiple member id's and multiple member ssn for same same member id's.

There many records for same member and couple of records are like this.

I dont get to know how to filter these members.

Would appreciate you guys give a solution for this.

Thanks,
Super Contributor
Super Contributor
Posts: 3,174

Re: how to remove error records

Posted in reply to deleted_user
Suggest sorting your file and then using a DATA step, investigate using BY GROUP PROCESSING (see Google search below - you will be using FIRST. and/or LAST. in your code) to detect and DELETE (or OUTPUT to another separate SAS file) your "exception condition" observations. Or you may be able to use SAS PROC SORT with NODUPKEY to remove duplicates and/or again with PROC SORT use the DUPOUT= parameter to save off your duplicates (based on the BY statement variable list).

First off, I would recommend you address each of the statement requirements individually as a task and then work to integrate/combine the tasks so that your final output file(s) address the data manipulation objective.

Scott Barry
SBBWorks, Inc.

Suggested Google advanced search argument, this topic / post:

data step by group processing site:sas.com Message was edited by: sbb
Super Contributor
Super Contributor
Posts: 365

Re: how to remove error records

Posted in reply to deleted_user
Hello Activa,

This is a possible solution:
[pre]
data i;
id=1; ssn=100; output;
id=1; ssn= 90; output;
id=1; ssn= 80; output;
id=2; ssn=200; output;
id=3; ssn=200; output;
id=4; ssn=400; output;
run;
proc sort data=i out=s;
by ID;
run;
data BadID;
set s;
if NOT (First.ID=1 and Last.ID=1);
by ID;
run;
proc sort data=i out=s;
by SSN;
run;
data BadSSN;
set s;
if NOT (First.SSN=1 and Last.SSN=1);
by SSN;
run;
[/pre]
Sincerely,
SPR
Ask a Question
Discussion stats
  • 2 replies
  • 124 views
  • 0 likes
  • 3 in conversation