Solved: Deleting ID's with Only One Observation

rangermollie · Posted 09-10-2022 01:34 PM

Hi there!

I have a data set that would be similar to the very basic example below. I'm looking to code so that any ID #'s where there is only one observation (only appear once) get deleted. So in the example below, the Student ID 3140 and its data would be deleted. The rest would be kept.

Student ID Test Score Grade

2131 A 9

2131 C 9

3140 A 10

7139 B 8

7139 C 8

7139 A 8

Thanks so much!

PaigeMiller · Posted 09-10-2022 01:39 PM

proc sql;
    create table want as select * from have
    group by studentID
    having count(studentID)>1;
quit;

--
Paige Miller

View solution in original post

PaigeMiller · Posted 09-10-2022 01:39 PM

proc sql;
    create table want as select * from have
    group by studentID
    having count(studentID)>1;
quit;

--
Paige Miller

rangermollie · Posted 09-10-2022 02:37 PM

Thank you!!!!

ballardw · Posted 09-10-2022 01:46 PM

One way

Note data step to create an actual example data set. This is how you should provided example data so we do not have to guess about properties and variable names.

When you use a BY statement in the data step SAS creates automatic variables that indicate whether the current record comes from the first or last observation of the By group. The values are numeric 1/0 which SAS treats as true and false when used in comparisons. You reference the value with First.variablename and Last.variablename (note the dot in there, no spaces allowed). So if a record is both the First and Last of group at the same time there is only one record for that By group. By group processing also want data sorted by default. In this case the sort might not be needed but your real data is likely to be more than 6 records and if all of a student id values are not together this wont' work.

data have;
 input StudentID TestScore $ Grade;
datalines;
2131  A  9
2131  C  9
3140  A  10
7139  B  8
7139  C  8
7139  A  8
;

/* make sure all records for same 
   studentid are together
*/
proc sort data=have;
   by studentid;
run;

data want;
   set have;
   by studentid;
   if first.studentid and last.studentid then delete;
run;

@rangermollie wrote:

Hi there!

I have a data set that would be similar to the very basic example below. I'm looking to code so that any ID #'s where there is only one observation (only appear once) get deleted. So in the example below, the Student ID 3140 and its data would be deleted. The rest would be kept.

Student ID Test Score Grade

2131 A 9

2131 C 9

3140 A 10

7139 B 8

7139 C 8

7139 A 8

Thanks so much!

PeterClemmensen · Posted 09-10-2022 02:18 PM

data have;
input StudentID TestScore $ Grade;
datalines;
2131 A 9 
2131 C 9 
3140 A 10
7139 B 8 
7139 C 8 
7139 A 8 
;

data want;
   set have;
   by StudentID;
   if first.StudentID and last.StudentID then delete;
run;

The DATA to DATA Step Macro
Blog: SASnrd

Deleting ID's with Only One Observation

Re: Deleting ID's with Only One Observation

Re: Deleting ID's with Only One Observation

Re: Deleting ID's with Only One Observation

Re: Deleting ID's with Only One Observation

Re: Deleting ID's with Only One Observation

Deleting ID's with Only One Observation

Re: Deleting ID's with Only One Observation

Re: Deleting ID's with Only One Observation

Re: Deleting ID's with Only One Observation

Re: Deleting ID's with Only One Observation

Re: Deleting ID's with Only One Observation

SAS Innovate 2025: Call for Content