Solved: Create dataset containing observations with repeated ID only

Satori · Posted 02-10-2023 09:52 AM

I have the following dataset.

data have;
input Obs ID $12. code $ revenues sales profit;
cards;
1 AE0000037163 C1 10 12 2
2 AE0000037163 C1 10 12 2
3 AZ0000129837 U2  8  3 1
4 BA0000037213 C2 10 12 2
5 BA0000037213 C2 10 12 2
6 BU0000037282 U1 12 20 5
7 BU0000037282 U1 12 20 5
8 CH0000037693 U2 13 15 3
9 CH0000037693 U2 13 15 3
10 CZ0000298373 U1 90 12 9
11 DK0000037421 C1  4 1 1
12 DK0000037421 C2  4 1 1
13 FL0000037738 C1 20 30 8
14 FL0000037738 U1 20 30 8 
15 GK0000067718 C1 2 3 1 
16 GK0000067718 U2 2 3 1 
17 HU0000097713 C2 6 2 2 
18 HU0000097713 U1 6 2 2 
19 IN0000082238 C2 90 50 40 
20 IN0000082238 U2 90 70 20 
21 IN0000082991 C1 20 12 9 
22 IT0000019073 U1 100 15 12 
23 IT0000019073 U2 8 30 8 
24 JO0000422115 C1 220 130 80 
25 JO0000422115 C2 220 130 80 
26 JO0000422115 U1 220 130 80
27 KZ0000638497 C1 . 9 . 
28 KZ0000638497 C2 14 . 11 
29 KZ0000638497 U2 10 3 7 
30 MA0000872314 U1 12 1 1
31 NB0000846281 C1 60 . 3 
32 NB0000846281 U1 60 . 4 
33 NB0000846281 U2 60 . 3 
34 PT0000546298 C2 20 30 8
35 PT0000546298 U1 20 30 8 
36 PT0000546298 U2 20 30 8 
37 RU0000976292 C1 13 9 1
38 RU0000976292 C2 12 9 2
39 RU0000976292 U1 13 8 0.5
40 RU0000976292 U2 14 2 1
41 UK0000976292 C2 33 22 91
42 US0000976292 U1 11 12 12
43 YE0000976292 C1 21 32 21
44 ZI0000976292 U2 94 19 14
;

I want to create a new dataset that keeps only the entries which have repeated ID (result desired shown below)

1 AE0000037163 C1 10 12 2
2 AE0000037163 C1 10 12 2
3 BA0000037213 C2 10 12 2
4 BA0000037213 C2 10 12 2
5 BU0000037282 U1 12 20 5
6 BU0000037282 U1 12 20 5
7 CH0000037693 U2 13 15 3
8 CH0000037693 U2 13 15 3
9 DK0000037421 C1  4 1 1
10 DK0000037421 C2  4 1 1
11 FL0000037738 C1 20 30 8
12 FL0000037738 U1 20 30 8 
13 GK0000067718 C1 2 3 1 
14 GK0000067718 U2 2 3 1 
15 HU0000097713 C2 6 2 2 
16 HU0000097713 U1 6 2 2 
17 IN0000082238 C2 90 50 40 
18 IN0000082238 U2 90 70 20 
19 IT0000019073 U1 100 15 12 
20 IT0000019073 U2 8 30 8 
21 JO0000422115 C1 220 130 80 
22 JO0000422115 C2 220 130 80 
23 JO0000422115 U1 220 130 80
24 KZ0000638497 C1 . 9 . 
25 KZ0000638497 C2 14 . 11 
26 KZ0000638497 U2 10 3 7 
27 NB0000846281 C1 60 . 3 
28 NB0000846281 U1 60 . 4 
29 NB0000846281 U2 60 . 3 
30 PT0000546298 C2 20 30 8
31 PT0000546298 U1 20 30 8 
32 PT0000546298 U2 20 30 8 
33 RU0000976292 C1 13 9 1
34 RU0000976292 C2 12 9 2
35 RU0000976292 U1 13 8 0.5
36 RU0000976292 U2 14 2 1

How can I do this?

ballardw · Posted 02-10-2023 11:47 AM

Please not that you CHANGED the values of the OBS variable in your output data set, which is a tad misleading.

Since the DATA step to use BY group processing as @Kurt_Bremser really should insure that the data is sorted first then do this with the Proc Sort step:

proc sort data=have 
   nouniquekey out=want;
   by id;
run;

The Nouniquekey option means that records with only a single observation of the By variable(s) do not go into the out= data set.

CAUTION: becareful of using this without an OUT= to a different data set otherwise you may be wondering where records went later.

View solution in original post

Kurt_Bremser · Posted 02-10-2023 10:32 AM

data want;
set have;
by id;
if not (first.id and last.id);
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

ballardw · Posted 02-10-2023 11:47 AM

Please not that you CHANGED the values of the OBS variable in your output data set, which is a tad misleading.

Since the DATA step to use BY group processing as @Kurt_Bremser really should insure that the data is sorted first then do this with the Proc Sort step:

proc sort data=have 
   nouniquekey out=want;
   by id;
run;

The Nouniquekey option means that records with only a single observation of the By variable(s) do not go into the out= data set.

CAUTION: becareful of using this without an OUT= to a different data set otherwise you may be wondering where records went later.

Create dataset containing observations with repeated ID only

Re: Create dataset containing observations with repeated ID only

Re: Create dataset containing observations with repeated ID only

Re: Create dataset containing observations with repeated ID only

Registration is open

Create dataset containing observations with repeated ID only

Re: Create dataset containing observations with repeated ID only

Re: Create dataset containing observations with repeated ID only

Re: Create dataset containing observations with repeated ID only

Registration is open

SAS Training: Just a Click Away