I have the following dataset.
data have;
input Obs ID $12. code $ revenues sales profit;
cards;
1 AE0000037163 C1 10 12 2
2 AE0000037163 C1 10 12 2
3 AZ0000129837 U2 8 3 1
4 BA0000037213 C2 10 12 2
5 BA0000037213 C2 10 12 2
6 BU0000037282 U1 12 20 5
7 BU0000037282 U1 12 20 5
8 CH0000037693 U2 13 15 3
9 CH0000037693 U2 13 15 3
10 CZ0000298373 U1 90 12 9
11 DK0000037421 C1 4 1 1
12 DK0000037421 C2 4 1 1
13 FL0000037738 C1 20 30 8
14 FL0000037738 U1 20 30 8
15 GK0000067718 C1 2 3 1
16 GK0000067718 U2 2 3 1
17 HU0000097713 C2 6 2 2
18 HU0000097713 U1 6 2 2
19 IN0000082238 C2 90 50 40
20 IN0000082238 U2 90 70 20
21 IN0000082991 C1 20 12 9
22 IT0000019073 U1 100 15 12
23 IT0000019073 U2 8 30 8
24 JO0000422115 C1 220 130 80
25 JO0000422115 C2 220 130 80
26 JO0000422115 U1 220 130 80
27 KZ0000638497 C1 . 9 .
28 KZ0000638497 C2 14 . 11
29 KZ0000638497 U2 10 3 7
30 MA0000872314 U1 12 1 1
31 NB0000846281 C1 60 . 3
32 NB0000846281 U1 60 . 4
33 NB0000846281 U2 60 . 3
34 PT0000546298 C2 20 30 8
35 PT0000546298 U1 20 30 8
36 PT0000546298 U2 20 30 8
37 RU0000976292 C1 13 9 1
38 RU0000976292 C2 12 9 2
39 RU0000976292 U1 13 8 0.5
40 RU0000976292 U2 14 2 1
41 UK0000976292 C2 33 22 91
42 US0000976292 U1 11 12 12
43 YE0000976292 C1 21 32 21
44 ZI0000976292 U2 94 19 14
;
I want to create a new dataset that keeps only the entries which have repeated ID (result desired shown below)
1 AE0000037163 C1 10 12 2
2 AE0000037163 C1 10 12 2
3 BA0000037213 C2 10 12 2
4 BA0000037213 C2 10 12 2
5 BU0000037282 U1 12 20 5
6 BU0000037282 U1 12 20 5
7 CH0000037693 U2 13 15 3
8 CH0000037693 U2 13 15 3
9 DK0000037421 C1 4 1 1
10 DK0000037421 C2 4 1 1
11 FL0000037738 C1 20 30 8
12 FL0000037738 U1 20 30 8
13 GK0000067718 C1 2 3 1
14 GK0000067718 U2 2 3 1
15 HU0000097713 C2 6 2 2
16 HU0000097713 U1 6 2 2
17 IN0000082238 C2 90 50 40
18 IN0000082238 U2 90 70 20
19 IT0000019073 U1 100 15 12
20 IT0000019073 U2 8 30 8
21 JO0000422115 C1 220 130 80
22 JO0000422115 C2 220 130 80
23 JO0000422115 U1 220 130 80
24 KZ0000638497 C1 . 9 .
25 KZ0000638497 C2 14 . 11
26 KZ0000638497 U2 10 3 7
27 NB0000846281 C1 60 . 3
28 NB0000846281 U1 60 . 4
29 NB0000846281 U2 60 . 3
30 PT0000546298 C2 20 30 8
31 PT0000546298 U1 20 30 8
32 PT0000546298 U2 20 30 8
33 RU0000976292 C1 13 9 1
34 RU0000976292 C2 12 9 2
35 RU0000976292 U1 13 8 0.5
36 RU0000976292 U2 14 2 1
How can I do this?
Please not that you CHANGED the values of the OBS variable in your output data set, which is a tad misleading.
Since the DATA step to use BY group processing as @Kurt_Bremser really should insure that the data is sorted first then do this with the Proc Sort step:
proc sort data=have nouniquekey out=want; by id; run;
The Nouniquekey option means that records with only a single observation of the By variable(s) do not go into the out= data set.
CAUTION: becareful of using this without an OUT= to a different data set otherwise you may be wondering where records went later.
data want;
set have;
by id;
if not (first.id and last.id);
run;
Please not that you CHANGED the values of the OBS variable in your output data set, which is a tad misleading.
Since the DATA step to use BY group processing as @Kurt_Bremser really should insure that the data is sorted first then do this with the Proc Sort step:
proc sort data=have nouniquekey out=want; by id; run;
The Nouniquekey option means that records with only a single observation of the By variable(s) do not go into the out= data set.
CAUTION: becareful of using this without an OUT= to a different data set otherwise you may be wondering where records went later.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.