I have the following dataset.
data have;
input Obs ID $12. code $ revenues sales profit;
cards;
1 AE0000037163 C1 10 12 2
2 AE0000037163 C1 10 12 2
3 AZ0000129837 U2 8 3 1
4 BA0000037213 C2 10 12 2
5 BA0000037213 C2 10 12 2
6 BU0000037282 U1 12 20 5
7 BU0000037282 U1 12 20 5
8 CH0000037693 U2 13 15 3
9 CH0000037693 U2 13 15 3
10 CZ0000298373 U1 90 12 9
11 DK0000037421 C1 4 1 1
12 DK0000037421 C2 4 1 1
13 FL0000037738 C1 20 30 8
14 FL0000037738 U1 20 30 8
15 GK0000067718 C1 2 3 1
16 GK0000067718 U2 2 3 1
17 HU0000097713 C2 6 2 2
18 HU0000097713 U1 6 2 2
19 IN0000082238 C2 90 50 40
20 IN0000082238 U2 90 70 20
21 IN0000082991 C1 20 12 9
22 IT0000019073 U1 100 15 12
23 IT0000019073 U2 8 30 8
24 JO0000422115 C1 220 130 80
25 JO0000422115 C2 220 130 80
26 JO0000422115 U1 220 130 80
27 KZ0000638497 C1 . 9 .
28 KZ0000638497 C2 14 . 11
29 KZ0000638497 U2 10 3 7
30 MA0000872314 U1 12 1 1
31 NB0000846281 C1 60 . 3
32 NB0000846281 U1 60 . 4
33 NB0000846281 U2 60 . 3
34 PT0000546298 C2 20 30 8
35 PT0000546298 U1 20 30 8
36 PT0000546298 U2 20 30 8
37 RU0000976292 C1 13 9 1
38 RU0000976292 C2 12 9 2
39 RU0000976292 U1 13 8 0.5
40 RU0000976292 U2 14 2 1
41 UK0000976292 C2 33 22 91
42 US0000976292 U1 11 12 12
43 YE0000976292 C1 21 32 21
44 ZI0000976292 U2 94 19 14
;
I want to create a new dataset that keeps only the entries which have repeated ID (result desired shown below)
1 AE0000037163 C1 10 12 2
2 AE0000037163 C1 10 12 2
3 BA0000037213 C2 10 12 2
4 BA0000037213 C2 10 12 2
5 BU0000037282 U1 12 20 5
6 BU0000037282 U1 12 20 5
7 CH0000037693 U2 13 15 3
8 CH0000037693 U2 13 15 3
9 DK0000037421 C1 4 1 1
10 DK0000037421 C2 4 1 1
11 FL0000037738 C1 20 30 8
12 FL0000037738 U1 20 30 8
13 GK0000067718 C1 2 3 1
14 GK0000067718 U2 2 3 1
15 HU0000097713 C2 6 2 2
16 HU0000097713 U1 6 2 2
17 IN0000082238 C2 90 50 40
18 IN0000082238 U2 90 70 20
19 IT0000019073 U1 100 15 12
20 IT0000019073 U2 8 30 8
21 JO0000422115 C1 220 130 80
22 JO0000422115 C2 220 130 80
23 JO0000422115 U1 220 130 80
24 KZ0000638497 C1 . 9 .
25 KZ0000638497 C2 14 . 11
26 KZ0000638497 U2 10 3 7
27 NB0000846281 C1 60 . 3
28 NB0000846281 U1 60 . 4
29 NB0000846281 U2 60 . 3
30 PT0000546298 C2 20 30 8
31 PT0000546298 U1 20 30 8
32 PT0000546298 U2 20 30 8
33 RU0000976292 C1 13 9 1
34 RU0000976292 C2 12 9 2
35 RU0000976292 U1 13 8 0.5
36 RU0000976292 U2 14 2 1
How can I do this?
Please not that you CHANGED the values of the OBS variable in your output data set, which is a tad misleading.
Since the DATA step to use BY group processing as @Kurt_Bremser really should insure that the data is sorted first then do this with the Proc Sort step:
proc sort data=have nouniquekey out=want; by id; run;
The Nouniquekey option means that records with only a single observation of the By variable(s) do not go into the out= data set.
CAUTION: becareful of using this without an OUT= to a different data set otherwise you may be wondering where records went later.
data want;
set have;
by id;
if not (first.id and last.id);
run;
Please not that you CHANGED the values of the OBS variable in your output data set, which is a tad misleading.
Since the DATA step to use BY group processing as @Kurt_Bremser really should insure that the data is sorted first then do this with the Proc Sort step:
proc sort data=have nouniquekey out=want; by id; run;
The Nouniquekey option means that records with only a single observation of the By variable(s) do not go into the out= data set.
CAUTION: becareful of using this without an OUT= to a different data set otherwise you may be wondering where records went later.
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Select SAS Training centers are offering in-person courses. View upcoming courses for: