Fluorite | Level 6

## Removing duplicates depending on preceding observation

Dear all,

Could you please help me remove consecutive duplicates from this data set:

```
data a;
input obs @1 phase $4;
cards;
1 A
2 A
3 B
4 B
5 B
6 A
7 B
8 A
9 B
;
run;
```

The outcome must be:

```
1 A
3 B
6 A
7 B
8 A
9 B
```

The data contain many consecutive runs (subgroups) of the same phase, but the output must always alternate, A, B, A, B, A, B, ... or B, A, B, A, B, ..., keeping only the first observation of each run.

I hope you have a quick solution.

Best regards,

Cornelis

Rhodochrosite | Level 12

## Re: Removing duplicates depending on preceding observation

Use PROC SORT with the NODUPKEY option:

```
proc sort data=have nodupkey;
by obs;
run;
```

Fluorite | Level 6

## Re: Removing duplicates depending on preceding observation

Actually, that is not the solution: the row numbers (obs) are all unique.

Applying NODUPKEY does not solve it, because every row is kept and the list remains unchanged.

Rhodochrosite | Level 12

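## Re: Removing duplicates depending on preceding observation

Oops! Right. Sorry, answered too fast.

```
data want;
set have;
/* LAG returns PHASE from the previous row (missing on the first row) */
lp=lag(phase);
/* keep the row only when PHASE differs from the previous row */
if lp ne phase;
drop lp;
run;
```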
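A self-contained sketch of the LAG approach, assuming the sample data are read with list input (`input obs phase $;`) so that PHASE actually receives the letters; the data set is named `have` only to match the SET statement above. It also makes explicit why the first row survives: LAG returns a missing value on its first call, and a blank never equals `A` or `B`.

```sas
/* Sample data, read with list input so PHASE gets the letters */
data have;
  input obs phase $;
  datalines;
1 A
2 A
3 B
4 B
5 B
6 A
7 B
8 A
9 B
;
run;

/* Keep a row only when PHASE differs from the previous row's PHASE. */
/* LAG is called on every row (not inside an IF), so its queue stays */
/* aligned with the previous observation; on row 1 it is missing.    */
data want;
  set have;
  lp = lag(phase);
  if lp ne phase;
  drop lp;
run;

proc print data=want noobs;
run;
```

On the sample data this keeps obs 1, 3, 6, 7, 8 and 9, which matches the requested outcome. Calling LAG only inside a conditional branch is a common source of wrong results, which is why the assignment sits before the subsetting IF.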

Fluorite | Level 6

## Re: Removing duplicates depending on preceding observation

Great, thank you for your support!

Indeed, the LAG function is the better approach.

Super User

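## Re: Removing duplicates depending on preceding observation

```
data a;
input obs phase $;
cards;
1 A
2 A
3 B
4 B
5 B
6 A
7 B
8 A
9 B
;
run;

data want;
set a;
/* NOTSORTED: each consecutive run of equal PHASE values is its own BY group */
by phase notsorted;
/* keep only the first row of each run */
if first.phase;
run;
```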
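To see what `by phase notsorted` does, here is a small sketch that copies the automatic FIRST.phase and LAST.phase flags into ordinary variables (the names `flags`, `first_flag` and `last_flag` are only illustrative). With NOTSORTED, SAS starts a new BY group whenever PHASE changes from one row to the next, so each consecutive run is its own group and `if first.phase;` keeps exactly one row per run.

```sas
data a;
  input obs phase $;
  datalines;
1 A
2 A
3 B
4 B
5 B
6 A
7 B
8 A
9 B
;
run;

/* FIRST.phase = 1 on the first row of each consecutive run of PHASE, */
/* LAST.phase  = 1 on the last row of that run. NOTSORTED means the   */
/* data do not have to be sorted by PHASE.                            */
data flags;
  set a;
  by phase notsorted;
  first_flag = first.phase;
  last_flag  = last.phase;
run;

proc print data=flags noobs;
run;
```

The runs here are obs 1-2, 3-5, 6, 7, 8 and 9, so FIRST.phase is 1 for obs 1, 3, 6, 7, 8 and 9, the same rows the LAG solution keeps.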
Fluorite | Level 6

## Re: Removing duplicates depending on preceding observation

Thank you, this is a good alternative.

It shows that there is more than one solution. A very good learning point!

Super User

## Re: Removing duplicates depending on preceding observation

@Cornelis wrote:

> Thank you, this is a good alternative.
>
> It shows that there is more than one solution. A very good learning point!

AND that you should double-check your DATA step code before posting. The example you provided produces all missing values for PHASE because you left the period off the informat: without it, `phase $4` is read as column input from column 4, which is blank on these short data lines.

Had you supplied a proper informat:

```
input obs @1 phase $4.;
```

the values of PHASE would be

```
1 A
2 A
3 B
```

because PHASE would be read starting at column 1, and the `$4.` informat reads four columns, so it picks up the OBS digit as well.

Note that @Ksharp's solution does not have the @1 in its INPUT statement.
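A sketch contrasting the two INPUT styles discussed above: formatted input with `@1 phase $4.` versus plain list input as in @Ksharp's code. The `infile datalines truncover;` line is added here only as a guard so a short data line is never continued onto the next record, and the data set names `fmt_input` and `list_input` are illustrative.

```sas
/* Formatted input: the pointer returns to column 1 and $4. reads */
/* four columns, so PHASE picks up the OBS digit as well ("1 A"). */
data fmt_input;
  infile datalines truncover;
  input obs @1 phase $4.;
  datalines;
1 A
2 A
3 B
;
run;

/* List input: PHASE is the next blank-delimited value ("A"). */
data list_input;
  infile datalines truncover;
  input obs phase $;
  datalines;
1 A
2 A
3 B
;
run;

proc print data=fmt_input noobs;
run;

proc print data=list_input noobs;
run;
```

Only the list-input version gives PHASE the single letters that the LAG and BY NOTSORTED solutions compare.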
