Cornelis
Fluorite | Level 6

Dear all,

 

Could you please give me a solution for removing the duplicates from this data set:

data a;
input obs @1 phase $4;
cards;
1 A
2 A
3 B
4 B
5 B
6 A
7 B
8 A
9 B
;
run;

 

 

The outcome must be:

1 A
3 B
6 A
7 B
8 A
9 B

 

 

There are multiple subgroups (runs of consecutive rows with the same phase), but the outcome must always be in the order A, B, A, B, A, B, ... or B, A, B, A, B, ..., where only the first A or B of each subgroup is selected.

 

Hope that you have a fast solution.

 

Best regards,

 

Cornelis

Accepted solution: gamotte's DATA step using lag(phase), quoted in full in the replies.

7 REPLIES
gamotte
Rhodochrosite | Level 12

Use proc sort with option nodupkey.

 

proc sort data=have nodupkey;
    by obs;
run;

Cornelis
Fluorite | Level 6

Thank you for your swift reply.

Actually, that is not the solution: the row numbers (obs) are all unique.

Applying nodupkey will not solve it, since the list will remain intact.

gamotte
Rhodochrosite | Level 12

Oops ! Right. Sorry, answered too fast.

 

On your sorted dataset:

 

Data want;
    set have;

    lp=lag(phase);      /* phase value from the previous row */
    if lp ne phase;     /* keep only rows where the phase changes */
    drop lp;
run;
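
For readers who want to try this end to end, here is a minimal sketch of the same idea written with RETAIN instead of LAG. It is an alternative and not the code posted above. The names have, want_retain, prev and keep_row are assumptions made for illustration, and the data is read with plain list input. One practical point about the LAG version: LAG has to be executed on every row, so it should not be placed inside a conditional block.

```sas
/* Sketch only: dataset and variable names are illustrative assumptions. */
data have;
    input obs phase $;          /* list input, blank-delimited */
    cards;
1 A
2 A
3 B
4 B
5 B
6 A
7 B
8 A
9 B
;
run;

data want_retain;
    set have;
    length prev $8;             /* same length as phase under list input */
    retain prev;                /* prev keeps its value across rows */
    keep_row = (_n_ = 1 or phase ne prev);  /* first row, or phase changed */
    prev = phase;               /* updated on every row, before subsetting */
    if keep_row;
    drop prev keep_row;
run;

proc print data=want_retain noobs;
run;
```

With the sample data this should keep obs 1, 3, 6, 7, 8 and 9, the same rows the lag version keeps.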

 

Cornelis
Fluorite | Level 6

Great, thank you for your support!

Indeed, the lag function is better.

Ksharp
Super User
data a;
input obs  phase $;
cards;
1 A
2 A
3 B
4 B
5 B
6 A
7 B
8 A
9 B
;
run;

data want;
 set a;
 by phase notsorted;
 if first.phase;
run;
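
To see why this works, the sketch below copies the automatic FIRST. and LAST. flags into ordinary variables so they can be printed. With BY ... NOTSORTED, SAS treats each run of consecutive equal values as its own BY group, so no sorting is needed. The names flags, is_first and is_last are assumptions made for illustration.

```sas
/* Sketch only: flags, is_first and is_last are illustrative names. */
data a;
    input obs phase $;
    cards;
1 A
2 A
3 B
4 B
5 B
6 A
7 B
8 A
9 B
;
run;

data flags;
    set a;
    by phase notsorted;         /* groups are runs of consecutive equal values */
    is_first = first.phase;     /* 1 on the first row of each run */
    is_last  = last.phase;      /* 1 on the last row of each run */
run;

proc print data=flags noobs;
run;
```

In the printed output, is_first should be 1 for obs 1, 3, 6, 7, 8 and 9, which are exactly the rows that the subsetting if first.phase; keeps.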
Cornelis
Fluorite | Level 6

Thank you, this is a good alternative.

A demonstration that there is more than one solution. Very good learning point!

ballardw
Super User

@Cornelis wrote:

Thank you, this is a good alternative.

A demonstration that there is more than one solution. Very good learning point!


AND that you should double check your data step code before posting. The example you provided generates all missing values for PHASE because you missed the period as part of the informat.

If you had supplied the better informat:

input obs @1 phase $4.;

The values for phase would be

1 A

2 A

3 B

because it would have been reading the phase from column 1 and the informat as specified forces reading up to 4 characters.

Note that @Ksharp's solution does not have the @1 in the input statement.
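
As a minimal sketch of an input statement that reads the sample data as intended, either of the two forms below should work. Both drop the @1 pointer. The second keeps a width of 4 by using the colon modifier, which reads up to the next blank instead of a fixed four columns. The dataset names a_list and a_colon are assumptions made for illustration.

```sas
/* Sketch only: a_list and a_colon are illustrative dataset names. */

/* Option 1: plain list input, no column pointer and no informat */
data a_list;
    input obs phase $;
    cards;
1 A
2 A
3 B
;
run;

/* Option 2: modified list input; note the period that ends the informat */
data a_colon;
    input obs phase :$4.;
    cards;
1 A
2 A
3 B
;
run;

proc print data=a_list noobs;
run;

proc print data=a_colon noobs;
run;
```

In both cases obs should hold 1, 2, 3 and phase should hold A, A, B.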
