Dear all,
Could you please suggest a way to remove duplicates from this data set:
data a;
input obs @1 phase $4;
cards;
1 A
2 A
3 B
4 B
5 B
6 A
7 B
8 A
9 B
;
run;
The outcome must be:
1 A
3 B
6 A
7 B
8 A
9 B
There are multiple subgroups (runs of consecutive rows with the same phase), but the outcome must always alternate, in the order A,B,A,B,A,B,... or B,A,B,A,B,..., keeping the first A or B of each subgroup.
I hope you have a quick solution.
Best regards,
Cornelis
Use PROC SORT with the NODUPKEY option:
proc sort data=have nodupkey;
by obs;
run;
Thank you for your swift reply.
Unfortunately, that is not the solution: the row numbers (obs) are all unique.
Applying NODUPKEY by obs will not give the result, since the list will remain intact.
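NODUPKEY can still do the job if the BY key identifies each run of identical PHASE values instead of each row. A minimal sketch, assuming the data set A is read with `input obs phase $;` (as in @Ksharp's reply in this thread) and using a hypothetical helper variable RUN_ID that is not part of the original posts:

```sas
/* Sample data, read with list input so PHASE is populated */
data a;
   input obs phase $;
   cards;
1 A
2 A
3 B
4 B
5 B
6 A
7 B
8 A
9 B
;
run;

/* Number each run of identical PHASE values.
   RUN_ID is a hypothetical helper variable. */
data runs;
   set a;
   by phase notsorted;              /* keeps the original row order */
   if first.phase then run_id + 1;  /* sum statement: RUN_ID is retained */
run;

/* NODUPKEY keeps the first row of each RUN_ID, i.e. of each run */
proc sort data=runs out=want nodupkey;
   by run_id;
run;
```

RUN_ID stays in WANT and can be dropped in a later step if it is not needed. The LAG and BY ... NOTSORTED approaches in this thread avoid the extra sort altogether.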
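Oops! Right. Sorry, I answered too fast.
On your sorted data set (HAVE stands for your data set, A in your example):
data want;
   set have;
   lp = lag(phase);   /* PHASE from the previous row (blank on the first row) */
   if lp ne phase;    /* keep the row only when PHASE changes */
   drop lp;
run;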
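The same LAG logic can also be written as a single subsetting IF. A minimal sketch, assuming the input data set is A read with `input obs phase $;` (HAVE in the reply above is a placeholder name). One caveat worth knowing: LAG only updates its queue when it actually executes, so it should not be placed inside a conditional branch such as `if ... then lp = lag(phase);`.

```sas
/* Sample data, read with list input so PHASE is populated */
data a;
   input obs phase $;
   cards;
1 A
2 A
3 B
4 B
5 B
6 A
7 B
8 A
9 B
;
run;

data want;
   set a;
   /* The subsetting IF expression is evaluated on every row, so
      LAG(PHASE) is called on every row. On the first row it returns
      a blank, which means the first observation is always kept. */
   if phase ne lag(phase);
run;
```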
Great, thank you for your support!
Indeed, the LAG function is better.
data a;
input obs phase $;
cards;
1 A
2 A
3 B
4 B
5 B
6 A
7 B
8 A
9 B
;
run;
data want;
set a;
by phase notsorted;
if first.phase;
run;
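To see why `if first.phase;` works, the temporary FIRST. and LAST. variables created by BY ... NOTSORTED can be copied into ordinary variables and printed. A minimal sketch; FIRST_FLAG and LAST_FLAG are illustrative names, and A is re-created here with the same list input as above:

```sas
data a;
   input obs phase $;
   cards;
1 A
2 A
3 B
4 B
5 B
6 A
7 B
8 A
9 B
;
run;

data flags;
   set a;
   by phase notsorted;   /* a new BY group starts whenever PHASE changes */
   /* FIRST.PHASE and LAST.PHASE are not written to the output data set,
      so copy them into regular variables for inspection */
   first_flag = first.phase;
   last_flag  = last.phase;
run;

proc print data=flags noobs;
run;
```

FIRST_FLAG is 1 on rows 1, 3, 6, 7, 8 and 9, which are exactly the rows kept by `if first.phase;`. Without NOTSORTED, the BY statement would expect A to be sorted by PHASE and the step would stop with a "not properly sorted" error at row 6.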
Thank you, this is a good alternative.
It demonstrates that there is more than one solution. A very good learning point!
@Cornelis wrote:
Thank you, this is a good alternative.
It demonstrates that there is more than one solution. A very good learning point!
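AND that you should double-check your DATA step code before posting. The example you provided produces missing values for PHASE on every row because you left out the period that marks $4. as an informat; without it, $4 is treated as column input for column 4, which is blank in every data line.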
If you had supplied the better informat:
input obs @1 phase $4.;
the values of PHASE would be
1 A
2 A
3 B
and so on, because @1 moves the pointer back to column 1, so PHASE is read starting at column 1 and the $4. informat forces reading up to 4 characters, including the OBS value.
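For reference, here are two other INPUT styles that read the sample lines correctly, besides the list input `input obs phase $;` used in @Ksharp's reply. A minimal sketch, assuming the data lines keep the layout of the question (OBS in column 1, a blank, PHASE in column 3):

```sas
/* Column input: OBS from column 1, PHASE from column 3 */
data a_column;
   input obs 1 phase $ 3;
   cards;
1 A
2 A
3 B
4 B
5 B
6 A
7 B
8 A
9 B
;
run;

/* Formatted input: pointer controls plus informats WITH the period */
data a_formatted;
   input @1 obs 1. @3 phase $1.;
   cards;
1 A
2 A
3 B
4 B
5 B
6 A
7 B
8 A
9 B
;
run;
```

Both styles depend on fixed column positions, so they would need adjusting once OBS reaches two digits (row 10 and above); list input does not have that limitation.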
Note that @Ksharp's solution does not have the @1 in the INPUT statement.