Hi,
How to separate data that have duplicate records when by id those are more than 2 records.
example:
data test;
date id value
22Mar 1 x
22MAr 2 x
22 Mar 3 x
24 Mar 1 x
24 Mar 2 x
28 Mar 1 x
30 Sep 1 x
15 Jan 1 x
15 Jan 2 Y
15 Jan 3 x
want two data set
Duplicate by date and value ,in duplicate include single records also
in red color in one dataset those are duplicate by value and green in another data set which are not duplicate by value
Note: if only one record for the date the records also consider as duplicate and separate
Thank you,
Rajasekhar.
@raja777pharma wrote:
Hi,
How to separate data that have duplicate records when by id those are more than 2 records.
example:
data test;
date id value
22Mar 1 x
22MAr 2 x
22 Mar 3 x
24 Mar 1 x
24 Mar 2 x
28 Mar 1 x
30 Sep 1 x
15 Jan 1 x
15 Jan 2 Y
15 Jan 3 x
want two data set
Duplicate by date and value ,in duplicate include single records also
in red color in one dataset those are duplicate by value and green in another data set which are not duplicate by value
Note: if only one record for the date the records also consider as duplicate and separate
The requirements are unclear,
Why are the two 15JAN X observations not duplicates?
Hi ,
Below values are not have same value by ID
15 Jan 1 x
15 Jan 2 Y
15 Jan 3 x
This marks the records for separation as you posted:
proc sql;
select *, count(distinct VALUE) 'Count'
from TEST
group by DATE;
date | id | value | OBS | Count |
---|---|---|---|---|
15JAN2024 | 2 | Y | 9 | 2 |
15JAN2024 | 3 | x | 10 | 2 |
15JAN2024 | 1 | x | 8 | 2 |
22MAR2024 | 1 | x | 1 | 1 |
22MAR2024 | 3 | x | 3 | 1 |
22MAR2024 | 2 | x | 2 | 1 |
24MAR2024 | 2 | x | 5 | 1 |
24MAR2024 | 1 | x | 4 | 1 |
28MAR2024 | 1 | x | 6 | 1 |
30SEP2024 | 1 | x | 7 | 1 |
data duplicates non_duplicates;
set test;
by date value;
/* If the current observation is NOT both the first and last occurrence, it is a duplicate */
if not (first.value and last.value) then output duplicates;
else output non_duplicates;
run;
To achieve this task, you can create two datasets in SAS: one that contains duplicates based on date
and value
(including single records for a given date) and another dataset that contains non-duplicate records by date
and value
. Here's how you can do this using PROC SQL
and DATA STEP
approaches.
Step 1: Prepare the Sample Data
data test;
input date :$10. id value $;
datalines;
22Mar 1 x
22Mar 2 x
22Mar 3 x
24Mar 1 x
24Mar 2 x
28Mar 1 x
30Sep 1 x
15Jan 1 x
15Jan 2 y
15Jan 3 x
;
run;
proc sql;
create table duplicates as
select *
from test
group by date, value
having count(*) >= 1; /* this includes both duplicates and single records */
quit;
data duplicates non_duplicates;
set test;
by date value;
/* If the current observation is NOT both the first and last occurrence, it is a duplicate */
if not (first.value and last.value) then output duplicates;
else output non_duplicates;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.