How to separate the non unique values

raja777pharma · Posted 09-03-2024 01:36 AM

Hi,

How to separate data that have duplicate records when by id those are more than 2 records.

example:

data test;

date id value

22Mar 1 x

22MAr 2 x

22 Mar 3 x

24 Mar 1 x

24 Mar 2 x

28 Mar 1 x

30 Sep 1 x

15 Jan 1 x

15 Jan 2 Y

15 Jan 3 x

want two data set

Duplicate by date and value ,in duplicate include single records also

in red color in one dataset those are duplicate by value and green in another data set which are not duplicate by value

Note: if only one record for the date the records also consider as duplicate and separate

Thank you,

Rajasekhar.

ChrisNZ · Posted 09-03-2024 02:14 AM

@raja777pharma wrote:

Hi,

How to separate data that have duplicate records when by id those are more than 2 records.

example:

data test;

date id value

22Mar 1 x

22MAr 2 x

22 Mar 3 x

24 Mar 1 x

24 Mar 2 x

28 Mar 1 x

30 Sep 1 x

15 Jan 1 x

15 Jan 2 Y

15 Jan 3 x

want two data set

Duplicate by date and value ,in duplicate include single records also

in red color in one dataset those are duplicate by value and green in another data set which are not duplicate by value

Note: if only one record for the date the records also consider as duplicate and separate

The requirements are unclear,

Why are the two 15JAN X observations not duplicates?

High-Performance SAS Coding - Third Edition

raja777pharma · Posted 09-04-2024 01:57 AM

Hi ,

Below values are not have same value by ID

15 Jan 1 x

15 Jan 2 Y

15 Jan 3 x

ChrisNZ · Posted 09-03-2024 02:27 AM

This marks the records for separation as you posted:

proc sql;
 select *, count(distinct VALUE) 'Count'
 from TEST 
 group by DATE;

date	id	value	OBS	Count
15JAN2024	2	Y	9	2
15JAN2024	3	x	10	2
15JAN2024	1	x	8	2
22MAR2024	1	x	1	1
22MAR2024	3	x	3	1
22MAR2024	2	x	2	1
24MAR2024	2	x	5	1
24MAR2024	1	x	4	1
28MAR2024	1	x	6	1
30SEP2024	1	x	7	1

High-Performance SAS Coding - Third Edition

Kurt_Bremser · Posted 09-03-2024 03:16 AM

It seems that one dataset should contain only the first (by date) observation for each ID, and the other dataset all other observations. Is that true?

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Sarath_A_SAS · Posted 09-04-2024 10:57 PM

data duplicates non_duplicates;
    set test;
    by date value;

    /* If the current observation is NOT both the first and last occurrence, it is a duplicate */
    if not (first.value and last.value) then output duplicates;
    else output non_duplicates;
run;

To achieve this task, you can create two datasets in SAS: one that contains duplicates based on date and value (including single records for a given date) and another dataset that contains non-duplicate records by date and value. Here's how you can do this using PROC SQL and DATA STEP approaches.

Step 1: Prepare the Sample Data

data test;
    input date :$10. id value $;
    datalines;
22Mar 1 x
22Mar 2 x
22Mar 3 x
24Mar 1 x
24Mar 2 x
28Mar 1 x
30Sep 1 x
15Jan 1 x
15Jan 2 y
15Jan 3 x
;
run;

proc sql;
create table duplicates as
select *
from test
group by date, value
having count(*) >= 1; /* this includes both duplicates and single records */
quit;

data duplicates non_duplicates;
set test;
by date value;

/* If the current observation is NOT both the first and last occurrence, it is a duplicate */
if not (first.value and last.value) then output duplicates;
else output non_duplicates;
run;

How to separate the non unique values

Re: How to separate the non unique values

Re: How to separate the non unique values

Re: How to separate the non unique values

Re: How to separate the non unique values

Re: How to separate the non unique values

How to separate the non unique values

Re: How to separate the non unique values

Re: How to separate the non unique values

Re: How to separate the non unique values

Re: How to separate the non unique values

Re: How to separate the non unique values

SAS Innovate 2025: Call for Content

Click image to register for webinar

Classroom Training Available!