Solved: flagging duplicate rows with complex request

eawh100 · Posted 03-12-2025 09:51 AM

I'm trying to flag duplicate scan dates per location per sequence_no. If a scan date and location are entered in a later visit date, that row needs to be flagged "duplicate" see rows 5 and 6. The original data is row 4, the first visit date. I'm really having trouble with this because of all the variables, it just seems I can't get the Proc sort exactly right, that's why I'd like a flag. The flag column is what I want to add.

Any advice?

Data I want (Column flag is to be added):

row	SEQUENCE_NO_	VISIT_DATE	scan_date	location	flag
1	2	1/16/2018	1/16/2018	a
2	2	4/12/2018	4/12/2018	a
3	2	8/23/2018	8/23/2018	a
4	2	2/26/2019	2/26/2019	a
5	2	9/5/2019	2/26/2019	a	duplicate
6	2	3/2/2020	2/26/2019	a	duplicate
7	2	9/5/2019	9/3/2019	a
8	2	3/2/2020	9/3/2019	a	duplicate
9	2	3/2/2020	2/27/2020	a
10	3	7/2/2020	6/16/2020	c
11	3	10/1/2020	6/16/2020	c	duplicate
12	3	4/1/2021	6/16/2020	c	duplicate
13	3	9/27/2021	6/16/2020	c	duplicate
14	3	7/2/2020	6/16/2020	e
15	3	10/1/2020	6/16/2020	e	duplicate
16	3	4/1/2021	6/16/2020	e	duplicate
17	3	9/27/2021	6/16/2020	e	duplicate
18	3	7/2/2020	6/16/2020	d
19	3	10/1/2020	6/16/2020	d	duplicate
20	3	4/1/2021	6/16/2020	d	duplicate
21	3	9/27/2021	6/16/2020	d	duplicate
22	3	3/28/2022	3/24/2022	c
23	3	3/28/2022	3/24/2022	e
24	3	3/28/2022	3/24/2022	d
25	3	9/29/2022	9/27/2022	c
26	3	9/29/2022	9/27/2022	e

PeterClemmensen · Posted 03-12-2025 10:10 AM

How about

data have;
input SEQUENCE_NO_ (VISIT_DATE scan_date)(:mmddyy10.) location :$1.;
format VISIT_DATE scan_date mmddyy10.;
datalines;
2 1/16/2018 1/16/2018 a 
2 4/12/2018 4/12/2018 a 
2 8/23/2018 8/23/2018 a 
2 2/26/2019 2/26/2019 a 
2 9/5/2019  2/26/2019 a 
2 3/2/2020  2/26/2019 a 
2 9/5/2019  9/3/2019  a 
2 3/2/2020  9/3/2019  a 
2 3/2/2020  2/27/2020 a 
3 7/2/2020  6/16/2020 c 
3 10/1/2020 6/16/2020 c 
3 4/1/2021  6/16/2020 c 
3 9/27/2021 6/16/2020 c 
3 7/2/2020  6/16/2020 e 
3 10/1/2020 6/16/2020 e 
3 4/1/2021  6/16/2020 e 
3 9/27/2021 6/16/2020 e 
3 7/2/2020  6/16/2020 d 
3 10/1/2020 6/16/2020 d 
3 4/1/2021  6/16/2020 d 
3 9/27/2021 6/16/2020 d 
3 3/28/2022 3/24/2022 c 
3 3/28/2022 3/24/2022 e 
3 3/28/2022 3/24/2022 d 
3 9/29/2022 9/27/2022 c 
3 9/29/2022 9/27/2022 e 
;

data want;
   set have;
   by SEQUENCE_NO_ location notsorted scan_date notsorted;
   if not first.scan_date then flag = 'duplicate';
run;

The DATA to DATA Step Macro
Blog: SASnrd

View solution in original post

PeterClemmensen · Posted 03-12-2025 10:10 AM

How about

data have;
input SEQUENCE_NO_ (VISIT_DATE scan_date)(:mmddyy10.) location :$1.;
format VISIT_DATE scan_date mmddyy10.;
datalines;
2 1/16/2018 1/16/2018 a 
2 4/12/2018 4/12/2018 a 
2 8/23/2018 8/23/2018 a 
2 2/26/2019 2/26/2019 a 
2 9/5/2019  2/26/2019 a 
2 3/2/2020  2/26/2019 a 
2 9/5/2019  9/3/2019  a 
2 3/2/2020  9/3/2019  a 
2 3/2/2020  2/27/2020 a 
3 7/2/2020  6/16/2020 c 
3 10/1/2020 6/16/2020 c 
3 4/1/2021  6/16/2020 c 
3 9/27/2021 6/16/2020 c 
3 7/2/2020  6/16/2020 e 
3 10/1/2020 6/16/2020 e 
3 4/1/2021  6/16/2020 e 
3 9/27/2021 6/16/2020 e 
3 7/2/2020  6/16/2020 d 
3 10/1/2020 6/16/2020 d 
3 4/1/2021  6/16/2020 d 
3 9/27/2021 6/16/2020 d 
3 3/28/2022 3/24/2022 c 
3 3/28/2022 3/24/2022 e 
3 3/28/2022 3/24/2022 d 
3 9/29/2022 9/27/2022 c 
3 9/29/2022 9/27/2022 e 
;

data want;
   set have;
   by SEQUENCE_NO_ location notsorted scan_date notsorted;
   if not first.scan_date then flag = 'duplicate';
run;

The DATA to DATA Step Macro
Blog: SASnrd

eawh100 · Posted 03-12-2025 12:37 PM

Thanks! I think this is exactly what I wanted, never thought to use notsorted

flagging duplicate rows with complex request

Re: flagging duplicate rows with complex request

Re: flagging duplicate rows with complex request

Re: flagging duplicate rows with complex request

flagging duplicate rows with complex request

Re: flagging duplicate rows with complex request

Re: flagging duplicate rows with complex request

Re: flagging duplicate rows with complex request

Registration is open

SAS Training: Just a Click Away