I'm trying to flag duplicate scan dates per location per sequence_no. If a scan date and location are entered in a later visit date, that row needs to be flagged "duplicate" see rows 5 and 6. The original data is row 4, the first visit date. I'm really having trouble with this because of all the variables, it just seems I can't get the Proc sort exactly right, that's why I'd like a flag. The flag column is what I want to add.
Any advice?
Data I want (Column flag is to be added):
row | SEQUENCE_NO_ | VISIT_DATE | scan_date | location | flag |
1 | 2 | 1/16/2018 | 1/16/2018 | a | |
2 | 2 | 4/12/2018 | 4/12/2018 | a | |
3 | 2 | 8/23/2018 | 8/23/2018 | a | |
4 | 2 | 2/26/2019 | 2/26/2019 | a | |
5 | 2 | 9/5/2019 | 2/26/2019 | a | duplicate |
6 | 2 | 3/2/2020 | 2/26/2019 | a | duplicate |
7 | 2 | 9/5/2019 | 9/3/2019 | a | |
8 | 2 | 3/2/2020 | 9/3/2019 | a | duplicate |
9 | 2 | 3/2/2020 | 2/27/2020 | a | |
10 | 3 | 7/2/2020 | 6/16/2020 | c | |
11 | 3 | 10/1/2020 | 6/16/2020 | c | duplicate |
12 | 3 | 4/1/2021 | 6/16/2020 | c | duplicate |
13 | 3 | 9/27/2021 | 6/16/2020 | c | duplicate |
14 | 3 | 7/2/2020 | 6/16/2020 | e | |
15 | 3 | 10/1/2020 | 6/16/2020 | e | duplicate |
16 | 3 | 4/1/2021 | 6/16/2020 | e | duplicate |
17 | 3 | 9/27/2021 | 6/16/2020 | e | duplicate |
18 | 3 | 7/2/2020 | 6/16/2020 | d | |
19 | 3 | 10/1/2020 | 6/16/2020 | d | duplicate |
20 | 3 | 4/1/2021 | 6/16/2020 | d | duplicate |
21 | 3 | 9/27/2021 | 6/16/2020 | d | duplicate |
22 | 3 | 3/28/2022 | 3/24/2022 | c | |
23 | 3 | 3/28/2022 | 3/24/2022 | e | |
24 | 3 | 3/28/2022 | 3/24/2022 | d | |
25 | 3 | 9/29/2022 | 9/27/2022 | c | |
26 | 3 | 9/29/2022 | 9/27/2022 | e |
How about
data have;
input SEQUENCE_NO_ (VISIT_DATE scan_date)(:mmddyy10.) location :$1.;
format VISIT_DATE scan_date mmddyy10.;
datalines;
2 1/16/2018 1/16/2018 a
2 4/12/2018 4/12/2018 a
2 8/23/2018 8/23/2018 a
2 2/26/2019 2/26/2019 a
2 9/5/2019 2/26/2019 a
2 3/2/2020 2/26/2019 a
2 9/5/2019 9/3/2019 a
2 3/2/2020 9/3/2019 a
2 3/2/2020 2/27/2020 a
3 7/2/2020 6/16/2020 c
3 10/1/2020 6/16/2020 c
3 4/1/2021 6/16/2020 c
3 9/27/2021 6/16/2020 c
3 7/2/2020 6/16/2020 e
3 10/1/2020 6/16/2020 e
3 4/1/2021 6/16/2020 e
3 9/27/2021 6/16/2020 e
3 7/2/2020 6/16/2020 d
3 10/1/2020 6/16/2020 d
3 4/1/2021 6/16/2020 d
3 9/27/2021 6/16/2020 d
3 3/28/2022 3/24/2022 c
3 3/28/2022 3/24/2022 e
3 3/28/2022 3/24/2022 d
3 9/29/2022 9/27/2022 c
3 9/29/2022 9/27/2022 e
;
data want;
set have;
by SEQUENCE_NO_ location notsorted scan_date notsorted;
if not first.scan_date then flag = 'duplicate';
run;
How about
data have;
input SEQUENCE_NO_ (VISIT_DATE scan_date)(:mmddyy10.) location :$1.;
format VISIT_DATE scan_date mmddyy10.;
datalines;
2 1/16/2018 1/16/2018 a
2 4/12/2018 4/12/2018 a
2 8/23/2018 8/23/2018 a
2 2/26/2019 2/26/2019 a
2 9/5/2019 2/26/2019 a
2 3/2/2020 2/26/2019 a
2 9/5/2019 9/3/2019 a
2 3/2/2020 9/3/2019 a
2 3/2/2020 2/27/2020 a
3 7/2/2020 6/16/2020 c
3 10/1/2020 6/16/2020 c
3 4/1/2021 6/16/2020 c
3 9/27/2021 6/16/2020 c
3 7/2/2020 6/16/2020 e
3 10/1/2020 6/16/2020 e
3 4/1/2021 6/16/2020 e
3 9/27/2021 6/16/2020 e
3 7/2/2020 6/16/2020 d
3 10/1/2020 6/16/2020 d
3 4/1/2021 6/16/2020 d
3 9/27/2021 6/16/2020 d
3 3/28/2022 3/24/2022 c
3 3/28/2022 3/24/2022 e
3 3/28/2022 3/24/2022 d
3 9/29/2022 9/27/2022 c
3 9/29/2022 9/27/2022 e
;
data want;
set have;
by SEQUENCE_NO_ location notsorted scan_date notsorted;
if not first.scan_date then flag = 'duplicate';
run;
Thanks! I think this is exactly what I wanted, never thought to use notsorted
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.