I'm trying to flag duplicate scan dates per location per sequence_no. If a scan date and location are entered in a later visit date, that row needs to be flagged "duplicate" see rows 5 and 6. The original data is row 4, the first visit date. I'm really having trouble with this because of all the variables, it just seems I can't get the Proc sort exactly right, that's why I'd like a flag. The flag column is what I want to add.
Any advice?
Data I want (Column flag is to be added):
row | SEQUENCE_NO_ | VISIT_DATE | scan_date | location | flag |
1 | 2 | 1/16/2018 | 1/16/2018 | a | |
2 | 2 | 4/12/2018 | 4/12/2018 | a | |
3 | 2 | 8/23/2018 | 8/23/2018 | a | |
4 | 2 | 2/26/2019 | 2/26/2019 | a | |
5 | 2 | 9/5/2019 | 2/26/2019 | a | duplicate |
6 | 2 | 3/2/2020 | 2/26/2019 | a | duplicate |
7 | 2 | 9/5/2019 | 9/3/2019 | a | |
8 | 2 | 3/2/2020 | 9/3/2019 | a | duplicate |
9 | 2 | 3/2/2020 | 2/27/2020 | a | |
10 | 3 | 7/2/2020 | 6/16/2020 | c | |
11 | 3 | 10/1/2020 | 6/16/2020 | c | duplicate |
12 | 3 | 4/1/2021 | 6/16/2020 | c | duplicate |
13 | 3 | 9/27/2021 | 6/16/2020 | c | duplicate |
14 | 3 | 7/2/2020 | 6/16/2020 | e | |
15 | 3 | 10/1/2020 | 6/16/2020 | e | duplicate |
16 | 3 | 4/1/2021 | 6/16/2020 | e | duplicate |
17 | 3 | 9/27/2021 | 6/16/2020 | e | duplicate |
18 | 3 | 7/2/2020 | 6/16/2020 | d | |
19 | 3 | 10/1/2020 | 6/16/2020 | d | duplicate |
20 | 3 | 4/1/2021 | 6/16/2020 | d | duplicate |
21 | 3 | 9/27/2021 | 6/16/2020 | d | duplicate |
22 | 3 | 3/28/2022 | 3/24/2022 | c | |
23 | 3 | 3/28/2022 | 3/24/2022 | e | |
24 | 3 | 3/28/2022 | 3/24/2022 | d | |
25 | 3 | 9/29/2022 | 9/27/2022 | c | |
26 | 3 | 9/29/2022 | 9/27/2022 | e |
How about
data have;
input SEQUENCE_NO_ (VISIT_DATE scan_date)(:mmddyy10.) location :$1.;
format VISIT_DATE scan_date mmddyy10.;
datalines;
2 1/16/2018 1/16/2018 a
2 4/12/2018 4/12/2018 a
2 8/23/2018 8/23/2018 a
2 2/26/2019 2/26/2019 a
2 9/5/2019 2/26/2019 a
2 3/2/2020 2/26/2019 a
2 9/5/2019 9/3/2019 a
2 3/2/2020 9/3/2019 a
2 3/2/2020 2/27/2020 a
3 7/2/2020 6/16/2020 c
3 10/1/2020 6/16/2020 c
3 4/1/2021 6/16/2020 c
3 9/27/2021 6/16/2020 c
3 7/2/2020 6/16/2020 e
3 10/1/2020 6/16/2020 e
3 4/1/2021 6/16/2020 e
3 9/27/2021 6/16/2020 e
3 7/2/2020 6/16/2020 d
3 10/1/2020 6/16/2020 d
3 4/1/2021 6/16/2020 d
3 9/27/2021 6/16/2020 d
3 3/28/2022 3/24/2022 c
3 3/28/2022 3/24/2022 e
3 3/28/2022 3/24/2022 d
3 9/29/2022 9/27/2022 c
3 9/29/2022 9/27/2022 e
;
data want;
set have;
by SEQUENCE_NO_ location notsorted scan_date notsorted;
if not first.scan_date then flag = 'duplicate';
run;
How about
data have;
input SEQUENCE_NO_ (VISIT_DATE scan_date)(:mmddyy10.) location :$1.;
format VISIT_DATE scan_date mmddyy10.;
datalines;
2 1/16/2018 1/16/2018 a
2 4/12/2018 4/12/2018 a
2 8/23/2018 8/23/2018 a
2 2/26/2019 2/26/2019 a
2 9/5/2019 2/26/2019 a
2 3/2/2020 2/26/2019 a
2 9/5/2019 9/3/2019 a
2 3/2/2020 9/3/2019 a
2 3/2/2020 2/27/2020 a
3 7/2/2020 6/16/2020 c
3 10/1/2020 6/16/2020 c
3 4/1/2021 6/16/2020 c
3 9/27/2021 6/16/2020 c
3 7/2/2020 6/16/2020 e
3 10/1/2020 6/16/2020 e
3 4/1/2021 6/16/2020 e
3 9/27/2021 6/16/2020 e
3 7/2/2020 6/16/2020 d
3 10/1/2020 6/16/2020 d
3 4/1/2021 6/16/2020 d
3 9/27/2021 6/16/2020 d
3 3/28/2022 3/24/2022 c
3 3/28/2022 3/24/2022 e
3 3/28/2022 3/24/2022 d
3 9/29/2022 9/27/2022 c
3 9/29/2022 9/27/2022 e
;
data want;
set have;
by SEQUENCE_NO_ location notsorted scan_date notsorted;
if not first.scan_date then flag = 'duplicate';
run;
Thanks! I think this is exactly what I wanted, never thought to use notsorted
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.