Dear All,
I am looking for small help .I have a dataset, where same subject fall under two groups, what i am looking is need to exclude the records where same date and same value. and fl eq 'Y' only from first group
In the below example, subject 100 falls under AAA, BBB group. for this subject , under group AAA @ W19 and W20 have same date and same value under group BBB at @ B01. so i need to exclude those two records from group BBB and keep values from AAA.
and AAA group has first record Fl eq 'Y' , and for BBB group also has first record FL eq 'Y' . Keep only the record under AAA group, exclude record with FL eq 'Y' from group BBB.
Ex:
data have;
infile datalines expandtabs truncover dlm=' ';
input pat ID :$ vis :$ VAL DT :date9. DaY FL $;
format DT :date9.;
datalines;
100 AAA B01 2205.3 . . Y
100 AAA B01 1565.1 28-Jan-15 -12
100 AAA B01 2550.9 2-Feb-15 -7
100 AAA B01 2507 9-Feb-15 1
100 AAA W03 2401.9 2-Mar-15 22
100 AAA W07 2491.6 30-Mar-15 50
100 AAA W11 1889.2 27-Apr-15 78
100 AAA W15 1349.3 27-May-15 108
100 AAA W17 598.7 8-Jun-15 120
100 AAA W18 1519 15-Jun-15 127
100 AAA W19 1816.8 22-Jun-15 134
100 AAA W20 1432.7 29-Jun-15 141
100 BBB B01 1623.0 . . Y
100 BBB B01 1816.8 22-Jun-15 -7
100 BBB B01 1432.7 29-Jun-15 1
100 BBB W03 1456.5 20-Jul-15 22
100 BBB W07 1882.4 17-Aug-15 50
100 BBB W11 900.9 14-Sep-15 78
100 BBB W15 1499.6 5-Oct-15 99
100 BBB W15 133.9 12-Oct-15 106
100 BBB W15 340.4 16-Oct-15 110
100 BBB W19 916.1 9-Nov-15 134
100 BBB W23 369 30-Nov-15 155
100 BBB W23 126.3 7-Dec-15 162
100 BBB W28 352.2 11-Jan-16 197
100 BBB W32 748.9 8-Feb-16 225
;
run;
want:
100 AAA B01 2205.3 . . Y
100 AAA B01 1565.1 28-Jan-15 -12
100 AAA B01 2550.9 2-Feb-15 -7
100 AAA B01 2507 9-Feb-15 1
100 AAA W03 2401.9 2-Mar-15 22
100 AAA W07 2491.6 30-Mar-15 50
100 AAA W11 1889.2 27-Apr-15 78
100 AAA W15 1349.3 27-May-15 108
100 AAA W17 598.7 8-Jun-15 120
100 AAA W18 1519 15-Jun-15 127
100 AAA W19 1816.8 22-Jun-15 134
100 AAA W20 1432.7 29-Jun-15 141
100 BBB B01 1623.0 . . Y
100 BBB B01 1816.8 22-Jun-15 -7
100 BBB B01 1432.7 29-Jun-15 1
100 BBB W03 1456.5 20-Jul-15 22
100 BBB W07 1882.4 17-Aug-15 50
100 BBB W11 900.9 14-Sep-15 78
100 BBB W15 1499.6 5-Oct-15 99
100 BBB W15 133.9 12-Oct-15 106
100 BBB W15 340.4 16-Oct-15 110
100 BBB W19 916.1 9-Nov-15 134
100 BBB W23 369 30-Nov-15 155
100 BBB W23 126.3 7-Dec-15 162
100 BBB W28 352.2 11-Jan-16 197
100 BBB W32 748.9 8-Feb-16 225
Mmm. You are right. This should work better.
data HAVE(index=(A=(PAT FL VAL DT ID)));
infile datalines expandtabs truncover dlm=' ';
input PAT ID :$ VIS :$ VAL DT :date9. DAY FL $;
format DT :date9.;
datalines;
100 AAA B01 2205.3 . . Y
100 AAA B01 1565.1 28-Jan-15 -12
100 AAA B01 2550.9 2-Feb-15 -7
100 AAA B01 2507 9-Feb-15 1
100 AAA W03 2401.9 2-Mar-15 22
100 AAA W07 2491.6 30-Mar-15 50
100 AAA W11 1889.2 27-Apr-15 78
100 AAA W15 1349.3 27-May-15 108
100 AAA W17 598.7 8-Jun-15 120
100 AAA W18 1519 15-Jun-15 127
100 AAA W19 1816.8 22-Jun-15 134
100 AAA W20 1432.7 29-Jun-15 141
100 BBB B01 1623.0 . . Y
100 BBB B01 1816.8 22-Jun-15 -7
100 BBB B01 1432.7 29-Jun-15 1
100 BBB W03 1456.5 20-Jul-15 22
100 BBB W07 1882.4 17-Aug-15 50
100 BBB W11 900.9 14-Sep-15 78
100 BBB W15 1499.6 5-Oct-15 99
100 BBB W15 133.9 12-Oct-15 106
100 BBB W15 340.4 16-Oct-15 110
100 BBB W19 916.1 9-Nov-15 134
100 BBB W23 369 30-Nov-15 155
100 BBB W23 126.3 7-Dec-15 162
100 BBB W28 352.2 11-Jan-16 197
100 BBB W32 748.9 8-Feb-16 225
run;
data WANT;
set HAVE;
by PAT FL VAL DT ID;
if ^(first.FL & last.FL) and FL='Y' and ID ne 'AAA'
then delete=1; %* Delete duplicate Y;
if ^first.DT then delete=1; %* Delete duplicate VAL and DT;
run;
Just a PROC SORT nodupkey should do.
Ensure the data is sorted correctly...
proc sort data=have out=have2;
by id pat descending fl dt;
run;
Then sort again with NODUPKEY, and without the grouping variable, ID:
proc sort data=have2 out=want nodupkey;
by pat descending fl dt ;
run;
PROC SORT will keep the the first record of any duplicates, group AAA, and discard any duplicates i.e. BBB. I have also assumed the FL variable will be Y or N or blank, and that the DT variable will be missing when FL is Y or N.
Finally, to see it make in the original order...
proc sort ;
by pat id vis dt;
run;
If this is a big table though, and performance becomes an issue, this may not be the best approach.
Thank you Jerry!!!
your solution work for example, but my data is big and have other lot more variables .
any looping or SQL apporoach to flag the duplicates
Thanks
Like this?
data HAVE(index=(A=(PAT FL VAL DT)));
infile datalines expandtabs truncover dlm=' ';
input PAT ID :$ VIS :$ VAL DT :date9. DAY FL $;
format DT :date9.;
datalines;
100 AAA B01 2205.3 . . Y
100 AAA B01 1565.1 28-Jan-15 -12
100 AAA B01 2550.9 2-Feb-15 -7
100 AAA B01 2507 9-Feb-15 1
100 AAA W03 2401.9 2-Mar-15 22
100 AAA W07 2491.6 30-Mar-15 50
100 AAA W11 1889.2 27-Apr-15 78
100 AAA W15 1349.3 27-May-15 108
100 AAA W17 598.7 8-Jun-15 120
100 AAA W18 1519 15-Jun-15 127
100 AAA W19 1816.8 22-Jun-15 134
100 AAA W20 1432.7 29-Jun-15 141
100 BBB B01 1623.0 . . Y
100 BBB B01 1816.8 22-Jun-15 -7
100 BBB B01 1432.7 29-Jun-15 1
100 BBB W03 1456.5 20-Jul-15 22
100 BBB W07 1882.4 17-Aug-15 50
100 BBB W11 900.9 14-Sep-15 78
100 BBB W15 1499.6 5-Oct-15 99
100 BBB W15 133.9 12-Oct-15 106
100 BBB W15 340.4 16-Oct-15 110
100 BBB W19 916.1 9-Nov-15 134
100 BBB W23 369 30-Nov-15 155
100 BBB W23 126.3 7-Dec-15 162
100 BBB W28 352.2 11-Jan-16 197
100 BBB W32 748.9 8-Feb-16 225
run;
data WANT;
set HAVE;
by PAT FL VAL DT;
if ^first.FL and FL='Y' then delete; %* Delete duplicate Y;
if ^first.DT then delete; %* Delete duplicate VAL and DT;
run;
Thank you Chris !!! it is pretty much close, but i need to keep the Fl eq 'Y' from AAA grp, byt this we are excluding this record
Thanks
Cathy
Mmm. You are right. This should work better.
data HAVE(index=(A=(PAT FL VAL DT ID)));
infile datalines expandtabs truncover dlm=' ';
input PAT ID :$ VIS :$ VAL DT :date9. DAY FL $;
format DT :date9.;
datalines;
100 AAA B01 2205.3 . . Y
100 AAA B01 1565.1 28-Jan-15 -12
100 AAA B01 2550.9 2-Feb-15 -7
100 AAA B01 2507 9-Feb-15 1
100 AAA W03 2401.9 2-Mar-15 22
100 AAA W07 2491.6 30-Mar-15 50
100 AAA W11 1889.2 27-Apr-15 78
100 AAA W15 1349.3 27-May-15 108
100 AAA W17 598.7 8-Jun-15 120
100 AAA W18 1519 15-Jun-15 127
100 AAA W19 1816.8 22-Jun-15 134
100 AAA W20 1432.7 29-Jun-15 141
100 BBB B01 1623.0 . . Y
100 BBB B01 1816.8 22-Jun-15 -7
100 BBB B01 1432.7 29-Jun-15 1
100 BBB W03 1456.5 20-Jul-15 22
100 BBB W07 1882.4 17-Aug-15 50
100 BBB W11 900.9 14-Sep-15 78
100 BBB W15 1499.6 5-Oct-15 99
100 BBB W15 133.9 12-Oct-15 106
100 BBB W15 340.4 16-Oct-15 110
100 BBB W19 916.1 9-Nov-15 134
100 BBB W23 369 30-Nov-15 155
100 BBB W23 126.3 7-Dec-15 162
100 BBB W28 352.2 11-Jan-16 197
100 BBB W32 748.9 8-Feb-16 225
run;
data WANT;
set HAVE;
by PAT FL VAL DT ID;
if ^(first.FL & last.FL) and FL='Y' and ID ne 'AAA'
then delete=1; %* Delete duplicate Y;
if ^first.DT then delete=1; %* Delete duplicate VAL and DT;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.