Solved: exclude records

cathy_sas · Posted 05-23-2016 11:11 PM

Dear All,

I am looking for small help .I have a dataset, where same subject fall under two groups, what i am looking is need to exclude the records where same date and same value. and fl eq 'Y' only from first group

In the below example, subject 100 falls under AAA, BBB group. for this subject , under group AAA @ W19 and W20 have same date and same value under group BBB at @ B01. so i need to exclude those two records from group BBB and keep values from AAA.

and AAA group has first record Fl eq 'Y' , and for BBB group also has first record FL eq 'Y' . Keep only the record under AAA group, exclude record with FL eq 'Y' from group BBB.

Ex:

data have;

infile datalines expandtabs truncover dlm=' ';

input pat ID :$ vis :$ VAL DT :date9. DaY FL $;

format DT :date9.;

datalines;

100 AAA B01 2205.3 . . Y

100 AAA B01 1565.1 28-Jan-15 -12

100 AAA B01 2550.9 2-Feb-15 -7

100 AAA B01 2507 9-Feb-15 1

100 AAA W03 2401.9 2-Mar-15 22

100 AAA W07 2491.6 30-Mar-15 50

100 AAA W11 1889.2 27-Apr-15 78

100 AAA W15 1349.3 27-May-15 108

100 AAA W17 598.7 8-Jun-15 120

100 AAA W18 1519 15-Jun-15 127

100 AAA W19 1816.8 22-Jun-15 134

100 AAA W20 1432.7 29-Jun-15 141

100 BBB B01 1623.0 . . Y

100 BBB B01 1816.8 22-Jun-15 -7

100 BBB B01 1432.7 29-Jun-15 1

100 BBB W03 1456.5 20-Jul-15 22

100 BBB W07 1882.4 17-Aug-15 50

100 BBB W11 900.9 14-Sep-15 78

100 BBB W15 1499.6 5-Oct-15 99

100 BBB W15 133.9 12-Oct-15 106

100 BBB W15 340.4 16-Oct-15 110

100 BBB W19 916.1 9-Nov-15 134

100 BBB W23 369 30-Nov-15 155

100 BBB W23 126.3 7-Dec-15 162

100 BBB W28 352.2 11-Jan-16 197

100 BBB W32 748.9 8-Feb-16 225

;

run;

want:

100 AAA B01 2205.3 . . Y

100 AAA B01 1565.1 28-Jan-15 -12

100 AAA B01 2550.9 2-Feb-15 -7

100 AAA B01 2507 9-Feb-15 1

100 AAA W03 2401.9 2-Mar-15 22

100 AAA W07 2491.6 30-Mar-15 50

100 AAA W11 1889.2 27-Apr-15 78

100 AAA W15 1349.3 27-May-15 108

100 AAA W17 598.7 8-Jun-15 120

100 AAA W18 1519 15-Jun-15 127

100 AAA W19 1816.8 22-Jun-15 134

100 AAA W20 1432.7 29-Jun-15 141

~~100 BBB B01 1623.0 . . Y~~

~~100 BBB B01 1816.8 22-Jun-15 -7~~

~~100 BBB B01 1432.7 29-Jun-15 1~~

100 BBB W03 1456.5 20-Jul-15 22

100 BBB W07 1882.4 17-Aug-15 50

100 BBB W11 900.9 14-Sep-15 78

100 BBB W15 1499.6 5-Oct-15 99

100 BBB W15 133.9 12-Oct-15 106

100 BBB W15 340.4 16-Oct-15 110

100 BBB W19 916.1 9-Nov-15 134

100 BBB W23 369 30-Nov-15 155

100 BBB W23 126.3 7-Dec-15 162

100 BBB W28 352.2 11-Jan-16 197

100 BBB W32 748.9 8-Feb-16 225

ChrisNZ · Posted 05-24-2016 08:35 PM

Mmm. You are right. This should work better.


data HAVE(index=(A=(PAT FL VAL DT ID)));
infile datalines expandtabs truncover dlm=' ';
input PAT ID :$   VIS :$  VAL  DT :date9.  DAY   FL $;
format DT :date9.;
datalines;
100   AAA   B01   2205.3   .            . Y
100   AAA   B01   1565.1      28-Jan-15   -12  
100   AAA   B01   2550.9      2-Feb-15    -7   
100   AAA   B01   2507  9-Feb-15    1      
100   AAA   W03   2401.9      2-Mar-15    22  
100   AAA   W07   2491.6      30-Mar-15   50   
100   AAA   W11   1889.2      27-Apr-15   78   
100   AAA   W15   1349.3      27-May-15   108     
100   AAA   W17   598.7 8-Jun-15    120 
100   AAA   W18   1519  15-Jun-15   127 
100   AAA   W19   1816.8      22-Jun-15   134 
100   AAA   W20   1432.7      29-Jun-15   141        
100   BBB   B01   1623.0   .           .  Y
100   BBB   B01   1816.8      22-Jun-15   -7     
100   BBB   B01   1432.7      29-Jun-15   1       
100   BBB   W03   1456.5      20-Jul-15   22  
100   BBB   W07   1882.4      17-Aug-15   50      
100   BBB   W11   900.9 14-Sep-15   78      
100   BBB   W15   1499.6      5-Oct-15    99      
100   BBB   W15   133.9 12-Oct-15   106     
100   BBB   W15   340.4 16-Oct-15   110     
100   BBB   W19   916.1 9-Nov-15    134        
100   BBB   W23   369       30-Nov-15     155  
100   BBB   W23   126.3 7-Dec-15    162  
100   BBB   W28   352.2 11-Jan-16   197  
100   BBB   W32   748.9 8-Feb-16    225  
run;
data WANT;
  set HAVE; 
  by PAT FL VAL DT ID;
  if ^(first.FL & last.FL) and FL='Y' and ID ne 'AAA' 
                          then delete=1; %* Delete duplicate Y;
  if ^first.DT            then delete=1; %* Delete duplicate VAL and DT;
run;

High-Performance SAS Coding - Third Edition

View solution in original post

JerryLeBreton · Posted 05-24-2016 12:04 AM

Just a PROC SORT nodupkey should do.

Ensure the data is sorted correctly...

proc sort data=have out=have2;

by id pat descending fl dt;

run;

Then sort again with NODUPKEY, and without the grouping variable, ID:

proc sort data=have2 out=want nodupkey;

by pat descending fl dt ;

run;

PROC SORT will keep the the first record of any duplicates, group AAA, and discard any duplicates i.e. BBB. I have also assumed the FL variable will be Y or N or blank, and that the DT variable will be missing when FL is Y or N.

Finally, to see it make in the original order...

proc sort ;

by pat id vis dt;

run;

If this is a big table though, and performance becomes an issue, this may not be the best approach.

cathy_sas · Posted 05-24-2016 05:43 PM

Thank you Jerry!!!

your solution work for example, but my data is big and have other lot more variables .

any looping or SQL apporoach to flag the duplicates

Thanks

ChrisNZ · Posted 05-24-2016 07:45 PM

Like this?


data HAVE(index=(A=(PAT FL VAL DT)));
infile datalines expandtabs truncover dlm=' ';
input PAT ID :$   VIS :$  VAL  DT :date9.  DAY   FL $;
format DT :date9.;
datalines;
100   AAA   B01   2205.3   .            . Y
100   AAA   B01   1565.1      28-Jan-15   -12  
100   AAA   B01   2550.9      2-Feb-15    -7   
100   AAA   B01   2507  9-Feb-15    1      
100   AAA   W03   2401.9      2-Mar-15    22  
100   AAA   W07   2491.6      30-Mar-15   50   
100   AAA   W11   1889.2      27-Apr-15   78   
100   AAA   W15   1349.3      27-May-15   108     
100   AAA   W17   598.7 8-Jun-15    120 
100   AAA   W18   1519  15-Jun-15   127 
100   AAA   W19   1816.8      22-Jun-15   134 
100   AAA   W20   1432.7      29-Jun-15   141        
100   BBB   B01   1623.0   .           .  Y
100   BBB   B01   1816.8      22-Jun-15   -7     
100   BBB   B01   1432.7      29-Jun-15   1       
100   BBB   W03   1456.5      20-Jul-15   22  
100   BBB   W07   1882.4      17-Aug-15   50      
100   BBB   W11   900.9 14-Sep-15   78      
100   BBB   W15   1499.6      5-Oct-15    99      
100   BBB   W15   133.9 12-Oct-15   106     
100   BBB   W15   340.4 16-Oct-15   110     
100   BBB   W19   916.1 9-Nov-15    134        
100   BBB   W23   369       30-Nov-15     155  
100   BBB   W23   126.3 7-Dec-15    162  
100   BBB   W28   352.2 11-Jan-16   197  
100   BBB   W32   748.9 8-Feb-16    225  
run;
data WANT;
  set HAVE; 
  by PAT FL VAL DT;
  if ^first.FL and FL='Y' then delete; %* Delete duplicate Y;
  if ^first.DT            then delete; %* Delete duplicate VAL and DT;
run;

High-Performance SAS Coding - Third Edition

cathy_sas · Posted 05-24-2016 07:57 PM

Thank you Chris !!! it is pretty much close, but i need to keep the Fl eq 'Y' from AAA grp, byt this we are excluding this record

Thanks

Cathy

ChrisNZ · Posted 05-24-2016 08:35 PM

Mmm. You are right. This should work better.


data HAVE(index=(A=(PAT FL VAL DT ID)));
infile datalines expandtabs truncover dlm=' ';
input PAT ID :$   VIS :$  VAL  DT :date9.  DAY   FL $;
format DT :date9.;
datalines;
100   AAA   B01   2205.3   .            . Y
100   AAA   B01   1565.1      28-Jan-15   -12  
100   AAA   B01   2550.9      2-Feb-15    -7   
100   AAA   B01   2507  9-Feb-15    1      
100   AAA   W03   2401.9      2-Mar-15    22  
100   AAA   W07   2491.6      30-Mar-15   50   
100   AAA   W11   1889.2      27-Apr-15   78   
100   AAA   W15   1349.3      27-May-15   108     
100   AAA   W17   598.7 8-Jun-15    120 
100   AAA   W18   1519  15-Jun-15   127 
100   AAA   W19   1816.8      22-Jun-15   134 
100   AAA   W20   1432.7      29-Jun-15   141        
100   BBB   B01   1623.0   .           .  Y
100   BBB   B01   1816.8      22-Jun-15   -7     
100   BBB   B01   1432.7      29-Jun-15   1       
100   BBB   W03   1456.5      20-Jul-15   22  
100   BBB   W07   1882.4      17-Aug-15   50      
100   BBB   W11   900.9 14-Sep-15   78      
100   BBB   W15   1499.6      5-Oct-15    99      
100   BBB   W15   133.9 12-Oct-15   106     
100   BBB   W15   340.4 16-Oct-15   110     
100   BBB   W19   916.1 9-Nov-15    134        
100   BBB   W23   369       30-Nov-15     155  
100   BBB   W23   126.3 7-Dec-15    162  
100   BBB   W28   352.2 11-Jan-16   197  
100   BBB   W32   748.9 8-Feb-16    225  
run;
data WANT;
  set HAVE; 
  by PAT FL VAL DT ID;
  if ^(first.FL & last.FL) and FL='Y' and ID ne 'AAA' 
                          then delete=1; %* Delete duplicate Y;
  if ^first.DT            then delete=1; %* Delete duplicate VAL and DT;
run;

High-Performance SAS Coding - Third Edition

cathy_sas · Posted 05-24-2016 11:34 PM

Thank you ChrisNZ!!! Works perfect!!!

Thanks

Cathy

exclude records

Re: exclude records

Re: exclude records

Re: exclude records

Re: exclude records

Re: exclude records

Re: exclude records

Re: exclude records

Classroom Training Available!