DATA Step, Macro, Functions and more

exclude records

Accepted Solution Solved
Reply
Contributor
Posts: 21
Accepted Solution

exclude records

Dear All,

 

I am looking for small help .I have a dataset, where same subject fall under two groups, what i am looking is need to exclude the records where same date and same value. and fl eq 'Y' only from first group

 

In the below example, subject 100 falls under AAA, BBB group. for this subject , under group AAA @ W19 and W20 have same date and same value under group BBB at @ B01. so i need to exclude those two records from group BBB and keep values from AAA.

 

and AAA group has first record Fl eq 'Y' , and for BBB group also has first record FL eq 'Y' . Keep only the record under AAA group, exclude record with FL eq 'Y' from group BBB.

 

 

Ex:

data have;

infile datalines expandtabs truncover dlm=' ';

input pat ID :$   vis :$      VAL   DT :date9.  DaY   FL $;

format DT :date9.;

datalines;

100   AAA   B01   2205.3   .            . Y

100   AAA   B01   1565.1      28-Jan-15   -12  

100   AAA   B01   2550.9      2-Feb-15    -7   

100   AAA   B01   2507  9-Feb-15    1      

100   AAA   W03   2401.9      2-Mar-15    22  

100   AAA   W07   2491.6      30-Mar-15   50   

100   AAA   W11   1889.2      27-Apr-15   78   

100   AAA   W15   1349.3      27-May-15   108     

100   AAA   W17   598.7 8-Jun-15    120 

100   AAA   W18   1519  15-Jun-15   127 

100   AAA   W19   1816.8      22-Jun-15   134 

100   AAA   W20   1432.7      29-Jun-15   141        

100   BBB   B01   1623.0   .           .  Y

100   BBB   B01   1816.8      22-Jun-15   -7     

100   BBB   B01   1432.7      29-Jun-15   1       

100   BBB   W03   1456.5      20-Jul-15   22  

100   BBB   W07   1882.4      17-Aug-15   50      

100   BBB   W11   900.9 14-Sep-15   78      

100   BBB   W15   1499.6      5-Oct-15    99      

100   BBB   W15   133.9 12-Oct-15   106     

100   BBB   W15   340.4 16-Oct-15   110     

100   BBB   W19   916.1 9-Nov-15    134        

100   BBB   W23   369       30-Nov-15     155  

100   BBB   W23   126.3 7-Dec-15    162  

100   BBB   W28   352.2 11-Jan-16   197  

100   BBB   W32   748.9 8-Feb-16    225  

;

run;

 

 

want:

 

100   AAA   B01   2205.3   .            . Y

100   AAA   B01   1565.1      28-Jan-15   -12  

100   AAA   B01   2550.9      2-Feb-15    -7   

100   AAA   B01   2507  9-Feb-15    1      

100   AAA   W03   2401.9      2-Mar-15    22  

100   AAA   W07   2491.6      30-Mar-15   50   

100   AAA   W11   1889.2      27-Apr-15   78   

100   AAA   W15   1349.3      27-May-15   108     

100   AAA   W17   598.7 8-Jun-15    120 

100   AAA   W18   1519  15-Jun-15   127 

100   AAA   W19   1816.8      22-Jun-15   134 

100   AAA   W20   1432.7      29-Jun-15   141        

100   BBB   B01   1623.0   .           .  Y

100   BBB   B01   1816.8      22-Jun-15   -7     

100   BBB   B01   1432.7      29-Jun-15   1       

100   BBB   W03   1456.5      20-Jul-15   22  

100   BBB   W07   1882.4      17-Aug-15   50      

100   BBB   W11   900.9 14-Sep-15   78      

100   BBB   W15   1499.6      5-Oct-15    99      

100   BBB   W15   133.9 12-Oct-15   106     

100   BBB   W15   340.4 16-Oct-15   110     

100   BBB   W19   916.1 9-Nov-15    134        

100   BBB   W23   369       30-Nov-15     155  

100   BBB   W23   126.3 7-Dec-15    162  

100   BBB   W28   352.2 11-Jan-16   197  

100   BBB   W32   748.9 8-Feb-16    225  


Accepted Solutions
Solution
‎05-24-2016 11:33 PM
PROC Star
Posts: 1,760

Re: exclude records

Posted in reply to cathy_sas

Mmm. You are right. This should work better.

 


data HAVE(index=(A=(PAT FL VAL DT ID)));
infile datalines expandtabs truncover dlm=' ';
input PAT ID :$   VIS :$  VAL  DT :date9.  DAY   FL $;
format DT :date9.;
datalines;
100   AAA   B01   2205.3   .            . Y
100   AAA   B01   1565.1      28-Jan-15   -12  
100   AAA   B01   2550.9      2-Feb-15    -7   
100   AAA   B01   2507  9-Feb-15    1      
100   AAA   W03   2401.9      2-Mar-15    22  
100   AAA   W07   2491.6      30-Mar-15   50   
100   AAA   W11   1889.2      27-Apr-15   78   
100   AAA   W15   1349.3      27-May-15   108     
100   AAA   W17   598.7 8-Jun-15    120 
100   AAA   W18   1519  15-Jun-15   127 
100   AAA   W19   1816.8      22-Jun-15   134 
100   AAA   W20   1432.7      29-Jun-15   141        
100   BBB   B01   1623.0   .           .  Y
100   BBB   B01   1816.8      22-Jun-15   -7     
100   BBB   B01   1432.7      29-Jun-15   1       
100   BBB   W03   1456.5      20-Jul-15   22  
100   BBB   W07   1882.4      17-Aug-15   50      
100   BBB   W11   900.9 14-Sep-15   78      
100   BBB   W15   1499.6      5-Oct-15    99      
100   BBB   W15   133.9 12-Oct-15   106     
100   BBB   W15   340.4 16-Oct-15   110     
100   BBB   W19   916.1 9-Nov-15    134        
100   BBB   W23   369       30-Nov-15     155  
100   BBB   W23   126.3 7-Dec-15    162  
100   BBB   W28   352.2 11-Jan-16   197  
100   BBB   W32   748.9 8-Feb-16    225  
run;
data WANT;
  set HAVE; 
  by PAT FL VAL DT ID;
  if ^(first.FL & last.FL) and FL='Y' and ID ne 'AAA' 
                          then delete=1; %* Delete duplicate Y;
  if ^first.DT            then delete=1; %* Delete duplicate VAL and DT;
run;

View solution in original post


All Replies
Frequent Contributor
Posts: 85

Re: exclude records

[ Edited ]
Posted in reply to cathy_sas

Just a PROC SORT  nodupkey should do.

 

Ensure the data is sorted correctly...

 

proc sort data=have out=have2;

by id pat descending fl dt;

run;

 

Then sort again with NODUPKEY, and without the grouping variable, ID:

 

proc sort data=have2 out=want nodupkey;

by pat descending fl dt ;

run;

 

PROC SORT will keep the the first record of any duplicates, group AAA, and discard any duplicates i.e. BBB.  I have also assumed the FL variable will be Y or N or blank, and that the DT variable will be missing when FL is Y or N.

 

Finally, to see it make in the original order...

 

proc sort ;

by pat id vis dt;

run;

 

If this is a big table though, and performance becomes an issue, this may not be the best approach.

 

Contributor
Posts: 21

Re: exclude records

Posted in reply to JerryLeBreton

Thank you Jerry!!!

 

your solution work for example, but my data is big and have other lot more variables .

any looping or SQL apporoach to flag the duplicates

 

 

Thanks

 

PROC Star
Posts: 1,760

Re: exclude records

Posted in reply to cathy_sas

Like this?

 


data HAVE(index=(A=(PAT FL VAL DT)));
infile datalines expandtabs truncover dlm=' ';
input PAT ID :$   VIS :$  VAL  DT :date9.  DAY   FL $;
format DT :date9.;
datalines;
100   AAA   B01   2205.3   .            . Y
100   AAA   B01   1565.1      28-Jan-15   -12  
100   AAA   B01   2550.9      2-Feb-15    -7   
100   AAA   B01   2507  9-Feb-15    1      
100   AAA   W03   2401.9      2-Mar-15    22  
100   AAA   W07   2491.6      30-Mar-15   50   
100   AAA   W11   1889.2      27-Apr-15   78   
100   AAA   W15   1349.3      27-May-15   108     
100   AAA   W17   598.7 8-Jun-15    120 
100   AAA   W18   1519  15-Jun-15   127 
100   AAA   W19   1816.8      22-Jun-15   134 
100   AAA   W20   1432.7      29-Jun-15   141        
100   BBB   B01   1623.0   .           .  Y
100   BBB   B01   1816.8      22-Jun-15   -7     
100   BBB   B01   1432.7      29-Jun-15   1       
100   BBB   W03   1456.5      20-Jul-15   22  
100   BBB   W07   1882.4      17-Aug-15   50      
100   BBB   W11   900.9 14-Sep-15   78      
100   BBB   W15   1499.6      5-Oct-15    99      
100   BBB   W15   133.9 12-Oct-15   106     
100   BBB   W15   340.4 16-Oct-15   110     
100   BBB   W19   916.1 9-Nov-15    134        
100   BBB   W23   369       30-Nov-15     155  
100   BBB   W23   126.3 7-Dec-15    162  
100   BBB   W28   352.2 11-Jan-16   197  
100   BBB   W32   748.9 8-Feb-16    225  
run;
data WANT;
  set HAVE; 
  by PAT FL VAL DT;
  if ^first.FL and FL='Y' then delete; %* Delete duplicate Y;
  if ^first.DT            then delete; %* Delete duplicate VAL and DT;
run;
Contributor
Posts: 21

Re: exclude records

Thank you Chris !!! it is pretty much close, but i need to keep the Fl eq 'Y' from AAA grp, byt this we are excluding this record

 

Thanks

Cathy

Solution
‎05-24-2016 11:33 PM
PROC Star
Posts: 1,760

Re: exclude records

Posted in reply to cathy_sas

Mmm. You are right. This should work better.

 


data HAVE(index=(A=(PAT FL VAL DT ID)));
infile datalines expandtabs truncover dlm=' ';
input PAT ID :$   VIS :$  VAL  DT :date9.  DAY   FL $;
format DT :date9.;
datalines;
100   AAA   B01   2205.3   .            . Y
100   AAA   B01   1565.1      28-Jan-15   -12  
100   AAA   B01   2550.9      2-Feb-15    -7   
100   AAA   B01   2507  9-Feb-15    1      
100   AAA   W03   2401.9      2-Mar-15    22  
100   AAA   W07   2491.6      30-Mar-15   50   
100   AAA   W11   1889.2      27-Apr-15   78   
100   AAA   W15   1349.3      27-May-15   108     
100   AAA   W17   598.7 8-Jun-15    120 
100   AAA   W18   1519  15-Jun-15   127 
100   AAA   W19   1816.8      22-Jun-15   134 
100   AAA   W20   1432.7      29-Jun-15   141        
100   BBB   B01   1623.0   .           .  Y
100   BBB   B01   1816.8      22-Jun-15   -7     
100   BBB   B01   1432.7      29-Jun-15   1       
100   BBB   W03   1456.5      20-Jul-15   22  
100   BBB   W07   1882.4      17-Aug-15   50      
100   BBB   W11   900.9 14-Sep-15   78      
100   BBB   W15   1499.6      5-Oct-15    99      
100   BBB   W15   133.9 12-Oct-15   106     
100   BBB   W15   340.4 16-Oct-15   110     
100   BBB   W19   916.1 9-Nov-15    134        
100   BBB   W23   369       30-Nov-15     155  
100   BBB   W23   126.3 7-Dec-15    162  
100   BBB   W28   352.2 11-Jan-16   197  
100   BBB   W32   748.9 8-Feb-16    225  
run;
data WANT;
  set HAVE; 
  by PAT FL VAL DT ID;
  if ^(first.FL & last.FL) and FL='Y' and ID ne 'AAA' 
                          then delete=1; %* Delete duplicate Y;
  if ^first.DT            then delete=1; %* Delete duplicate VAL and DT;
run;
Contributor
Posts: 21

Re: exclude records

Thank you ChrisNZ!!! Works perfect!!!

 

Thanks

Cathy

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 368 views
  • 0 likes
  • 3 in conversation