BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
cathy_sas
Calcite | Level 5

Dear All,

 

I am looking for small help .I have a dataset, where same subject fall under two groups, what i am looking is need to exclude the records where same date and same value. and fl eq 'Y' only from first group

 

In the below example, subject 100 falls under AAA, BBB group. for this subject , under group AAA @ W19 and W20 have same date and same value under group BBB at @ B01. so i need to exclude those two records from group BBB and keep values from AAA.

 

and AAA group has first record Fl eq 'Y' , and for BBB group also has first record FL eq 'Y' . Keep only the record under AAA group, exclude record with FL eq 'Y' from group BBB.

 

 

Ex:

data have;

infile datalines expandtabs truncover dlm=' ';

input pat ID :$   vis :$      VAL   DT :date9.  DaY   FL $;

format DT :date9.;

datalines;

100   AAA   B01   2205.3   .            . Y

100   AAA   B01   1565.1      28-Jan-15   -12  

100   AAA   B01   2550.9      2-Feb-15    -7   

100   AAA   B01   2507  9-Feb-15    1      

100   AAA   W03   2401.9      2-Mar-15    22  

100   AAA   W07   2491.6      30-Mar-15   50   

100   AAA   W11   1889.2      27-Apr-15   78   

100   AAA   W15   1349.3      27-May-15   108     

100   AAA   W17   598.7 8-Jun-15    120 

100   AAA   W18   1519  15-Jun-15   127 

100   AAA   W19   1816.8      22-Jun-15   134 

100   AAA   W20   1432.7      29-Jun-15   141        

100   BBB   B01   1623.0   .           .  Y

100   BBB   B01   1816.8      22-Jun-15   -7     

100   BBB   B01   1432.7      29-Jun-15   1       

100   BBB   W03   1456.5      20-Jul-15   22  

100   BBB   W07   1882.4      17-Aug-15   50      

100   BBB   W11   900.9 14-Sep-15   78      

100   BBB   W15   1499.6      5-Oct-15    99      

100   BBB   W15   133.9 12-Oct-15   106     

100   BBB   W15   340.4 16-Oct-15   110     

100   BBB   W19   916.1 9-Nov-15    134        

100   BBB   W23   369       30-Nov-15     155  

100   BBB   W23   126.3 7-Dec-15    162  

100   BBB   W28   352.2 11-Jan-16   197  

100   BBB   W32   748.9 8-Feb-16    225  

;

run;

 

 

want:

 

100   AAA   B01   2205.3   .            . Y

100   AAA   B01   1565.1      28-Jan-15   -12  

100   AAA   B01   2550.9      2-Feb-15    -7   

100   AAA   B01   2507  9-Feb-15    1      

100   AAA   W03   2401.9      2-Mar-15    22  

100   AAA   W07   2491.6      30-Mar-15   50   

100   AAA   W11   1889.2      27-Apr-15   78   

100   AAA   W15   1349.3      27-May-15   108     

100   AAA   W17   598.7 8-Jun-15    120 

100   AAA   W18   1519  15-Jun-15   127 

100   AAA   W19   1816.8      22-Jun-15   134 

100   AAA   W20   1432.7      29-Jun-15   141        

100   BBB   B01   1623.0   .           .  Y

100   BBB   B01   1816.8      22-Jun-15   -7     

100   BBB   B01   1432.7      29-Jun-15   1       

100   BBB   W03   1456.5      20-Jul-15   22  

100   BBB   W07   1882.4      17-Aug-15   50      

100   BBB   W11   900.9 14-Sep-15   78      

100   BBB   W15   1499.6      5-Oct-15    99      

100   BBB   W15   133.9 12-Oct-15   106     

100   BBB   W15   340.4 16-Oct-15   110     

100   BBB   W19   916.1 9-Nov-15    134        

100   BBB   W23   369       30-Nov-15     155  

100   BBB   W23   126.3 7-Dec-15    162  

100   BBB   W28   352.2 11-Jan-16   197  

100   BBB   W32   748.9 8-Feb-16    225  

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

Mmm. You are right. This should work better.

 


data HAVE(index=(A=(PAT FL VAL DT ID)));
infile datalines expandtabs truncover dlm=' ';
input PAT ID :$   VIS :$  VAL  DT :date9.  DAY   FL $;
format DT :date9.;
datalines;
100   AAA   B01   2205.3   .            . Y
100   AAA   B01   1565.1      28-Jan-15   -12  
100   AAA   B01   2550.9      2-Feb-15    -7   
100   AAA   B01   2507  9-Feb-15    1      
100   AAA   W03   2401.9      2-Mar-15    22  
100   AAA   W07   2491.6      30-Mar-15   50   
100   AAA   W11   1889.2      27-Apr-15   78   
100   AAA   W15   1349.3      27-May-15   108     
100   AAA   W17   598.7 8-Jun-15    120 
100   AAA   W18   1519  15-Jun-15   127 
100   AAA   W19   1816.8      22-Jun-15   134 
100   AAA   W20   1432.7      29-Jun-15   141        
100   BBB   B01   1623.0   .           .  Y
100   BBB   B01   1816.8      22-Jun-15   -7     
100   BBB   B01   1432.7      29-Jun-15   1       
100   BBB   W03   1456.5      20-Jul-15   22  
100   BBB   W07   1882.4      17-Aug-15   50      
100   BBB   W11   900.9 14-Sep-15   78      
100   BBB   W15   1499.6      5-Oct-15    99      
100   BBB   W15   133.9 12-Oct-15   106     
100   BBB   W15   340.4 16-Oct-15   110     
100   BBB   W19   916.1 9-Nov-15    134        
100   BBB   W23   369       30-Nov-15     155  
100   BBB   W23   126.3 7-Dec-15    162  
100   BBB   W28   352.2 11-Jan-16   197  
100   BBB   W32   748.9 8-Feb-16    225  
run;
data WANT;
  set HAVE; 
  by PAT FL VAL DT ID;
  if ^(first.FL & last.FL) and FL='Y' and ID ne 'AAA' 
                          then delete=1; %* Delete duplicate Y;
  if ^first.DT            then delete=1; %* Delete duplicate VAL and DT;
run;

View solution in original post

6 REPLIES 6
JerryLeBreton
Pyrite | Level 9

Just a PROC SORT  nodupkey should do.

 

Ensure the data is sorted correctly...

 

proc sort data=have out=have2;

by id pat descending fl dt;

run;

 

Then sort again with NODUPKEY, and without the grouping variable, ID:

 

proc sort data=have2 out=want nodupkey;

by pat descending fl dt ;

run;

 

PROC SORT will keep the the first record of any duplicates, group AAA, and discard any duplicates i.e. BBB.  I have also assumed the FL variable will be Y or N or blank, and that the DT variable will be missing when FL is Y or N.

 

Finally, to see it make in the original order...

 

proc sort ;

by pat id vis dt;

run;

 

If this is a big table though, and performance becomes an issue, this may not be the best approach.

 

cathy_sas
Calcite | Level 5

Thank you Jerry!!!

 

your solution work for example, but my data is big and have other lot more variables .

any looping or SQL apporoach to flag the duplicates

 

 

Thanks

 

ChrisNZ
Tourmaline | Level 20

Like this?

 


data HAVE(index=(A=(PAT FL VAL DT)));
infile datalines expandtabs truncover dlm=' ';
input PAT ID :$   VIS :$  VAL  DT :date9.  DAY   FL $;
format DT :date9.;
datalines;
100   AAA   B01   2205.3   .            . Y
100   AAA   B01   1565.1      28-Jan-15   -12  
100   AAA   B01   2550.9      2-Feb-15    -7   
100   AAA   B01   2507  9-Feb-15    1      
100   AAA   W03   2401.9      2-Mar-15    22  
100   AAA   W07   2491.6      30-Mar-15   50   
100   AAA   W11   1889.2      27-Apr-15   78   
100   AAA   W15   1349.3      27-May-15   108     
100   AAA   W17   598.7 8-Jun-15    120 
100   AAA   W18   1519  15-Jun-15   127 
100   AAA   W19   1816.8      22-Jun-15   134 
100   AAA   W20   1432.7      29-Jun-15   141        
100   BBB   B01   1623.0   .           .  Y
100   BBB   B01   1816.8      22-Jun-15   -7     
100   BBB   B01   1432.7      29-Jun-15   1       
100   BBB   W03   1456.5      20-Jul-15   22  
100   BBB   W07   1882.4      17-Aug-15   50      
100   BBB   W11   900.9 14-Sep-15   78      
100   BBB   W15   1499.6      5-Oct-15    99      
100   BBB   W15   133.9 12-Oct-15   106     
100   BBB   W15   340.4 16-Oct-15   110     
100   BBB   W19   916.1 9-Nov-15    134        
100   BBB   W23   369       30-Nov-15     155  
100   BBB   W23   126.3 7-Dec-15    162  
100   BBB   W28   352.2 11-Jan-16   197  
100   BBB   W32   748.9 8-Feb-16    225  
run;
data WANT;
  set HAVE; 
  by PAT FL VAL DT;
  if ^first.FL and FL='Y' then delete; %* Delete duplicate Y;
  if ^first.DT            then delete; %* Delete duplicate VAL and DT;
run;
cathy_sas
Calcite | Level 5

Thank you Chris !!! it is pretty much close, but i need to keep the Fl eq 'Y' from AAA grp, byt this we are excluding this record

 

Thanks

Cathy

ChrisNZ
Tourmaline | Level 20

Mmm. You are right. This should work better.

 


data HAVE(index=(A=(PAT FL VAL DT ID)));
infile datalines expandtabs truncover dlm=' ';
input PAT ID :$   VIS :$  VAL  DT :date9.  DAY   FL $;
format DT :date9.;
datalines;
100   AAA   B01   2205.3   .            . Y
100   AAA   B01   1565.1      28-Jan-15   -12  
100   AAA   B01   2550.9      2-Feb-15    -7   
100   AAA   B01   2507  9-Feb-15    1      
100   AAA   W03   2401.9      2-Mar-15    22  
100   AAA   W07   2491.6      30-Mar-15   50   
100   AAA   W11   1889.2      27-Apr-15   78   
100   AAA   W15   1349.3      27-May-15   108     
100   AAA   W17   598.7 8-Jun-15    120 
100   AAA   W18   1519  15-Jun-15   127 
100   AAA   W19   1816.8      22-Jun-15   134 
100   AAA   W20   1432.7      29-Jun-15   141        
100   BBB   B01   1623.0   .           .  Y
100   BBB   B01   1816.8      22-Jun-15   -7     
100   BBB   B01   1432.7      29-Jun-15   1       
100   BBB   W03   1456.5      20-Jul-15   22  
100   BBB   W07   1882.4      17-Aug-15   50      
100   BBB   W11   900.9 14-Sep-15   78      
100   BBB   W15   1499.6      5-Oct-15    99      
100   BBB   W15   133.9 12-Oct-15   106     
100   BBB   W15   340.4 16-Oct-15   110     
100   BBB   W19   916.1 9-Nov-15    134        
100   BBB   W23   369       30-Nov-15     155  
100   BBB   W23   126.3 7-Dec-15    162  
100   BBB   W28   352.2 11-Jan-16   197  
100   BBB   W32   748.9 8-Feb-16    225  
run;
data WANT;
  set HAVE; 
  by PAT FL VAL DT ID;
  if ^(first.FL & last.FL) and FL='Y' and ID ne 'AAA' 
                          then delete=1; %* Delete duplicate Y;
  if ^first.DT            then delete=1; %* Delete duplicate VAL and DT;
run;
cathy_sas
Calcite | Level 5

Thank you ChrisNZ!!! Works perfect!!!

 

Thanks

Cathy

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1557 views
  • 0 likes
  • 3 in conversation