DATA Step, Macro, Functions and more

Need help removind duplicates

Reply
Frequent Contributor
Posts: 78

Need help removind duplicates

 ** JS020: PROCESS DATA.                                      **;       
 ***************************************************************;       
 ** LIMIT DATA TO BENEFICIARIES ON RON'S LIST.                **;       
 ***************************************************************;       
                                                                        
   PROC PRINT DATA=ENROLL.DATA                                          
    (OBS=100); TITLE 'ORIGINAL ENROLLMENT DATA'; RUN;                   
                                                                        
   ***DEFINE ABBREVIATED NAME FOR MERGE***;                             
  DATA ENROLL2;                                                         
   SET ENROLL.DATA;                                                     
       **REMOVE SPACES AND PUNCTUATION**;                               
    NAMECOMP=COMPRESS(FIRSTNAME,,'PS');                                 
   ***FIRST THREE CHARS, OR WHOLE NAME IF SHORTER**;                    
   FORMAT NAME3 $3.;                                                    
   NAME3=SUBSTR(NAMECOMP,1,MIN(3,LENGTH(NAMECOMP)));                    
   RUN;                                                                 
                                                                        
   DATA BENEB;                                                          
     SET BENES.DATA;                                                    
   **REMOVE SPACES AND PUNCTUATION**;                                   
   NAMECOMP=COMPRESS(FIRST_NAME,,'PS');                                 
   ***FIRST THREE CHARS, OR WHOLE NAME IF SHORTER**;                    
   FORMAT NAME3 $3.;                                                    
   NAME3=SUBSTR(NAMECOMP,1,MIN(3,LENGTH(NAMECOMP)));                    
   RUN;                                                                 
000136                                                                        
000137    DATA BENE2B;                                                        
000138      SET BENE2.DATA;                                                   
000139    **REMOVE SPACES AND PUNCTUATION**;                                  
000140    NAMECOMP=COMPRESS(FIRST_NAME,,'PS');                                
000141    ***FIRST THREE CHARS, OR WHOLE NAME IF SHORTER**;                   
000142    FORMAT NAME3 $3.;                                                   
000143    NAME3=SUBSTR(NAMECOMP,1,MIN(3,LENGTH(NAMECOMP)));                   
000144    RUN;                                                                
000145                                                                        
000146    ***SAME SORT ORDER, BY MERGE VARIABLES**;                           
000147    PROC SORT DATA=ENROLL2 OUT=ENROLL3 NODUPKEY;                        
000148      BY SPONSORSSN PATIENTDOB NAME3;                                   
000149    RUN;                                                                
000150                                                                        
000151   PROC SQL;                                                            
000152     CREATE TABLE ENROLL AS SELECT                                      
000153       A.*,                                                             
000154       B.BENESSN,                                                       
000155       B.EDIPN,                                                         
000156       CASE                                                             
000157         WHEN TRIM(A.NAME3) NE TRIM(B.NAME3)                            
000158               THEN B.FIRSTNAME !! " " !! B.LASTNAME                    
000159         END AS NAMEMISMATCH                                            
000160     FROM                                                               
000161       BENEB AS A LEFT JOIN ENROLL3 AS B                                
 000162          ON (A.SPONSORSSN = B.SPONSORSSN                            
 000163              AND A.PATIENTDOB = B.PATIENTDOB);                      
 000164   QUIT;                                                             
 000165                                                                     
 000166   PROC SQL;                                                         
 000167     CREATE TABLE ENROLL2 AS SELECT                                  
 000168       A.*,                                                          
 000169       B.BENESSN,                                                    
 000170       B.EDIPN,                                                      
 000171       CASE                                                          
 000172         WHEN TRIM(A.NAME3) NE TRIM(B.NAME3)                         
 000173               THEN B.FIRSTNAME !! " " !! B.LASTNAME                 
 000174         END AS NAMEMISMATCH                                         
 000175     FROM                                                            
 000176       BENE2B AS A LEFT JOIN ENROLL3 AS B                            
 000177          ON (A.SPONSORSSN = B.SPONSORSSN                            
 000178              AND A.PATIENTDOB = B.PATIENTDOB);                      
 000179   QUIT;                                                             
 000180                                                                     
 000181                                                                     
 000182   PROC PRINT DATA=ENROLL2 (OBS=200); TITLE 'ENROLL2'; RUN;          
 000183                                                                     
 000184  ***************************************************************;   
 000185  ** OUTPUT DATA                                                     
 000186  ***************************************************************;   
 000187                                                                     
000188    %INCLUDE 'FPCENRI.MACRO1.SAS(TILOUT1)'; * OUTPUT MACRO *;    
000189    %TILOUT1(ENROLL,OUT1);                                       
000190    %TILOUT1(ENROLL2,OUT2);                                      
000191                                                                 
000192  ENDSAS;                                                        
Occasional Contributor
Posts: 13

Re: Need help removind duplicates

Hi,

 

You could adapt these 2 solutions depending on your case:

 

/*1) If you want to remove only some duplicate variables:*/

PROC SORT DATA=TABLE_NAME NODUPKEY;
BY VARIABLE; /*replace by the names of the variables*/
RUN;

----------------------------------------------------------------------------------------------------------------------------------------
/*Remove the duplicates considering all the variables:*/

PROC SORT DATA = TABLE_NAME NODUPRECS;
BY _ALL_;
RUN;

 

I didn't read your code by I hope it helps! Smiley Happy

Frequent Contributor
Posts: 78

Re: Need help removind duplicates

I got help on this, thank you

Ask a Question
Discussion stats
  • 2 replies
  • 87 views
  • 0 likes
  • 2 in conversation