Thank you very much for the reply. I think the problem is probably the encoding while I import the data "utf-16le". The second file encoding seems to be WLatin1. I got this far by using your suggestions and SAS help. But it is taking way too long to fix. it will be great if you could help me. Please see below for the log FOR IMPORT AND LINKAE: NOTE: A byte-order mark in the file "C:\Users\-----------------------.txt" (for fileref "#LN00084") indicates that the data is encoded in "utf-16le". This encoding will be used to process the file. NOTE: The infile 'C:\Users\------------.txt' is: Filename=C:\Users\-------------------.txt, RECFM=V,LRECL=65534,File Size (bytes)=96315034, Last Modified=23Apr2021:11:05:04, Create Time=23Apr2021:10:50:37 NOTE: Invalid data for Age in line 42553 233-241. NOTE: Invalid data for LHJNumber in line 42553 327-330. NOTE: Invalid data for DtInvestigator in line 42553 454-457. RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+--- RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+--- 42553 21734345.DI..Coronavirus Disease 2019 - Non-positive ELR.E-NCOV2019NP."All-NoHIV, All-No ZONE 3333333304400467666767772467667623333222466276767676244504244453333450246624644522466246 NUMR 21734345949993F2FE16925304935135020190D0EFED0F39496505C295DE3F62019E0921CCDEF896C01CCDEF 89 TB, All-WithHIV, CD, IDB, IZB, nCoV2019, VRDL, zCoronavirus Disease 2019 - Non-positive MORE OF THESE AND FINALLY: NOTE: 95867 records were read from the infile 'C:\Users\-----------------------------.txt'. The minimum record length was 415. The maximum record length was 7259. NOTE: The data set CAL.ALLNAPA has 95867 observations and 81 variables. NOTE: DATA statement used (Total process time): real time 1.27 seconds cpu time 1.25 seconds Errors detected in submitted DATA step. Examine log. 95867 rows created in CAL.ALLNAPA from C:\Users\----------------.txt. ERROR: Import unsuccessful. See SAS Log for details. HERE IS THE LINKAGE ERROR MESSAGE: 2119 PROC SORT DATA = IRIS; 2120 BY recip_first_name recip_last_name RECIP_DOB; 2121 RUN; NOTE: There were 121741 observations read from the data set IRIS.IRIS412. NOTE: The data set IRIS has 121741 observations and 69 variables. NOTE: PROCEDURE SORT used (Total process time): real time 0.22 seconds cpu time 0.20 seconds 2122 2123 PROC SORT DATA = CAL; 2124 BY recip_last_name recip_first_name RECIP_DOB; 2125 RUN; NOTE: There were 95867 observations read from the data set CAL. NOTE: The data set CAL has 95867 observations and 82 variables. NOTE: PROCEDURE SORT used (Total process time): real time 1.69 seconds cpu time 0.78 seconds 2126 2127 data CAL.CALVAX; 2128 merge IRIS (in=a) CAL (in=b); 2129 by recip_first_name recip_last_name RECIP_DOB; 2130 IRIS = a; 2131 CAL = b; 2132 run; ERROR: BY variables are not properly sorted on data set CAL. LAST.recip_first_name=1 FIRST.Recip_last_name=1 LAST.Recip_last_name=1 FIRST.recip_dob=1 LAST.recip_dob=1 IRIS=. CAL=. _ERROR_=1 _N_=72367 ... ... NOTE: The SAS System stopped processing this step because of errors. NOTE: There were 72367 observations read from the data set IRIS. NOTE: There were 2 observations read from the data set CAL. WARNING: The data set CAL.CALVAX may be incomplete. When this step was stopped there were 72366 observations and 149 variables. WARNING: Data set CAL.CALVAX was not replaced because this step was stopped. NOTE: DATA statement used (Total process time): real time 1.01 seconds cpu time 0.25 seconds 2133 proc freq data = CAL.CALVAX; 2134 table IRIS*CAL; 2135 run; NOTE: There were 131981 observations read from the data set CAL.CALVAX. NOTE: PROCEDURE FREQ used (Total process time): real time 1.41 seconds cpu time 0.54 seconds
... View more