Hello! Really looking for some help with my code. I have three data sets. One is an Excel data file that I have cleaned to only include the most pertinent data of maternal disease status (first name, last name, DOB, disease classification status, pregnancy status). I am merging this Excel file to two SAS datasets that are vital records datasets from 2018 and 2019 that contains the previous variables, in addition to their infant's information. I am trying to determine women that had the disease AND were pregnant. By merging the disease excel file with the vital records datasets, I should only receive a list of women who meet those two criteria. I decided to use fuzzy matching to identify the matches. libname vitals "path";
data births2018 (keep= SFN_NUM MOTHER_GNAME MOTHER_LNAME MOTHER_LNAME_PRIOR MDOB);
set vitals.maresbths18_02182020;
run;
data births2019 (keep= SFN_NUM MOTHER_GNAME MOTHER_LNAME MOTHER_LNAME_PRIOR MDOB);
set vitals.Maresbths19_02182020;
run;
data births2018_2019 (drop=MDOB);
length MOTHER_GNAME MOTHER_LNAME MOTHER_LNAME_PRIOR $128;
set births2018 births2019;
date = input(MDOB,mmddyy10.);
format date mmddyy10.;
run;
data allbirths;
length MOTHER_GNAME MOTHER_LNAME $128;
set births2018_2019;
MDOB=date;
VitalsDOB=date;
VitalsFN=MOTHER_GNAME;
VitalsLN=MOTHER_LNAME;
if MOTHER_GNAME=" " or MOTHER_GNAME = "---" then delete;
if MOTHER_LNAME=" " or MOTHER_LNAME = "---" then delete;
if MDOB = "." then delete;
if VitalsDOB = "." then delete;
run;
proc import out=BAB
datafile="path"
dbms=excel replace;
run;
data BAB1(keep= CaseID DISEASE_CLASSIFICATION_STATUS MOTHER_GNAME MOTHER_LNAME MDOB PREGNANCY_STATUS);
retain CaseID DISEASE_CLASSIFICATION_STATUS MOTHER_GNAME MOTHER_LNAME MDOB PREGNANCY_STATUS MDOB;
set BAB;
format MDOB mmddyy10.;
MOTHER_GNAME=COMPRESS(MOTHER_GNAME, " ", "p");
MOTHER_LNAME=COMPRESS(MOTHER_LNAME, " " , "p");
run;
proc sort data=BAB1; by MOTHER_GNAME MOTHER_LNAME MDOB;run;
proc sort data=allbirths; by MOTHER_GNAME MOTHER_LNAME date; run;
data matched;
merge allbirths (in=a) BAB1 (in=b);
by MDOB;
if a;
GNAME_COMPLEV=COMPLEV(MOTHER_GNAME, MOTHER_GNAME, ':iln');
LNAME_COMPLEV=COMPLEV(MOTHER_LNAME, MOTHER_LNAME);
TOTALCOMP=GNAME_COMPLEV+LNAME_COMPLEV;
if totalcomp>5 then delete;
run;
proc print data= matched;run; LOG: 30 data BAB1(keep= CaseID DISEASE_CLASSIFICATION_STATUS MOTHER_GNAME MOTHER_LNAME MDOB 30 ! PREGNANCY_STATUS); 31 retain CaseID DISEASE_CLASSIFICATION_STATUS MOTHER_GNAME MOTHER_LNAME MDOB PREGNANCY_STATUS 31 ! MDOB; 32 set BAB; 33 format MDOB mmddyy10.; 34 MOTHER_GNAME=COMPRESS(MOTHER_GNAME, " ", "p"); 35 MOTHER_LNAME=COMPRESS(MOTHER_LNAME, " " , "p"); 36 run; NOTE: There were 597 observations read from the data set WORK.BAB. NOTE: The data set WORK.BAB1 has 597 observations and 6 variables. NOTE: DATA statement used (Total process time): real time 0.03 seconds cpu time 0.03 seconds 37 proc sort data=BAB1; by MOTHER_GNAME MOTHER_LNAME MDOB;run; NOTE: There were 597 observations read from the data set WORK.BAB1. NOTE: The data set WORK.BAB1 has 597 observations and 6 variables. NOTE: PROCEDURE SORT used (Total process time): real time 0.03 seconds cpu time 0.03 seconds 38 proc sort data=allbirths; by MOTHER_GNAME MOTHER_LNAME date; run; NOTE: There were 137662 observations read from the data set WORK.ALLBIRTHS. NOTE: The data set WORK.ALLBIRTHS has 137662 observations and 9 variables. NOTE: PROCEDURE SORT used (Total process time): real time 0.26 seconds cpu time 0.45 seconds 39 data matched; 40 merge allbirths (in=a) BAB1 (in=b); 41 by MDOB; 42 if a; 43 GNAME_COMPLEV=COMPLEV(MOTHER_GNAME, MOTHER_GNAME, ':iln'); 44 LNAME_COMPLEV=COMPLEV(MOTHER_LNAME, MOTHER_LNAME); 45 TOTALCOMP=GNAME_COMPLEV+LNAME_COMPLEV; 46 if totalcomp>5 then delete; 47 run; ERROR: BY variables are not properly sorted on data set WORK.ALLBIRTHS. a=1 b=0 MOTHER_GNAME=A'KAYLA MOTHER_LNAME=LIPSCOMB MOTHER_LNAME_PRIOR=LIPSCOMB SFN_NUM=017274 date=02/05/2002 MDOB=02/05/2002 VitalsDOB=15376 VitalsFN=A'KAYLA VitalsLN=LIPSCOMB CaseID= DISEASE_CLASSIFICATION_STATUS= PREGNANCY_STATUS= FIRST.MDOB=1 LAST.MDOB=1 GNAME_COMPLEV=. LNAME_COMPLEV=. TOTALCOMP=. _ERROR_=1 _N_=2 NOTE: The SAS System stopped processing this step because of errors. NOTE: There were 3 observations read from the data set WORK.ALLBIRTHS. NOTE: There were 1 observations read from the data set WORK.BAB1. WARNING: The data set WORK.MATCHED may be incomplete. When this step was stopped there were 1 observations and 15 variables. NOTE: DATA statement used (Total process time): real time 0.05 seconds cpu time 0.03 seconds 48 proc print data= matched;run; NOTE: Writing HTML Body file: sashtml.htm NOTE: There were 1 observations read from the data set WORK.MATCHED. NOTE: PROCEDURE PRINT used (Total process time): real time 0.58 seconds cpu time 0.12 seconds
... View more