Hello! Really looking for some help with my code. I have three data sets. One is an Excel data file that I have cleaned to only include the most pertinent data of maternal disease status (first name, last name, DOB, disease classification status, pregnancy status). I am merging this Excel file to two SAS datasets that are vital records datasets from 2018 and 2019 that contains the previous variables, in addition to their infant's information.
I am trying to determine women that had the disease AND were pregnant. By merging the disease excel file with the vital records datasets, I should only receive a list of women who meet those two criteria.
I am getting an error where it is merging and repeating many of the same variables. My code is below.
Sample of the errors (deindentified). The unique case IDs are repeating, and the maven FN & LN are giving the same individual for different people. It's sometimes doing this for 5 or 6 people at a time, giving me the wrong case ID to look people up by.
the main log error i can find is this:
WARNING: Multiple lengths were specified for the BY variable MOTHER_GNAME by input data sets.
This might cause unexpected results.
WARNING: Multiple lengths were specified for the BY variable MOTHER_LNAME by input data sets.
This might cause unexpected results.
ERROR: BY variables are not properly sorted on data set WORK.ALLBIRTHS.
CaseID= DISEASE_CLASSIFICATION_STATUS= MOTHER_GNAME=ALEJANDRA MOTHER_LNAME=WEEKS
MDOB=03/26/1999 PREGNANCY_STATUS= MavenFN= MavenLN= MavenDOB=. SFN_NUM=031626
MOTHER_LNAME_PRIOR=BELTRAN date=03/26/1999 VitalsDOB=14329 VitalsFN=ALEJANDRA VitalsLN=WEEKS
FIRST.MOTHER_GNAME=0 LAST.MOTHER_GNAME=0 FIRST.MOTHER_LNAME=1 LAST.MOTHER_LNAME=1 FIRST.MDOB=1
LAST.MDOB=1 _ERROR_=1 _N_=1944
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 1 observations read from the data set WORK.BAB2.
NOTE: There were 1945 observations read from the data set WORK.ALLBIRTHS
case id | disease classification status | mother_gname | mother_lname | MDOB | pregnancy status | mavenFN | mavenLN | mavenDOB | sfn_numb | mother_lnameprior | date | VitalsDOB | vitalsFN | vitalsLN |
102876551 | CONFIRMED | JANE | DOE1 | 01/01/1976 | JANE | DOE3 | 6094 | 023026 | DOE1 | 01/01/1976 | 6094 | JANE | DOE1 | |
102876551 | CONFIRMED | JANE | DOE2 | 01/01/1976 | JANE | DOE3 | 6094 | 001964 | DOE2 | 01/01/1976 | 6094 | JANE | DOE2 |
libname vitals "PATH";
data births2018 (keep= SFN_NUM MOTHER_GNAME MOTHER_LNAME MOTHER_LNAME_PRIOR MDOB);
set 2018;
run;
data births2019 (keep= SFN_NUM MOTHER_GNAME MOTHER_LNAME MOTHER_LNAME_PRIOR MDOB);
set 2019;
run;
data births2018_2019 (drop=MDOB);
set births2018 births2019;
date = input(MDOB,mmddyy10.);
format date mmddyy10.;
run;
proc print data=births2018_2019 (obs=10); run;
data allbirths;
length MOTHER_GNAME MOTHER_LNAME $20;
set births2018_2019;
MDOB=date;
VitalsDOB=date;
VitalsFN=MOTHER_GNAME;
VitalsLN=MOTHER_LNAME;
run;
proc print data=allbirths (obs=10); run;
proc import out=BAB
datafile="path"
dbms=excel replace;
run;
proc print data = BAB; run;
data BAB1(keep= CaseID DISEASE_CLASSIFICATION_STATUS MOTHER_GNAME MOTHER_LNAME MDOB PREGNANCY_STATUS);
retain CaseID DISEASE_CLASSIFICATION_STATUS MOTHER_GNAME MOTHER_LNAME MDOB PREGNANCY_STATUS MDOB;
set BAB;
format MDOB mmddyy10.;
run;
proc print data = BAB1;run;
data BAB2;
set BAB1;
MavenFN=MOTHER_GNAME;
MavenLN=MOTHER_LNAME;
MavenDOB=MDOB;
run;
proc sort data=BAB2; by MOTHER_GNAME MOTHER_LNAME MDOB;run;
proc sort data=allbirths; by MOTHER_GNAME MOTHER_LNAME MDOB; run;
*Match based on First Name, Last name and DOB;
data Names;
merge BAB2 allbirths;
by MOTHER_GNAME MOTHER_LNAME MDOB;
if CaseID="" then delete;
if SFN_NUM=" " then delete;
run;
proc sort data=BAB2; by MOTHER_GNAME MOTHER_LNAME MDOB;run;
proc sort data=allbirths; by MOTHER_GNAME MOTHER_LNAME MDOB; run;
*Match based on First AND Last name;
data Names;
merge BAB2 allbirths;
by MOTHER_GNAME MOTHER_LNAME MDOB;
if CaseID="" then delete;
if SFN_NUM=" " then delete;
run;
proc sort data=BAB2; by MDOB MOTHER_GNAME;run;
proc sort data=allbirths; by MDOB MOTHER_GNAME; run;
*Match based on DOB and First Name;
data together1;
merge BAB2 allbirths;
by MDOB MOTHER_GNAME;
if CaseID="" then delete;
if SFN_NUM=" " then delete;
run;
proc sort data=BAB2; by MDOB MOTHER_LNAME;run;
proc sort data=allbirths; by MDOB MOTHER_LNAME; run;
*Match based on DOB and Last Name;
data together2;
merge BAB2 allbirths;
by MDOB MOTHER_LNAME;
if CaseID="" then delete;
if SFN_NUM=" " then delete;
run;
proc sort data=BAB2; by MDOB;run;
proc sort data=allbirths; by MDOB; run;
*Match based on DOB and Mother's Prior Last Name;
data DOB;
merge BAB2 allbirths;
by MDOB;
if MOTHER_LNAME ne MOTHER_LNAME_PRIOR then delete;
if CaseID="" then delete;
if SFN_NUM=" " then delete;
run;
data final1;
set Names together1 together2 DOB;
run;
*There are a few individuals who had multiple pregnancy outcomes*;
data final2 (Drop= MOTHER_LNAME MOTHER_GNAME MDOB SFN_NUM);
Retain CaseID Disease_Classification_Status Disease MavenFN VitalsFN MavenLN VitalsLN MOTHER_LNAME_PRIOR MavenDOB VitalsDOB MavenCity Vitalscity
SYMPTOM_ONSET_DATE Event_Date IDOB YEAR CHILD_GNAME CHILD_LNAME;
set final1;
if MavenFN ne VitalsFN then delete;
run;
proc print data = work.final1; run;
1) The posted log with the messages does not contain the submitted code.
Please run the code with options source source2 mprint;
and copy-paste it into the </> icon-window to have it idented properly.
2) whenever you have a message like:
WARNING: Multiple lengths were specified for the BY variable OTHER_GNAME by input data sets. This might cause unexpected results.
check the length of the variable in both datasets and precede the SET statement with a LENGTH statement assigning the greater length of the two.
3) The next error message should be clear enough:
ERROR: BY variables are not properly sorted on data set WORK.ALLBIRTHS.
you probably forgot to sort this dataset with same BY variables as the following MERGE step.
According to given log I can't say which MERGE step created the message.
Any procedure or data step code that uses a BY statement will require all of the data involved to be sorted BY the varaiables.
ALL the sets.
So sort WORK.ALLBIRTHS.
Multiple lengths of character variables are common from different data sources.
Easiest fix is in any data step giving you that message is to place a LENGTH statement before any SET, MERGE, UPDATE or MODIFY statement using the sets to specify the length to use. The value of the length should be at least as long as the longest defined length for the variables used. Proc contents can tell you that if you don't know how to check.
For future reference, when asking questions about error or warning messages, copy from the LOG the entire procedure or data step code that generates the message and all notes, warnings or errors. Then on the forum paste the code and messages into a code box opened with the </> icon to preserve formatting. This is important as many errors come with diagnostic characters that tell where the issue occurred but the main windows on this form will reformat text and the diagnostics aren't going to appear as they should.
Plus the code boxes set things apart.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.