The ERROR: Expression using equals (=) has components that are of different data types keep coming up in my log regardless of what variable is being used. Whatever is the last variable before the final quit statement shows that error. Tried multiple attempts at converting variables to character and numeric to try and match, removing variables from the join, reformatting the imported excel file, and rearranging order for the join to no avail. It's driving me crazy.
PROC IMPORT OUT=Table_1a
DATAFILE= "FilePath.XLSX"
DBMS=XLSX /*XLSX*/ REPLACE;
SHEET="Request File";
GETNAMES=YES ;
RUN;
PROC SQL;
CREATE TABLE Table_1b AS
SELECT DISTINCT A.IDrequestR as RID
, A.name_last
, A.name_first
, A.name_middle
, A.DOB
, upcase(A.race) as Race
, upcase(A.gender) as Gender
FROM work.Table_1a as A
ORDER BY A.IDrequestR
;QUIT;
PROC SQL;
CREATE TABLE Table_2 AS
SELECT DISTINCT A.*
, B.DID
FROM Work.Table_1b as A
LEFT JOIN Dev.Person_Demographics as B
ON A.Gender = B.Gender_De
AND A.DOB = B.PRBRTH_DT
AND A.name_last = B.INDVNM_LAST_NM
AND A.name_first = B.INDVNM_FRST_NM
;QUIT;
So the SAS error is correct, the date of birth variables have different types.
I don't see the current format of DOB though, which is required to accurately convert it, so what does it look like? ie 01Aug2022 or 2022/08/01...
This may work but it would be better to use the correct informat, ie date9, mmddyy than anydtdte which guesses. And can guess wrong.
PROC SQL;
CREATE TABLE Table_2 AS
SELECT DISTINCT A.*
, B.DID
FROM Work.Table_1b as A
LEFT JOIN Dev.Person_Demographics as B
ON A.Gender = B.Gender_De
AND input(A.DOB, anydtdte.) = B.PRBRTH_DT
AND A.name_last = B.INDVNM_LAST_NM
AND A.name_first = B.INDVNM_FRST_NM
;QUIT;
@cbagdon-cox wrote:
Table 1a and Table 1c are using the same variables. The problem is with the join between table 1c variables and the dev.person_demographics table. I put the proc contents for those below.
Variables in Creation Order# Variable Type Len Format Informat Label1234567
RID Num 8 BEST. RID name_last Char 17 $17. $17. name_last name_first Char 15 $15. $15. name_first name_middle Char 15 $15. $15. name_middle DOB Char 9 Race Char 38 Gender Char 7
Variables in Creation Order
#
Variable
Type
Len
Format
Informat
Label
1
DID
Num
8
11.
11.
DID
9
GENDER_DE
Char
7
$7.
$7.
GENDER_DE
11
RACE_DE
Char
30
$30.
$30.
RACE_DE
13
PRBRTH_DT
Num
8
DATE9.
DATE9.
PRBRTH_DT
26
INDVNM_FRST_NM
Char
30
$30.
$30.
INDVNM_FRST_NM
27
INDVNM_MDL_NM
Char
20
$20.
$20.
INDVNM_MDL_NM
28
INDVNM_LAST_NM
Char
50
$50.
$50.
INDVNM_LAST_NM
Run a proc contents on table_1a and table_1b and look for the differences between the Gender and Date of Birth columns. I'm guessing one of those is character and one is numeric with a date format. Post that information here, with how the variable looks for each (ie 2022/08/11 or 11Aug2022) and we can help you fix the code.
@cbagdon-cox wrote:
The ERROR: Expression using equals (=) has components that are of different data types keep coming up in my log regardless of what variable is being used. Whatever is the last variable before the final quit statement shows that error. Tried multiple attempts at converting variables to character and numeric to try and match, removing variables from the join, reformatting the imported excel file, and rearranging order for the join to no avail. It's driving me crazy.
PROC IMPORT OUT=Table_1a
DATAFILE= "FilePath.XLSX"
DBMS=XLSX /*XLSX*/ REPLACE;
SHEET="Request File";
GETNAMES=YES ;
RUN;
PROC SQL;
CREATE TABLE Table_1b AS
SELECT DISTINCT A.IDrequestR as RID
, A.name_last
, A.name_first
, A.name_middle
, A.DOB
, upcase(A.race) as Race
, upcase(A.gender) as Gender
FROM work.Table_1a as A
ORDER BY A.IDrequestR
;QUIT;
PROC SQL;
CREATE TABLE Table_2 AS
SELECT DISTINCT A.*
, B.DID
FROM Work.Table_1b as A
LEFT JOIN Dev.Person_Demographics as B
ON A.Gender = B.Gender_De
AND A.DOB = B.PRBRTH_DT
AND A.name_last = B.INDVNM_LAST_NM
AND A.name_first = B.INDVNM_FRST_NM
;QUIT;
Table 1a and Table 1c are using the same variables. The problem is with the join between table 1c variables and the dev.person_demographics table. I put the proc contents for those below.
RID | Num | 8 | BEST. | RID | |
name_last | Char | 17 | $17. | $17. | name_last |
name_first | Char | 15 | $15. | $15. | name_first |
name_middle | Char | 15 | $15. | $15. | name_middle |
DOB | Char | 9 | |||
Race | Char | 38 | |||
Gender | Char | 7 |
Variables in Creation Order | ||||||
# | Variable | Type | Len | Format | Informat | Label |
1 | DID | Num | 8 | 11. | 11. | DID |
9 | GENDER_DE | Char | 7 | $7. | $7. | GENDER_DE |
11 | RACE_DE | Char | 30 | $30. | $30. | RACE_DE |
13 | PRBRTH_DT | Num | 8 | DATE9. | DATE9. | PRBRTH_DT |
26 | INDVNM_FRST_NM | Char | 30 | $30. | $30. | INDVNM_FRST_NM |
27 | INDVNM_MDL_NM | Char | 20 | $20. | $20. | INDVNM_MDL_NM |
28 | INDVNM_LAST_NM | Char | 50 | $50. | $50. | INDVNM_LAST_NM |
So the SAS error is correct, the date of birth variables have different types.
I don't see the current format of DOB though, which is required to accurately convert it, so what does it look like? ie 01Aug2022 or 2022/08/01...
This may work but it would be better to use the correct informat, ie date9, mmddyy than anydtdte which guesses. And can guess wrong.
PROC SQL;
CREATE TABLE Table_2 AS
SELECT DISTINCT A.*
, B.DID
FROM Work.Table_1b as A
LEFT JOIN Dev.Person_Demographics as B
ON A.Gender = B.Gender_De
AND input(A.DOB, anydtdte.) = B.PRBRTH_DT
AND A.name_last = B.INDVNM_LAST_NM
AND A.name_first = B.INDVNM_FRST_NM
;QUIT;
@cbagdon-cox wrote:
Table 1a and Table 1c are using the same variables. The problem is with the join between table 1c variables and the dev.person_demographics table. I put the proc contents for those below.
Variables in Creation Order# Variable Type Len Format Informat Label1234567
RID Num 8 BEST. RID name_last Char 17 $17. $17. name_last name_first Char 15 $15. $15. name_first name_middle Char 15 $15. $15. name_middle DOB Char 9 Race Char 38 Gender Char 7
Variables in Creation Order
#
Variable
Type
Len
Format
Informat
Label
1
DID
Num
8
11.
11.
DID
9
GENDER_DE
Char
7
$7.
$7.
GENDER_DE
11
RACE_DE
Char
30
$30.
$30.
RACE_DE
13
PRBRTH_DT
Num
8
DATE9.
DATE9.
PRBRTH_DT
26
INDVNM_FRST_NM
Char
30
$30.
$30.
INDVNM_FRST_NM
27
INDVNM_MDL_NM
Char
20
$20.
$20.
INDVNM_MDL_NM
28
INDVNM_LAST_NM
Char
50
$50.
$50.
INDVNM_LAST_NM
It worked! Why it wouldn't allow me to change the format of DOB in the table_1c code {used input(A.DOB,$date9.) as DOB} refused to work but running it in the join worked is beyond me. Thanks!
You have to change it on the join as well as in the SELECT code. The informat would be date9, not $date9. as well.
If your informat is date9, it's better to use that than to use anydtdte.
using date9. would through this error: ERROR: INPUT function requires a character argument.
I'm guessing without the actual code, but the SAS log isn't usually wrong.
@cbagdon-cox wrote:
I'm still baffled as to why the log wasn't identifying DOB as the problem variable and instead was showing the error for whatever was the last variable before the quit statement, not very helpful.
One has a strong suspicion that the SAS developers sort of gave up on deciding where to attempt to place an error indicator since the places using an = might have multiple functions with multiple variables, sub-query results and such. So the error comes at the end of the whole statement , not 'the last variable'.
See this example where a text version of age was added to a copy of SASHELP.CLASS. The error comes neatly tucked in just after the semicolon that ended the create table selection and before the quit. If you had submitted multiple Create table or simple selections within a single Proc SQL call the error message does identify which specific query caused the problem
150 proc sql; 151 create table sqlexample as 152 select a.*, b.agetext 153 , (a.age=b.agetext) as compared 154 from sashelp.class as a 155 left join 156 work.example as b 157 on a.name=b.name 158 ; ERROR: Expression using equals (=) has components that are of different data types. 159 quit;
So DOB is a character string and PRBRTH_DT is the number of days since 1960.
Look at the string in DOB and see if they follow the pattern that the DATE. informat can understand. If they do then use the INPUT() function in your ON criteria.
input(dob,date9.) = prbirth_dt
This did not work and gave the same error.
@cbagdon-cox wrote:
PROC IMPORT OUT=Table_1a
DATAFILE= "FilePath.XLSX"
DBMS=XLSX /*XLSX*/ REPLACE;
SHEET="Request File";
GETNAMES=YES ;
RUN;
Highlighted text is probably the most common cause of that particular error.
Proc Import uses external rules to "guess" as to the variable types and attributes because spreadsheet software does not impose any constraints. In a single column of a spread sheet you can have text (of multiple lengths), numeric values, dates and/or times. So SAS, which does have constraints on VARIABLES has to pick one type, numeric or character for all values in that COLUMN, and assign characteristics like formats. Because the rules are based on the contents of the columns if one sheet that is imported has character data at the top it will get treated as character, if the values only hold digits then the value becomes numeric. So two different spreadsheets that supposedly have similar values get treated differently by Proc Import based on the order of the values.
You don't even mention a guess or describe which variable(s) comparisons are involved. My guess is that the DOB and the PRBIRTH_dt are the most likely to be involved. Possibly because some , or all, of the dates involved in one of the sets had some values entered as text and the others used the underlying Excel numeric values with a display set to look similar to the text of the other.
The general advice is to ALWAYS check the result of Proc Import for variable types, especially ones that you intend to compare with others or to combine into a single data set.
The solution if you are going to be dealing with multiple files that have the supposed same structure is to not use Proc Import. Instead save the file to CSV and write a data step to read the text file and set the exact same properties for every file read.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.