BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
cbagdon-cox
Obsidian | Level 7

The ERROR: Expression using equals (=) has components that are of different data types keep coming up in my log regardless of what variable is being used. Whatever is the last variable before the final quit statement shows that error. Tried multiple attempts at converting variables to character and numeric to try and match, removing variables from the join, reformatting the imported excel file, and rearranging order for the join to no avail. It's driving me crazy.

 

PROC IMPORT OUT=Table_1a
DATAFILE= "FilePath.XLSX"
DBMS=XLSX /*XLSX*/ REPLACE;
SHEET="Request File";
GETNAMES=YES ;
RUN;

 

PROC SQL;
CREATE TABLE Table_1b AS
SELECT DISTINCT A.IDrequestR as RID
, A.name_last
, A.name_first
, A.name_middle
, A.DOB
, upcase(A.race) as Race
, upcase(A.gender) as Gender
FROM work.Table_1a as A
ORDER BY A.IDrequestR
;QUIT;

PROC SQL;
CREATE TABLE Table_2 AS
SELECT DISTINCT A.*
, B.DID
FROM Work.Table_1b as A
LEFT JOIN Dev.Person_Demographics as B
ON A.Gender = B.Gender_De
AND A.DOB = B.PRBRTH_DT
AND A.name_last = B.INDVNM_LAST_NM
AND A.name_first = B.INDVNM_FRST_NM
;QUIT;

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

So the SAS error is correct, the date of birth variables have different types.

I don't see the current format of DOB though, which is required to accurately convert it, so what does it look like? ie 01Aug2022 or 2022/08/01...

 

This may work but it would be better to use the correct informat, ie date9, mmddyy than anydtdte which guesses. And can guess wrong. 

 

PROC SQL;
CREATE TABLE Table_2 AS
SELECT DISTINCT A.*
, B.DID
FROM Work.Table_1b as A
LEFT JOIN Dev.Person_Demographics as B
ON A.Gender = B.Gender_De
AND input(A.DOB, anydtdte.) = B.PRBRTH_DT
AND A.name_last = B.INDVNM_LAST_NM
AND A.name_first = B.INDVNM_FRST_NM
;QUIT;

 


@cbagdon-cox wrote:

Table 1a and Table 1c are using the same variables. The problem is with the join between table 1c variables and the dev.person_demographics table. I put the proc contents for those below.

 

Variables in Creation Order# Variable Type Len Format Informat Label1234567
RID Num 8 BEST.   RID
name_last Char 17 $17. $17. name_last
name_first Char 15 $15. $15. name_first
name_middle Char 15 $15. $15. name_middle
DOB Char 9      
Race Char 38      
Gender Char 7      

 

Variables in Creation Order

#

Variable

Type

Len

Format

Informat

Label

1

DID

Num

8

11.

11.

DID

9

GENDER_DE

Char

7

$7.

$7.

GENDER_DE

11

RACE_DE

Char

30

$30.

$30.

RACE_DE

13

PRBRTH_DT

Num

8

DATE9.

DATE9.

PRBRTH_DT

26

INDVNM_FRST_NM

Char

30

$30.

$30.

INDVNM_FRST_NM

27

INDVNM_MDL_NM

Char

20

$20.

$20.

INDVNM_MDL_NM

28

INDVNM_LAST_NM

Char

50

$50.

$50.

INDVNM_LAST_NM


 

View solution in original post

12 REPLIES 12
Reeza
Super User

Run a proc contents on table_1a and table_1b and look for the differences between the Gender and Date of Birth columns. I'm guessing one of those is character and one is numeric with a date format. Post that information here, with how the variable looks for each (ie 2022/08/11 or 11Aug2022) and we can help you fix the code.

 


@cbagdon-cox wrote:

The ERROR: Expression using equals (=) has components that are of different data types keep coming up in my log regardless of what variable is being used. Whatever is the last variable before the final quit statement shows that error. Tried multiple attempts at converting variables to character and numeric to try and match, removing variables from the join, reformatting the imported excel file, and rearranging order for the join to no avail. It's driving me crazy.

 

PROC IMPORT OUT=Table_1a
DATAFILE= "FilePath.XLSX"
DBMS=XLSX /*XLSX*/ REPLACE;
SHEET="Request File";
GETNAMES=YES ;
RUN;

 

PROC SQL;
CREATE TABLE Table_1b AS
SELECT DISTINCT A.IDrequestR as RID
, A.name_last
, A.name_first
, A.name_middle
, A.DOB
, upcase(A.race) as Race
, upcase(A.gender) as Gender
FROM work.Table_1a as A
ORDER BY A.IDrequestR
;QUIT;

PROC SQL;
CREATE TABLE Table_2 AS
SELECT DISTINCT A.*
, B.DID
FROM Work.Table_1b as A
LEFT JOIN Dev.Person_Demographics as B
ON A.Gender = B.Gender_De
AND A.DOB = B.PRBRTH_DT
AND A.name_last = B.INDVNM_LAST_NM
AND A.name_first = B.INDVNM_FRST_NM
;QUIT;


 

cbagdon-cox
Obsidian | Level 7

Table 1a and Table 1c are using the same variables. The problem is with the join between table 1c variables and the dev.person_demographics table. I put the proc contents for those below.

 

Variables in Creation Order# Variable Type Len Format Informat Label1234567
RIDNum8BEST. RID
name_lastChar17$17.$17.name_last
name_firstChar15$15.$15.name_first
name_middleChar15$15.$15.name_middle
DOBChar9   
RaceChar38   
GenderChar7   

 

Variables in Creation Order

#

Variable

Type

Len

Format

Informat

Label

1

DID

Num

8

11.

11.

DID

9

GENDER_DE

Char

7

$7.

$7.

GENDER_DE

11

RACE_DE

Char

30

$30.

$30.

RACE_DE

13

PRBRTH_DT

Num

8

DATE9.

DATE9.

PRBRTH_DT

26

INDVNM_FRST_NM

Char

30

$30.

$30.

INDVNM_FRST_NM

27

INDVNM_MDL_NM

Char

20

$20.

$20.

INDVNM_MDL_NM

28

INDVNM_LAST_NM

Char

50

$50.

$50.

INDVNM_LAST_NM

Reeza
Super User

So the SAS error is correct, the date of birth variables have different types.

I don't see the current format of DOB though, which is required to accurately convert it, so what does it look like? ie 01Aug2022 or 2022/08/01...

 

This may work but it would be better to use the correct informat, ie date9, mmddyy than anydtdte which guesses. And can guess wrong. 

 

PROC SQL;
CREATE TABLE Table_2 AS
SELECT DISTINCT A.*
, B.DID
FROM Work.Table_1b as A
LEFT JOIN Dev.Person_Demographics as B
ON A.Gender = B.Gender_De
AND input(A.DOB, anydtdte.) = B.PRBRTH_DT
AND A.name_last = B.INDVNM_LAST_NM
AND A.name_first = B.INDVNM_FRST_NM
;QUIT;

 


@cbagdon-cox wrote:

Table 1a and Table 1c are using the same variables. The problem is with the join between table 1c variables and the dev.person_demographics table. I put the proc contents for those below.

 

Variables in Creation Order# Variable Type Len Format Informat Label1234567
RID Num 8 BEST.   RID
name_last Char 17 $17. $17. name_last
name_first Char 15 $15. $15. name_first
name_middle Char 15 $15. $15. name_middle
DOB Char 9      
Race Char 38      
Gender Char 7      

 

Variables in Creation Order

#

Variable

Type

Len

Format

Informat

Label

1

DID

Num

8

11.

11.

DID

9

GENDER_DE

Char

7

$7.

$7.

GENDER_DE

11

RACE_DE

Char

30

$30.

$30.

RACE_DE

13

PRBRTH_DT

Num

8

DATE9.

DATE9.

PRBRTH_DT

26

INDVNM_FRST_NM

Char

30

$30.

$30.

INDVNM_FRST_NM

27

INDVNM_MDL_NM

Char

20

$20.

$20.

INDVNM_MDL_NM

28

INDVNM_LAST_NM

Char

50

$50.

$50.

INDVNM_LAST_NM


 

cbagdon-cox
Obsidian | Level 7

It worked! Why it wouldn't allow me to change the format of DOB in the table_1c code {used input(A.DOB,$date9.) as DOB} refused to work but running it in the join worked is beyond me. Thanks!

Reeza
Super User

You have to change it on the join as well as in the SELECT code. The informat would be date9, not $date9. as well. 

 

If your informat is date9, it's better to use that than to use anydtdte.

 

 

cbagdon-cox
Obsidian | Level 7

using date9. would through this error: ERROR: INPUT function requires a character argument.

Reeza
Super User

I'm guessing without the actual code, but the SAS log isn't usually wrong.

cbagdon-cox
Obsidian | Level 7
I'm still baffled as to why the log wasn't identifying DOB as the problem variable and instead was showing the error for whatever was the last variable before the quit statement, not very helpful.
ballardw
Super User

@cbagdon-cox wrote:
I'm still baffled as to why the log wasn't identifying DOB as the problem variable and instead was showing the error for whatever was the last variable before the quit statement, not very helpful.

One has a strong suspicion that the SAS developers sort of gave up on deciding where to attempt to place an error indicator since the places using an = might have multiple functions with multiple variables, sub-query results and such. So the error comes at the end of the whole statement , not 'the last variable'.

 

See this example where a text version of age was added to a copy of SASHELP.CLASS. The error comes neatly tucked in just after the semicolon that ended the create table selection and before the quit. If you had submitted multiple Create table or simple selections within a single Proc SQL call the error message does identify which specific query caused the problem

150  proc sql;
151     create table  sqlexample as
152     select a.*, b.agetext
153     , (a.age=b.agetext) as compared
154     from sashelp.class as a
155          left join
156          work.example as b
157          on a.name=b.name
158     ;
ERROR: Expression using equals (=) has components that are of different data types.
159  quit;
Tom
Super User Tom
Super User

So DOB is a character string and PRBRTH_DT is the number of days since 1960.

 

Look at the string in DOB and see if they follow the pattern that the DATE. informat can understand.  If they do then use the INPUT() function in your ON criteria.

input(dob,date9.) = prbirth_dt
cbagdon-cox
Obsidian | Level 7

This did not work and gave the same error.

ballardw
Super User

@cbagdon-cox wrote:

 

PROC IMPORT OUT=Table_1a
DATAFILE= "FilePath.XLSX"
DBMS=XLSX /*XLSX*/ REPLACE;
SHEET="Request File";
GETNAMES=YES ;
RUN;

Highlighted text is probably the most common cause of that particular error.

Proc Import uses external rules to "guess" as to the variable types and attributes because spreadsheet software does not impose any constraints. In a single column of a spread sheet you can have text (of multiple lengths), numeric values, dates and/or times. So SAS, which does have constraints on VARIABLES has to pick one type, numeric or character for all values in that COLUMN, and assign characteristics like formats. Because the rules are based on the contents of the columns if one sheet that is imported has character data at the top it will get treated as character, if the values only hold digits then the value becomes numeric. So two different spreadsheets that supposedly have similar values get treated differently by Proc Import based on the order of the values.

You don't even mention a guess or describe which variable(s) comparisons are involved. My guess is that the DOB and the PRBIRTH_dt are the most likely to be involved. Possibly because some , or all, of the dates involved in one of the sets had some values entered as text and the others used the underlying Excel numeric values with a display set to look similar to the text of the other.

 

The general advice is to ALWAYS check the result of Proc Import for variable types, especially ones that you intend to compare with others or to combine into a single data set.

 

The solution if you are going to be dealing with multiple files that have the supposed same structure is to not use Proc Import. Instead save the file to CSV and write a data step to read the text file and set the exact same properties for every file read.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 4837 views
  • 0 likes
  • 4 in conversation