- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi, I just started picking up SAS recently and am working on an assignment. I encountered a problem when trying to merge two data sets together using a variable that appears in both data sets, the error says that the variable has been defined as both character and numeric, but when I checked both data sets, the variable is of character type in both. Hence, I am not sure what exactly could the problem be. Could anyone kindly explain?
Thank you.
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Welcome to the world of (almost) real data where folks tell you one thing about their data and you get to determine they weren't really sure of what they had.
Your choice is to decide which variable type you want to use going forward. If a variable is an identifier that you aren't going to do arithmetic with then likely the character version is what you want.
Here is a code template for fixing such issues with a single variable that needs to go from numeric to character:
data fixed; set have (rename=(account=accountnum)); length account $ 5.; account= put(accountnum,f5. -L); drop accountnum; run;
The rename in the data set option allows you to use the value of the old variable but create a new variable of the desired name. SAS will not let you change variable types directly. Many have tried and many have failed.
To minimize possibly confusion explicitly specify the length of the character variable before use.
The put function with the -L left justifies the string in the result, otherwise you are likely to get leading spaces.
Then drop the old value.
If you need to go from character to numeric then the function would be INPUT instead of Put and the Length statement is likely not needed. You would need an appropriate informat to match you value type.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Welcome to the SAS community 🙂 How have you checked the two data sets?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This happens if a variable is of character type in one data set and numeric in another.
Please have a uniform data type or considering converting both to the same type.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Make sure the are identical - type, length, format .
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You have to check ALL of the common variables.
An easy what is to use PROC COMPARE. Use OBS=0 dataset options to prevent it from actually checking the data.
Let's setup up some example datasets with incompatible variables.
data test1 (rename=(name=var1)) test2 (rename=(age=var1));
set sashelp.class;
run;
Now let's use PROC COMPARE to compare the two datasets. In this case they are named TEST1 and TEST2.
proc compare data=test1(obs=0) compare=test2(obs=0);
run;
Looking at the output you can see this summary of the mis-matched variables.
Data Set Summary Dataset Created Modified NVar NObs WORK.TEST1 06JAN19:12:24:39 06JAN19:12:24:39 5 19 WORK.TEST2 06JAN19:12:24:39 06JAN19:12:24:39 5 19 Variables Summary Number of Variables in Common: 4. Number of Variables in WORK.TEST1 but not in WORK.TEST2: 1. Number of Variables in WORK.TEST2 but not in WORK.TEST1: 1. Number of Variables with Conflicting Types: 1. Listing of Common Variables with Conflicting Types Variable Dataset Type Length var1 WORK.TEST1 Char 8 WORK.TEST2 Num 8
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you, I have found out the problem but it's weird because it was detailed as a character type for both data sets in the data dictionary that was provided. In this case how should I do the conversion?
Variables Summary
Number of Variables in Common: 1.
Number of Variables in WORK.ACCOUNT but not in WORK.LOAN: 3.
Number of Variables in WORK.LOAN but not in WORK.ACCOUNT: 6.
Number of Variables with Conflicting Types: 1.
Listing of Common Variables with Conflicting Types
Variable Dataset Type Length Format Informat Label
account_id WORK.ACCOUNT Char 5 $CHAR5. $CHAR5.
WORK.LOAN Num 8 BEST. account_id
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
So the dataset LOAN was created incorrectly if the variable ACCOUNT_ID is supposed to be a character variable.
How did you create the dataset? Can you fix that step?
It MIGHT be possible to convert the number back to a character string, but perhaps not.
For example what if you have the number 11 what string should you convert it to? Did the original value have leading zeros? If so how many?
Also what if the original value for ACCOUNT_ID had a letter in it? There is no way that got converted to a number, so the value is now lost.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The data set wasn't created by me, it was provided, it was purposely created this way to be solved. I didn't realise that it was created wrongly previously.
This is the column in Account
This is the column in Loan
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Welcome to the world of (almost) real data where folks tell you one thing about their data and you get to determine they weren't really sure of what they had.
Your choice is to decide which variable type you want to use going forward. If a variable is an identifier that you aren't going to do arithmetic with then likely the character version is what you want.
Here is a code template for fixing such issues with a single variable that needs to go from numeric to character:
data fixed; set have (rename=(account=accountnum)); length account $ 5.; account= put(accountnum,f5. -L); drop accountnum; run;
The rename in the data set option allows you to use the value of the old variable but create a new variable of the desired name. SAS will not let you change variable types directly. Many have tried and many have failed.
To minimize possibly confusion explicitly specify the length of the character variable before use.
The put function with the -L left justifies the string in the result, otherwise you are likely to get leading spaces.
Then drop the old value.
If you need to go from character to numeric then the function would be INPUT instead of Put and the Length statement is likely not needed. You would need an appropriate informat to match you value type.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
One additional bit about the Format used for the PUT function. If your character values have leading zero you may want to use a Z format instead of F. That would come into play generally with fixed length identifiers such as 00543 where all of the values are padded with leading zeroes to a given length. If you have intermittent leading 0 without an explicit rule regarding what values may require them then matching the ID variables may get to be a problem.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content