Dear Everyone,
I am adding around 50 files together with the following code:
data Y9C; set bhcf0203-bhcf1712; run;
The bhcf files contains quarterly financials of bank holding companies. But when I run the above stated code, I get the following error for a long list of variables. Could you please solve the problem?
ERROR:Variable BHCK4146 has been defined as both character and numeric. (I just put one here, but I get hundreds of errors in the original file)
This is often the end result when you import data files from Excel, as the conversion to Excel is based upon certain types of guesswork, and a variable could wind up character from one Excel data set and the same variable could wind up numeric from another Excel file.
But anyway, the only way to solve the problem is to make each one of these variables numeric in ALL data sets; or character in ALL of these data sets.
I understand what you said. But I have 100s of variables in each files. Even sometimes some files have new variables. So, it will be time consuming to convert each variable from character to numeric individually. Is there any way I can convert all of them from character to numeric in a single SAS code?
There is no built in function in SAS to do this. You could write a macro that identifies all variables that have mismatches between numeric and character across your data sets and then bulk convert them. The decision of whether or not to convert a variable to all numeric or convert it to all character is probably also important and time consuming, unless you want to decide that every variable is converted to all character.
You would need to create a data dictionary which defines all your columns with all attributes and use this DD as metadata (stored in a SAS table).
You then could implement code which compares in each imported data set the variables in there with the definitions in the DD (merge over variable name) and then generate conversion statements based on differences found - or raise exceptions for new variables not yet in the data dictionary (=to amend the DD and rerun the process).
Once you've got tables with standardized variables you can combine them.
...and even above isn't a 100% solution as nothing automated can really tell you what format to use when you have to convert a numeric variable to a character variable.
I understand what you said. But I have 100s of variables in each files. Even sometimes some files have new variables.
This data validation work should have been done when the data was imported. You now have data you can't use. Where is the time saving?
1. The correct way to fix the problem is to recreate the proper tables once and for all, and to take steps to ensure data quality from now on.
Excel makes no effort to standardise its data and creates all sorts of issues, so you can't skip vetting and correcting the data if you import such data.
2. The incorrect interim way, as a one off and in the short term, is to use this macro I just wrote to help you:
%macro convert_to_num(table, varlist);
%local i var;
data NEW_&table.;
set &table.;
%do i =1 %to %sysfunc(countw(&varlist.));
%let var=%scan(&varlist.,&i.);
length __&var. 8;
__&var.=&var.;
drop &var.;
rename __&var.=&var.;
%end;
run;
%mend;
data T1;
X='1'; Y='2';
run;
%convert_to_num(T1, X Y);
@nazmul wrote:
I understand what you said. But I have 100s of variables in each files. Even sometimes some files have new variables. So, it will be time consuming to convert each variable from character to numeric individually. Is there any way I can convert all of them from character to numeric in a single SAS code?
Some of your "values" may not be recoverable if they should have been character but were guessed as numeric. For example account numbers that my have leading zero characters. Accounts of "0012345" and "00012345" would both have numeric values of 12345.
Also you have another problem that is going to revolve around variables of the same name with different lengths as character and will get a warning about possible truncation. That will occur because one file had the longest value of "company name" as 20 characters and another perhaps 28 characters.
And if all of those variables were supposed to be numeric then you might want to examine your source files to find out why any are character. One typical cause of "character" results from guessing is that the first 20 lines of the data are all missing. In which case the field ends up with a length of 1 character. Which likely means that the digits stored in the remaining rows are incomplete to recover the numeric value. Or if your source files have people entering data then characters were introduced that are not acceptable for any numeric value and you would likely have a hard time getting the correct numeric value. Also if entering data into Excel the program will sometimes turn entered values into dates. If you intend to do a formula : =5-3 and forget the = and just enter 5-3 then next thing you know you have a date of 3-May. And importing that could have all sorts of issues.
A better approach may be to re-import those files. If they are starting as spreadsheets then convert them to CSV and import those. Use the option GUESSINGROWS=MAX (or a very large number if using a wizard that won't allow the MAX option). Then more rows of data are examined before assigning variable types and lengths.
Check for character variables for any that you think should not before attempting to combine the sets.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.