Hi all
I have a SAS code that runs and generates a data set for for FY20 that is then exported to excel. The next year I run the same code but this time its for FY21. I want to merge them without doing it manually so in the code, I import in the FY20 data and then do this merge:
proc sql;
   create table combined_years as
      select * from FY20
      union all
      select * from FY21;
quit;I get multiple errors that read "Column 26 from the first contributor of UNION ALL is not the same type as its counterpart from the second", Is that because FY20 was imported in as an excel file and the data type changed on a few columns? Is there a way to correct that without going through each and every column that is incompatible?
If you create these datasets in SAS, then you must make sure that the columns are created with the same attributes.
If you import them from some external source, you need to adapt the process to get consistent results. Save the data from Excel to text files (csv or tab-separated), so you can read the files with a data step.
Thanks Kurt. That sounds logical. But not sure I know what you mean when you say
@Kurt_Bremser wrote:If you create these datasets in SAS, then you must make sure that the columns are created with the same attributes.
If you import them from some external source, you need to adapt the process to get consistent results. Save the data from Excel to text files (csv or tab-separated), so you can read the files with a data step.
The DATA step (with INFILE and INPUT) is the way to read external data from text files.
Yes, I would suspect the export and then import process is changing your data types. Does it have to be done? Run the code to create the 20, run the code to create the 21, merge. No export/import in the middle. Or is it the case that for some reason you can't run the 20 code now and get the same result as when it was originally run? Or maybe do the merge in Excel if you are exporting both 20 and 21? Save the 20 as a SAS data file and not an Excel file or both? Lots of things to be done.
It seems like we may not have the whole picture of what you are actually trying to do.
To answer your direct question, no, if you have variables which are numeric in one file and character in the other I think you are stuck with going through and sorting it out.
Thanks HB. I can always do it manually. I have been for a while but thought I can get rid of a manual step. And you are right. I simplified it a bit because the previous years data actually includes more than just FY20. It actually has FY2016-FY2020. So I would normally just run FY2021 and add the results to the existing excel 2016-2020 data to get my combined running years total. (I add the new data to the previous years running total)
Thanks Fergieis
No I have not tried that but I will give it a shot and see what happens.
I tried that and although it failed, there were less errors. These were the ones I did get.
ERROR: Variable Age has been defined as both character and numeric.
ERROR: Variable BMT has been defined as both character and numeric.
ERROR: Variable F14 has been defined as both character and numeric.
ERROR: Variable F15 has been defined as both character and numeric.
@wheddingsjr wrote:
I tried that and although it failed, there were less errors. These were the ones I did get.
ERROR: Variable Age has been defined as both character and numeric. ERROR: Variable BMT has been defined as both character and numeric. ERROR: Variable F14 has been defined as both character and numeric. ERROR: Variable F15 has been defined as both character and numeric.
At least now you know which ones to fix.
Are F14 and F15 even actual variables? Or just artifacts of some random cell in columns F14 and F15 of the Excel sheet?
Note that a frequent cause of that when importing from Excel is that the column is empty in one of the spreadsheets so SAS defines it as LENGTH=$1 instead of 8 since it take takes only one byte to store a character variable and 8 tor store a floating point number. If that is case then just drop those empty columns from that dataset.
F14 and F15 are actually column headings and those columns are empty but we have to keep them because the tableau dashboard that we import the excel file into expects those columns, empty or not. I am when the dashboard was created they needed those 2 columns. I am guessing they meant something prior to my coming on board abd no one else knows why they were included in the dashboard at some point in time.
Add the variables back, but you probably don't need to since if they are empty in both dataset they wouldn't be of different types.
data want;
  set fy20 (drop=age F14)  fy21(drop=F15);
run;The trickier problem is when the lengths of character variables are wrong. You can fix that by defining the variables before the SET statement. Also PROC IMPORT has a very nasty habit of attaching $xx. formats to character variables. If you fix the length but leave a format with too small a width attached then the values will look truncated when printed.
data want;
  length id $20 age 8 F1 $10 F2-F15 8 ;
  set FY20 (drop=age F14) FY21(drop=F15);
  format _character_ ;
run;
Using the first example you suggested yielded the following errors:
ERROR: Variable BMT has been defined as both character and numeric.
ERROR: Variable F14 has been defined as both character and numeric.
The second example had only 1 error and its:
ERROR: Variable BMT has been defined as both character and numeric.Should there be a drop for BMT?
The code I posted is just a sketch or template for what you need to do.
You will need to do the research to understand which variables are having type conflicts. Then determine which dataset has a numeric and which has a character version. Then determine if any of the character versions are empty and so can be dropped.
If the conflicted variables are non-empty in both datasets then you need to do more investigation to see which way you should be doing a conversion.
You also might want to check the definitions of the variables that the Tableau dashboard is expecting so that you deliver a dataset that will not cause the dashboard to not work right.
You are still trying to cure symptoms instead of causes.
How do your datasets originate? If you get them from somewhere else in the form of Excel spreadsheets (which is the worst file format for data transfer), you must fix that process.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
