Okay, so I used CSV instead of XLSX and the merging worked. Sheet1 still had $7 and Sheet2 still had $6 for the PID.
Merging the two did not give me the same warning about differing lengths, and the output table displayed the AA-1000 values just fine.
The only issue now is that everyone here uses Excel, so that is the standard, and the sheets/files are constantly being updated multiple times daily.
If you are going to use PROC IMPORT then you will have to live with the fact that the lengths of character variables will depend on the actual subset of the data that exists in the one file that PROC IMPORT is currently reading.
Are these files supposed to always have the same columns in the same order every time you use them? If so the fix can be very simple. Just add code to fix the variables. Perhaps something like:
proc import datafile='todays_excel_file.xlsx' out=step1 dbms=xlsx replace;
run;
data actual_dataset;
  /* Re-declare the attributes so PROC IMPORT's guessed lengths cannot truncate the values */
  attrib pid     length=$7 format=$7.;
  attrib pid_ini length=$3 format=$3.;
  ....
  set step1;
run;
If the files change a lot then you will need to make something smarter that actually checks the metadata about the files and makes decisions about how to fix them (or whether they can be fixed without human help).
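For example, a minimal sketch of that kind of metadata check, assuming the freshly imported dataset is WORK.STEP1 and that PID should be at least $7 (both are placeholder assumptions, not your actual names):

proc sql;
  /* Pull the attributes PROC IMPORT assigned to the imported dataset */
  create table col_check as
  select name, type, length
  from dictionary.columns
  where libname='WORK' and memname='STEP1';
quit;

data too_short;
  /* Keep only character variables that came out shorter than expected */
  set col_check;
  if upcase(name)='PID' and type='char' and length < 7;
run;

From there the program could stop the job, re-read the file with a corrected length, or just write a message to the log.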
Note you got lucky. You won't see truncation if the first version of the variable that SAS sees is the longest. But just trying to put the longest one first in the MERGE or SET statement cannot work when there are multiple character variables with varying lengths. The longest version of VAR1 might be in DS1 while the longest version of VAR2 might be in DS2, so either order will truncate one of the variables.
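One defensive pattern, shown here only as a sketch with placeholder names and lengths, is to declare the lengths yourself before the MERGE so the order of the datasets no longer matters:

data combined;
  /* Declare the longest length you expect for each character variable */
  /* before the MERGE; then neither source dataset can force truncation. */
  /* VAR1, VAR2, DS1, DS2, and ID are placeholders. */
  length var1 $12 var2 $15;
  merge ds1 ds2;
  by id;
run;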
Oof, yeah the five sheets used have the column in a different place.
And yes, I did something similar where I changed the formats after each import into a new dataset for two sheets, merged them, and it worked fine. I was just wondering if there was something I could have done beforehand to avoid doing that for all five sheets.
I'll make a macro that does all of that for me; I was just wondering if there was another solution that I was overlooking.
Thank you.
@sassy_seb wrote:
Oof, yeah the five sheets used have the column in a different place.
That is not the problem. SHEET1 can have PID in column 3 and SHEET2 can have PID in column 4 without any issue if those two sheets contain different things.
But if the "same" sheet had the variable in column A today and on tomorrow's run it had moved to column B, then that is a bigger problem, because just dumping the file to CSV and reading it with your current program would read the values into the wrong variables.
And yes, I did something similar where I changed the formats after each import into a new dataset for two sheets, merged them, and it worked fine. I was just wondering if there was something I could have done beforehand to avoid doing that for all five sheets.
I'll make a macro that does all of that for me; I was just wondering if there was another solution that I was overlooking.
If you have a command line tool that can convert an XLSX file into a CSV file (or one CSV file for each worksheet in the workbook) then your process could be:
- Get new XLSX file(s)
- Convert them into CSV file(s)
- Read the CSV files into datasets
No MACRO code would be needed.
Thank you.
@sassy_seb wrote:
Okay, so I used CSV instead of XLSX and the merging worked. Sheet1 still had $7 and Sheet2 still had $6 for the PID.
Merging the two did not give me the same warning about differing lengths, and the output table displayed the AA-1000 values just fine.
The only issue now is that everyone here uses Excel, so that is the standard, and the sheets/files are constantly being updated multiple times daily.
Did you try the GUESSINGROWS=MAX; option in the PROC IMPORT code for the CSV that generated the $6.?
If that didn't have an issue then the MERGE just happens to have the right order of the data sets. This time. Your history and this whole topic pretty much mean that the problem will occur again with different data. There is also a likelihood that a fix that works now will fail when your values get to be 8 characters long, such as 'AA-10000'.
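For reference, a sketch of where that option goes (the file and dataset names are placeholders):

proc import datafile='sheet1.csv' out=sheet1 dbms=csv replace;
  /* Scan every row before guessing variable types and lengths */
  guessingrows=max;
run;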
Note: when you use Proc Import with text file formats like CSV then SAS writes data step code to the log.
You can copy that data step code to the editor and clean it up and change it. Set the INFORMAT for the variable to $7. instead of the $6. that was likely generated.
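The cleaned-up step might look roughly like this; the file name, delimiter, and variable list below are assumptions, not the code your log actually produced:

data sheet1;
  infile 'sheet1.csv' dsd dlm=',' firstobs=2 truncover;  /* placeholder file name */
  /* Widen PID to $7 so values like 'AA-1000' are never truncated */
  informat pid $7. pid_ini $3.;
  format pid $7. pid_ini $3.;
  input pid $ pid_ini $;  /* add the remaining columns here */
run;

Once that data step exists, rerunning it against each updated CSV gives you the same lengths every time.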
Yes, I used GUESSINGROWS=MAX, and yes, we were also discussing what will happen once we get to longer values like AA-10000, as you mentioned.
@sassy_seb wrote:
Yes, I used GUESSINGROWS=MAX, and yes, we were also discussing what will happen once we get to longer values like AA-10000, as you mentioned.
Do NOT use PROC IMPORT to read CSV files. It is fine if you are just exploring, but if you need to re-read the same updated file(s) over and over you should just write your own data step(s) to read the file(s).