I need some help. I have a .xlsx file that I need to import and append to a dataset each day.
The first problem is that the variable names have spaces instead of underscores, so SAS gives me an error:
ERROR: The name XXX is not a valid SAS name.
The other issue is that the data in the xlsx starts at A3, not A1, so I have to remove the top two lines before I import. How can I avoid that and just start picking up the data at A3? The report has over 100K records daily and 90+ variables, so I cannot set a range because the number of records changes from day to day.
I am trying to build a process to automate this import and append so it runs daily on our schedule.
Any assistance will be greatly appreciated.
Thank you,
Elliott
Try the following option while importing the data:
options validvarname=v7;
For the first two rows, try specifying the DATAROW= option so the import starts reading data on the third line.
options validvarname=v7;

proc import out=want
    datafile='path to file.xlsx'
    dbms=xlsx
    replace;
    datarow=3;
run;
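Since the goal is to append each day's import to an existing table, a minimal follow-up sketch might look like this. The library name, path, and master data set name below are placeholders for illustration, not values from the original post:

/* Hypothetical follow-up: append today's imported WORK.WANT to a   */
/* permanent master table. MYLIB and MASTER are placeholder names.  */
libname mylib '/data/sas/master';   /* hypothetical library path    */

proc append base=mylib.master
            data=want
            force;   /* FORCE tolerates minor attribute differences,
                        but check variable lengths so values are not
                        silently truncated                           */
run;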
Relying on Excel for data interchange is problematic because there are no rules governing things like this.
I would suggest the following (assuming the files all have the same structure and similar contents):
1) Save the file as CSV.
2) Use PROC IMPORT or the Import Wizard to read that file. You can indicate which row is the first row of data values (firstobs=). If your variable names do not appear on line 1, I would modify the imported file so that they are on the first line. Also set the GUESSINGROWS option to something large, like 32000, so that SAS examines more rows when determining variable lengths and types.
3) The step above writes a data step to the log. Copy that data step back into the editor and modify it as follows:
4) Examine your variable names. For any you do not like, do a search-and-replace in the editor to change all occurrences. Then examine the INFORMAT statements. Character variables will be indicated with something like $24., meaning the variable will be 24 characters long. You should know from your data description (you do have one, don't you?) what the maximum expected length is; make sure the number matches that description, or make a guess. I would replace 24 with something like 35 to allow for longer values later.
5) Make sure that variables you expect to be numeric have a numeric informat, most likely BEST8., BEST12., or BEST32. from PROC IMPORT. If not, change it.
6) If you have currency values, expect to see a COMMA informat. If the values include a $, PROC IMPORT will read the column as character and you will need to change it. Make the width, including decimals, at least one or two characters larger than the expected values: $123,456,789.12 might be set to COMMA15.2; I would increase that to COMMA17.2.
7) Save the program from the editor.
Rerun it with the changes made.
Check the data. You may need a couple of rounds of adjustments.
Once this works as desired, make sure that 1) each of your Excel files is saved as CSV, and 2) you replace the name of the file in the INFILE statement (and, optionally, the name of the output data set). The INFILE statement should also have firstobs=3 set. A sketch of what the edited data step might look like follows below.
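Put together, a hand-edited version of the data step that PROC IMPORT writes to the log might look something like this. The path, variable names, and informat widths are placeholders for illustration, not values from the original post:

/* Sketch of the PROC IMPORT-generated data step after hand editing. */
/* File path, variable names, and widths are hypothetical.           */
data daily_extract;
    infile '/data/daily_report.csv'   /* hypothetical CSV path        */
        dlm=','
        dsd                           /* honor quoted, embedded commas */
        truncover
        firstobs=3;                   /* skip the two junk header rows */

    informat Customer_Name  $35.;     /* widened from the $24. guess   */
    informat Account_ID     $12.;
    informat Order_Amount   comma17.2;/* room for $123,456,789.12      */
    informat Order_Date     mmddyy10.;

    format   Order_Amount   dollar17.2
             Order_Date     mmddyy10.;

    input Customer_Name  $
          Account_ID     $
          Order_Amount
          Order_Date;
run;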
This method will not care how many rows there are, but if the columns change order or number, you will have a lot of work in the future.
Your statement "The report has over 100K records daily..." may indicate other issues: if this file is actually a report, reports often have summary lines or columns with different layouts.
Since I suspect these daily files are not manually prepared, you might go up the data chain and see whether a CSV file with a documented layout could be created in the first place.
Hi @Elliott
Check this blog entry, "Using LIBNAME XLSX to read and write Excel files", from @ChrisHemedinger.
It shows how to treat your spreadsheet as a library; in turn, you could use PROC APPEND to append your new data directly to the bottom of your existing data set.
*Note: It has an example of how to skip the first x rows using the FIRSTOBS= data set option. A minimal sketch follows.
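Here is a minimal sketch of that LIBNAME XLSX approach. The file path, sheet name, and library names are placeholders; the FIRSTOBS= value needs to be checked against where the real data lands once the extra header rows are accounted for:

/* Hypothetical LIBNAME XLSX approach: paths, sheet name, and        */
/* library names are placeholders, not from the original post.       */
options validvarname=v7;   /* turn spaces in headings into underscores */

libname xl    xlsx '/data/daily_report.xlsx';   /* hypothetical path  */
libname mylib '/data/sas/master';               /* hypothetical library */

proc append base=mylib.master
            data=xl.Sheet1(firstobs=2)   /* adjust to the first real
                                            data row in the library view */
            force;
run;

libname xl clear;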
Hope this helps,
Ahmed
Save to CSV and read that with a data step. This is the only way to get consistent results.