Elliott
Obsidian | Level 7

I need some help. I have a .xlsx file that I need to import and append to a dataset each day.

 

The first problem is that the variable names have spaces instead of underscores, so SAS gives me an error:

 

ERROR:  The name XXX is not a valid SAS name.

 

The other issue is that the data in the xlsx starts at A3, not A1, so I have to remove the top two rows before I import. How can I avoid that and just start picking up the data at A3? The report has over 100K records daily and 90+ variables, so I cannot set a range because the number of records changes daily.

 

I am trying to build a process to automate this import and append to run daily in our schedule. 

 

Any assistance will be greatly appreciated.

 

Thank you,

Elliott

Reeza
Super User

Try the following option while importing the data:

option validvarname=V7;

For the first two rows, try specifying the DATAROW= option so the import starts on the third row.

 

 

options validvarname=v7;

proc import out=want
    datafile='path to file.xlsx'
    dbms=xlsx
    replace;
    datarow=3;
run;
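
For the daily append the original post asks about, a minimal sketch (the data set names WORK.MASTER and WORK.WANT, and the FORCE option, are illustrative assumptions):

/* Append today's import to the cumulative data set.              */
/* FORCE lets the append run even if attributes differ slightly;  */
/* check the log for truncation or type-mismatch notes.           */
proc append base=work.master data=work.want force;
run;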

 



ballardw
Super User

Reliance on Excel for data interchange is problematic because there are no rules governing things like where headers sit or what type each column holds.

I would suggest the following, assuming the daily files all have the same structure and similar contents (a sketch of the resulting code appears after these steps):

1) Save the file as CSV.

2) Use Proc Import or the import wizard to read that file. You can indicate which row is the first row of data values (DATAROW= in Proc Import, FIRSTOBS= on the generated INFILE statement). If your variable names do not occur on line 1 of the file, I would edit the file so that they are on the first line. Also set the GUESSINGROWS option to something large, like 32000, so that SAS examines more rows when determining variable types and lengths.

3) The step above will generate a data step whose code appears in the log. Copy that data step back into the editor and modify it as follows:

4) Examine your variable names. For any that you do not like, do a search and replace in the editor to change all occurrences. Then examine the informat statements. Character variables will be indicated with something like $24., meaning the variable is 24 characters long. You should know from your data description (you do have one, don't you?) what the maximum expected length is; make sure the number matches that description, or make a guess. I would replace 24 with something like 35 to allow for longer values later.

5) Make sure that variables you expect to be numeric have a numeric informat, likely best8., best12., or best32. from Proc Import. If not, change it.

6) If you have currency values, expect to see a COMMA informat. If the values include a $ sign, the column will be read as character and you need to change it. Make the width, including decimals, at least one or two characters larger than your expected values: $123,456,789.12 might be set to comma15.2; I would increase that to comma17.2.

7) Save the program from the editor.

Rerun it with the changes made.

Check the data. You may have a couple rounds of adjusting things.

When this is working as desired, then for each daily run make sure that 1) the Excel file is saved as CSV, and 2) the file name in the INFILE statement (and optionally the name of the output data set) is updated. The INFILE statement should also have firstobs=3 set.
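
A minimal sketch of what the recalled-and-edited data step might look like. The path, variable names, lengths, and informats below are placeholders for illustration, not the ones Proc Import would actually generate for this report:

/* Edited copy of the data step Proc Import writes to the log.         */
/* File path, variable names, and informats are illustrative only.     */
data work.daily;
    infile '/data/daily_report.csv' dlm=',' dsd truncover
           firstobs=3;                  /* skip the two leading rows    */
    informat Account_Number $20.;       /* widened from the guessed length */
    informat Customer_Name  $50.;
    informat Sale_Amount    comma17.2;  /* currency with commas         */
    informat Sale_Date      mmddyy10.;
    format   Sale_Amount    dollar17.2;
    format   Sale_Date      mmddyy10.;
    input Account_Number $
          Customer_Name  $
          Sale_Amount
          Sale_Date;
run;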

 

This method will not care how many rows there are, but if the columns change order or number, you will have a lot of work in the future.

 

Your statement "The report has over 100K records daily..." may indicate other issues: if this file is actually a report, reports often have summary lines or columns with different layouts.

 

Since I suspect these daily files are not manually prepared, you might go up the data chain and see whether a CSV file with a documented layout could be created to begin with.

 

 

AhmedAl_Attar
Rhodochrosite | Level 12

Hi @Elliott

 

Check out the blog entry "Using LIBNAME XLSX to read and write Excel files" from @ChrisHemedinger.

 

It shows how to treat your spreadsheet as a library; you could then use Proc Append to append the new data directly to the bottom of your existing data set.

 

*Note: It has an example of how to skip the first x rows using the FIRSTOBS= data set option.
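
A minimal sketch of that approach; the libref, file path, sheet name, and base data set name are assumptions, and the right FIRSTOBS= value depends on where your real header row sits, since the XLSX engine takes variable names from the sheet's first row:

/* Treat the workbook as a library; each worksheet becomes a data set. */
libname xl xlsx '/data/daily_report.xlsx';

/* Append the sheet to the cumulative data set, skipping the leading   */
/* rows. FIRSTOBS= counts observations after the row the engine reads  */
/* as the header, so verify the value against your file.               */
proc append base=work.master
            data=xl.Sheet1(firstobs=2)
            force;
run;

libname xl clear;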

 

Hope this helps,

Ahmed

Reeza
Super User
Having done this before, I'll echo what others are saying: CSV is better. If you're interested in going down this route, SAS has a VBS script that will convert the files automatically, so the process can still be automated.
Elliott
Obsidian | Level 7
Thanks, I have requested the data be sent in a .csv. Hopefully that will take care of my issues.

Thanks!
