I have a large xls file with data on multiple sheets. The data is written in chunks with reoccurring column names:
id var1 var2
1 10 a
1 12 b
2 9 c
2 12 d
3 8 e
3 11 f
id var1 var2
4 10 t
4 9 u
4 8 v
5 13 w
5 11 x
6 9 y
6 18 u
6 16 s
I have to account for the fact that the data has multiple column headers, spaces between column headers, and occurs on multiple sheets. The data also isn't evenly spaced, so there can be a different number of rows between column headers.
Does anyone have any idea how to import this into SAS?
The easiest thing is to manually go through the Excel file and clean it up.
Yes, of course you could write SAS code to do the clean-up, but to my mind that is a last resort, for example if there are a huge number of rows or you are going to have to do this for repeated different Excel files.
The easiest thing is to manually go through the Excel file and clean it up.
Yes, of course you could write SAS code to do the clean-up, but to my mind that is a last resort, for example if there are a huge number of rows or you are going to have to do this for repeated different Excel files.
I would suggest reading all vars in each worksheet as text fields in a data step and grammatically clean up the records and convert fields to numeric where required.
Thanks, I will just try to edit the document in excel by sorting the column headers.
"Import" as in Proc or Wizards assume the data will have at most one row of column headers and that is the first row.
Period.
If any column heading is exactly the same then SAS will create versions like Total, Total1, Total2 for the like named columns. If the text of the names exceeds 32 characters and is exactly the same for the first 32 characters then the repeated columns will get a variable name like Var23, where 23 refers to the column number the data appears in.
When the data is not a single header row then you will not have any numeric or date valued fields, everything will almost certainly end up as character and the variable names likely not anywhere as desired.
So either clean the data or learn to read from XLSX files using the PC Files and a lot of likely to be ugly code to read things as needed. I have done this. For literally hundreds of files but they all had the exact same structure. Do yours?
And files with different structures that need to be normalized. There is an ancient acronym: GIGO.
Fix the G so you can do IO.
Have you checked for hidden rows or columns? That can be another headache.
Don't know why you bother to provide data steps when you are talking about Excel.
I would be very tempted to go to the source that creates the Excel file and see if they could provide the data in a more standard, i.e single header row per column with unique column headers/variable names and in text format such as CSV, with a data description document.
Then read.
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Select SAS Training centers are offering in-person courses. View upcoming courses for: