BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Caetreviop543
Obsidian | Level 7

I have a large xls file with data on multiple sheets. The data is written in chunks with reoccurring column names:

id var1 var2 
1  10    a
1  12    b
2   9    c
2  12    d
3   8    e
3  11    f

id var1 var2
4  10    t
4   9    u    
4   8    v
5  13    w
5  11    x
6  9     y
6  18    u
6  16    s

I have to account for the fact that the data has multiple column headers, spaces between column headers, and occurs on multiple sheets. The data also isn't evenly spaced, so there can be a different number of rows between column headers. 

 

Does anyone have any idea how to import this into SAS?

 

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

The easiest thing is to manually go through the Excel file and clean it up.

 

Yes, of course you could write SAS code to do the clean-up, but to my mind that is a last resort, for example if there are a huge number of rows or you are going to have to do this for repeated different Excel files.

--
Paige Miller

View solution in original post

4 REPLIES 4
PaigeMiller
Diamond | Level 26

The easiest thing is to manually go through the Excel file and clean it up.

 

Yes, of course you could write SAS code to do the clean-up, but to my mind that is a last resort, for example if there are a huge number of rows or you are going to have to do this for repeated different Excel files.

--
Paige Miller
ghosh
Barite | Level 11

I would suggest reading all vars in each worksheet as text fields in a data step and grammatically clean up the records and convert fields to numeric where required.

Caetreviop543
Obsidian | Level 7

Thanks, I will just try to edit the document in excel by sorting the column headers.

ballardw
Super User

"Import" as in Proc or Wizards assume the data will have at most one row of column headers and that is the first row.

Period.

If any column heading is exactly the same then SAS will create versions like Total, Total1, Total2 for the like named columns. If the text of the names exceeds 32 characters and is exactly the same for the first 32 characters then the repeated columns will get a variable name like Var23, where 23 refers to the column number the data appears in.

 

When the data is not a single header row then you will not have any numeric or date valued fields, everything will almost certainly end up as character and the variable names likely not anywhere as desired.

So either clean the data or learn to read from XLSX files using the PC Files and a lot of likely to be ugly code to read things as needed. I have done this. For literally hundreds of files but they all had the exact same structure. Do yours?

And files with different structures that need to be normalized. There is an ancient acronym: GIGO.

Fix the G so you can do IO.

 

Have you checked for hidden rows or columns? That can be another headache.

 

Don't know why you bother to provide data steps when you are talking about Excel.

 

I would be very tempted to go to the source that creates the Excel file and see if they could provide the data in a more standard, i.e single header row per column with unique column headers/variable names and in text format such as CSV, with a data description document.

Then read.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1489 views
  • 0 likes
  • 4 in conversation