I have some oddly formatted data from an excel spread sheet I'm trying to read in. A sample of the excel file I'm trying to read in is below. So in particular there are two challenges. I would like to ignore the title and start reading in and get names from the top header: Date, A, B, C, D, E, F, Time/Date. This is an sample of the spread sheet:
TITLE | |||||||
Date | A | B | C | D | E | F | Time |
Date | |||||||
January | |||||||
1/20/2004 | Company A | 2:00 p.m. | Connecticut | NA | 4,000 | 400,00 | 1/26/04, 9:00 a.m. |
February | |||||||
2/12/2004 | Company B | 11:32 a.m. | San Francisco | NA | 1,000 | 340,000 | 3/01/04, 8:50 a.m. |
2/28/2004 | CompanyC | 1:00 pm | New York | NA | NA | NA | |
None |
The Date column is particularly tricky for me because it has January and February, etc followed by MM/DD/YYYY format beneath it. How can I just read in the MM/DD/YYYY, and ignore the January, February, March, etc or place it side by side while reading in such as January 1/20/2004, in separate columns, and then the same for February, etc. So When reading in I would like to see something like
Month | Date | A | B | C | D | E | F | Time/Date |
January | 1/20/2004 | Company A | 2:00 p.m. | Connecticut | NA | 4,000 | 400,00 | 1/26/04, 9:00 a.m. |
February | 2/12/2004 | Company B | 11:32 a.m. | San Francisco | NA | 1,000 | 340,000 | 3/01/04, 8:50 a.m. |
February | 2/28/2004 | CompanyC | 1:00 pm | New York | NA | NA | NA |
Any suggestions?
I'm using this code to begin reading the file in:
PROC IMPORT OUT= WORK.Out
DATAFILE= "C:\Users\Desktop\Data\Excel1.xls"
DBMS=EXCEL REPLACE;
RANGE="Sheet1$";
GETNAMES=YES;
MIXED=NO;
SCANTEXT=YES;
USEDATE=YES;
SCANTIME=YES;
RUN;
I don't think there's an easy way in any language
There are ways though..
Reading Excel files like this is like reading reports.
The best option if available is to try to get a more clean layout from your source (this is to done on a regular basis right?, otherwise fix the Excel manually first).
Is the month name in a separate cell/row?
Either way, import, and then do some data step programming to fix it in the way you like it.
Yeah, I don't mind fixing the excel manually except. I have a several files that I would have to do so, it seems being able to read in the native format with sas would be the best strategy. The Time and Date are in a separate cell/row.
Is the data the exact same in each file, ie same row/same column? If so you could do it "manually" once using DDE and then run for each file.
Still a pain though.
So I guess there's no easy ways to do this in sas?
I don't think there's an easy way in any language
There are ways though..
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.