DATA Step, Macro, Functions and more

Import Excel spreadsheet which contains two separate datasets

Reply
New Contributor
Posts: 3

Import Excel spreadsheet which contains two separate datasets

Note: I am using PC SAS 9.4 M2(x64).

I have some Excel spreadsheets to import and each spreadsheet contains two datasets that ate stacked horizontally.

Upstream
Record_No, Col2, Col3, etc
1 , data, data, data
2 , data, data, data
3 , data, data, data
n. , data, data, data
Blank-row
Downstream
Record_No, col2, col3, col4
1 , data, data, data
......
......

I'm thinking something like import the whole sheet into a dataset. Then a proc statement data = work.upstream. work.downstream. A do until loop until the blank row or merged cell 'downstream'. This data output to upstream and the remaining data to 'downstream' dataset.

Any suggestions would be greatly appreciated

Regular Contributor
Posts: 212

Re: Import Excel spreadsheet which contains two separate datasets

[ Edited ]

Hi mate,

 

This code works as you need:

 Use range option to specify the range cells and get the first table and the second, the following example i've made:

 

TABLE1        
COL1 COL2 COL3 COL4 COL5
A B C D E
A B C D E
A B C D E
A B C D E
A B C D E
A B C D E
A B C D E
         
TABLE2        
COL1 COL2 COL3 COL4 COL5
A B C D E
A B C D E
A B C D E
A B C D E
A B C D E
A B C D E
A B C D E

 

PROC IMPORT DATAFILE="/PATH/test.xls"
		OUT = UPSTREAM                       
    DBMS=XLS REPLACE;                                                                                                                 
    SHEET="Sheet1";                                                                                                                    
    GETNAMES=YES;                                                                                                                      
DATAROW=2;
RANGE="Sheet1$A1:E9";
RUN;

PROC IMPORT DATAFILE="/PATH/test.xls"
		OUT = DOWNSTREAM                       
    DBMS=XLS REPLACE;                                                                                                                 
    SHEET="Sheet1";                                                                                                                    
    GETNAMES=YES;                                                                                                                      
DATAROW=2;
RANGE="Sheet1$A11:E19";
RUN;

Hope this helps

 

 

New Contributor
Posts: 3

Re: Import Excel spreadsheet which contains two separate datasets

Sorry, I should have been more clear.  The first section, Table1, contains 'n' observations.  I am importing from roughly 30 workbooks and n is rarely the same.  So, using range= does not help me in this situation.

 

 

 

Regular Contributor
Posts: 212

Re: Import Excel spreadsheet which contains two separate datasets

Ok,

I changed to use macro to get the values and separate the datasets:

 

PROC IMPORT DATAFILE="/path/test.xls"
		OUT = FIRSTSTREAM                       
    DBMS=XLS REPLACE;                                                                                                                 
    SHEET="Sheet1";                                                                                                                    
    GETNAMES=YES;                                                                                                                      
DATAROW=3;
RUN;

DATA HAVE;
   SET FIRSTSTREAM;
   X = _N_;
RUN;

PROC SQL;
  SELECT X+2    INTO: FIRSTLINE FROM HAVE WHERE TABLE1 = "";
  SELECT MAX(X) INTO: LASTLINE  FROM HAVE;
QUIT;


PROC IMPORT DATAFILE="/path/test.xls"
		OUT = UPSTREAM                       
    DBMS=XLS REPLACE;                                                                                                                 
    SHEET="Sheet1";                                                                                                                    
    GETNAMES=YES;                                                                                                                      
DATAROW=2;
RANGE="Sheet1$A1:E&FIRSTLINE.";
RUN;

PROC IMPORT DATAFILE="/path/test.xls"
		OUT = DOWNSTREAM                       
    DBMS=XLS REPLACE;                                                                                                                 
    SHEET="Sheet1";                                                                                                                    
    GETNAMES=YES;                                                                                                                      
DATAROW=2;
RANGE="Sheet1$A&FIRSTLINE.:E&LASTLINE.";
RUN;

Respected Advisor
Posts: 4,649

Re: Import Excel spreadsheet which contains two separate datasets

Could you post one of your Excel files? The proper approach depends a lot on data structure and layout.

PG
New Contributor
Posts: 3

Re: Import Excel spreadsheet which contains two separate datasets

Excellent point Mr Respected Advisor! I have a dummy workbook and I will place an image here as well.

 

dummy_image.png

 

Super User
Super User
Posts: 7,401

Re: Import Excel spreadsheet which contains two separate datasets

Which shows my example nicely.  What you have posted is a best case scenario, all the columns are text, and the same structure.  Is your data really like that, I wouldn't think so.  So what will you do if:

- col1 is missing for any reason within the data?

- numbers where previous table is characters

- different numbers of columns

These are a few things that spring to mind, not to mention hidded data, special characters, no fixed setup, so could change at further attempts etc.  Why are you importing this data in the first place, why not access the Raw data this was produced from?  I mentioned this in another topic, but one of the big reasons to avoid Excel as an output format is the general misconception that it is a data transfer format and can be used as such (unlike Word/PDF), hence we end up trying to read in output review files, which is what you are trying to do here, rather than go through a strcutured validated robust approach of:

- create data import specifcation

- get agreement on file format (using proper data transfer file formats = CSV, XML etc.)

- create import program based on specifcations

- run program on test data and validate

- run at each import occasion and validate

Regular Contributor
Posts: 212

Re: Import Excel spreadsheet which contains two separate datasets

PROC IMPORT DATAFILE="/path/test.xls"
		OUT = FIRSTSTREAM                       
    DBMS=XLS REPLACE;                                                                                                                 
    SHEET="Sheet1";                                                                                                                    
    GETNAMES=YES;                                                                                                                      
DATAROW=3;
RUN;

DATA HAVE;
   SET FIRSTSTREAM;
   X = _N_;
RUN;

PROC SQL;
  SELECT X+2    INTO: FIRSTLINE FROM HAVE WHERE TABLE1 = "";
  SELECT MAX(X) INTO: LASTLINE  FROM HAVE;
QUIT;


PROC IMPORT DATAFILE="/path/test.xls"
		OUT = UPSTREAM                       
    DBMS=XLS REPLACE;                                                                                                                 
    SHEET="Sheet1";                                                                                                                    
    GETNAMES=YES;                                                                                                                      
DATAROW=2;
RANGE="Sheet1$A1:E&FIRSTLINE.";
RUN;

PROC IMPORT DATAFILE="/path/test.xls"
		OUT = DOWNSTREAM                       
    DBMS=XLS REPLACE;                                                                                                                 
    SHEET="Sheet1";                                                                                                                    
    GETNAMES=YES;                                                                                                                      
DATAROW=2;
RANGE="Sheet1$A&FIRSTLINE.:E&LASTLINE.";
RUN;

This code do exactly what you need, and you don't need to specify which one is the line to breakup the tables.
You only have to change the where clause in proc sql.

 

Hope this helps

Super User
Super User
Posts: 7,401

Re: Import Excel spreadsheet which contains two separate datasets

I wouldn't import it as is.  The reason is that you will have text (downstrem) in a numeric field.  Its messy at best, and assuming a missing value is the break could be misleading.  Its another example of Excel being a problem when dealing with real data.  I would suggest you break the table yourself into two sheets, and preferentially save each sheet as CSV and write an import program for each.  This avoids the numerous issues when using Excel .

Ask a Question
Discussion stats
  • 8 replies
  • 414 views
  • 1 like
  • 4 in conversation