BookmarkSubscribeRSS Feed
K_McInvale
Calcite | Level 5
Note: I am using PC SAS 9.4 M2(x64).

I have some Excel spreadsheets to import and each spreadsheet contains two datasets that ate stacked horizontally.

Upstream
Record_No, Col2, Col3, etc
1 , data, data, data
2 , data, data, data
3 , data, data, data
n. , data, data, data
Blank-row
Downstream
Record_No, col2, col3, col4
1 , data, data, data
......
......

I'm thinking something like import the whole sheet into a dataset. Then a proc statement data = work.upstream. work.downstream. A do until loop until the blank row or merged cell 'downstream'. This data output to upstream and the remaining data to 'downstream' dataset.

Any suggestions would be greatly appreciated

8 REPLIES 8
DartRodrigo
Lapis Lazuli | Level 10

Hi mate,

 

This code works as you need:

 Use range option to specify the range cells and get the first table and the second, the following example i've made:

 

TABLE1        
COL1 COL2 COL3 COL4 COL5
A B C D E
A B C D E
A B C D E
A B C D E
A B C D E
A B C D E
A B C D E
         
TABLE2        
COL1 COL2 COL3 COL4 COL5
A B C D E
A B C D E
A B C D E
A B C D E
A B C D E
A B C D E
A B C D E

 

PROC IMPORT DATAFILE="/PATH/test.xls"
		OUT = UPSTREAM                       
    DBMS=XLS REPLACE;                                                                                                                 
    SHEET="Sheet1";                                                                                                                    
    GETNAMES=YES;                                                                                                                      
DATAROW=2;
RANGE="Sheet1$A1:E9";
RUN;

PROC IMPORT DATAFILE="/PATH/test.xls"
		OUT = DOWNSTREAM                       
    DBMS=XLS REPLACE;                                                                                                                 
    SHEET="Sheet1";                                                                                                                    
    GETNAMES=YES;                                                                                                                      
DATAROW=2;
RANGE="Sheet1$A11:E19";
RUN;

Hope this helps

 

 

K_McInvale
Calcite | Level 5

Sorry, I should have been more clear.  The first section, Table1, contains 'n' observations.  I am importing from roughly 30 workbooks and n is rarely the same.  So, using range= does not help me in this situation.

 

 

 

DartRodrigo
Lapis Lazuli | Level 10

Ok,

I changed to use macro to get the values and separate the datasets:

 

PROC IMPORT DATAFILE="/path/test.xls"
		OUT = FIRSTSTREAM                       
    DBMS=XLS REPLACE;                                                                                                                 
    SHEET="Sheet1";                                                                                                                    
    GETNAMES=YES;                                                                                                                      
DATAROW=3;
RUN;

DATA HAVE;
   SET FIRSTSTREAM;
   X = _N_;
RUN;

PROC SQL;
  SELECT X+2    INTO: FIRSTLINE FROM HAVE WHERE TABLE1 = "";
  SELECT MAX(X) INTO: LASTLINE  FROM HAVE;
QUIT;


PROC IMPORT DATAFILE="/path/test.xls"
		OUT = UPSTREAM                       
    DBMS=XLS REPLACE;                                                                                                                 
    SHEET="Sheet1";                                                                                                                    
    GETNAMES=YES;                                                                                                                      
DATAROW=2;
RANGE="Sheet1$A1:E&FIRSTLINE.";
RUN;

PROC IMPORT DATAFILE="/path/test.xls"
		OUT = DOWNSTREAM                       
    DBMS=XLS REPLACE;                                                                                                                 
    SHEET="Sheet1";                                                                                                                    
    GETNAMES=YES;                                                                                                                      
DATAROW=2;
RANGE="Sheet1$A&FIRSTLINE.:E&LASTLINE.";
RUN;

PGStats
Opal | Level 21

Could you post one of your Excel files? The proper approach depends a lot on data structure and layout.

PG
K_McInvale
Calcite | Level 5

Excellent point Mr Respected Advisor! I have a dummy workbook and I will place an image here as well.

 

dummy_image.png

 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Which shows my example nicely.  What you have posted is a best case scenario, all the columns are text, and the same structure.  Is your data really like that, I wouldn't think so.  So what will you do if:

- col1 is missing for any reason within the data?

- numbers where previous table is characters

- different numbers of columns

These are a few things that spring to mind, not to mention hidded data, special characters, no fixed setup, so could change at further attempts etc.  Why are you importing this data in the first place, why not access the Raw data this was produced from?  I mentioned this in another topic, but one of the big reasons to avoid Excel as an output format is the general misconception that it is a data transfer format and can be used as such (unlike Word/PDF), hence we end up trying to read in output review files, which is what you are trying to do here, rather than go through a strcutured validated robust approach of:

- create data import specifcation

- get agreement on file format (using proper data transfer file formats = CSV, XML etc.)

- create import program based on specifcations

- run program on test data and validate

- run at each import occasion and validate

DartRodrigo
Lapis Lazuli | Level 10
PROC IMPORT DATAFILE="/path/test.xls"
		OUT = FIRSTSTREAM                       
    DBMS=XLS REPLACE;                                                                                                                 
    SHEET="Sheet1";                                                                                                                    
    GETNAMES=YES;                                                                                                                      
DATAROW=3;
RUN;

DATA HAVE;
   SET FIRSTSTREAM;
   X = _N_;
RUN;

PROC SQL;
  SELECT X+2    INTO: FIRSTLINE FROM HAVE WHERE TABLE1 = "";
  SELECT MAX(X) INTO: LASTLINE  FROM HAVE;
QUIT;


PROC IMPORT DATAFILE="/path/test.xls"
		OUT = UPSTREAM                       
    DBMS=XLS REPLACE;                                                                                                                 
    SHEET="Sheet1";                                                                                                                    
    GETNAMES=YES;                                                                                                                      
DATAROW=2;
RANGE="Sheet1$A1:E&FIRSTLINE.";
RUN;

PROC IMPORT DATAFILE="/path/test.xls"
		OUT = DOWNSTREAM                       
    DBMS=XLS REPLACE;                                                                                                                 
    SHEET="Sheet1";                                                                                                                    
    GETNAMES=YES;                                                                                                                      
DATAROW=2;
RANGE="Sheet1$A&FIRSTLINE.:E&LASTLINE.";
RUN;

This code do exactly what you need, and you don't need to specify which one is the line to breakup the tables.
You only have to change the where clause in proc sql.

 

Hope this helps

RW9
Diamond | Level 26 RW9
Diamond | Level 26

I wouldn't import it as is.  The reason is that you will have text (downstrem) in a numeric field.  Its messy at best, and assuming a missing value is the break could be misleading.  Its another example of Excel being a problem when dealing with real data.  I would suggest you break the table yourself into two sheets, and preferentially save each sheet as CSV and write an import program for each.  This avoids the numerous issues when using Excel .

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1790 views
  • 1 like
  • 4 in conversation