DATA Step, Macro, Functions and more

How to import an excel file with selected rows as column name?

Accepted Solution Solved
Reply
Contributor
Posts: 39
Accepted Solution

How to import an excel file with selected rows as column name?

Hello members,

 

Need some help on an issue I am facing.

 

I have to import 6 different sheets from 2 excel (xls) files. The structure of all the sheets are same. Now, in every sheet, the 3th row from top has a date value in MMM-YY(Jan-17) format in excel, and rest of the rows from 4 and below have numbers. I have to read last months data depending on which month I am executing my program, and I need the values from row 50 till the last. However, I am unable to make the date values as my column name in the program. There is still no way I change the format of the excel sheets, as those come from another team who need to maintain the structure. I have read some of the posts on this topic, nothing seems to be working for me. Posting the code I am using, and associated questions.

 

SAS Version: 9.4 on 64 bit windows

Excel: 97-2003 Worksheet (.xls)

 

Here is the code I'm using:

 

options validvarname=any;

proc import out= out_file

        datafile="C:\location\excel_file.xls"

        dbms=xls replace;

        sheet="Name of sheet";

        getnames=no;

        datarow=4;

        mixed=yes;

run;

 

Questions:

 

1. When I say 'getnames=no', instead of the date as my column name, I get A,B,C, D and so on as column names. How do I fix these?

2. When I read the data from 1st row, I get the date values in numbers in 3rd row. While I can format numbers to date in a column,

     how do I format numbers in a row?

3. To select values between rows 50 and last, I can create a counter and read till last. Any other way to get this done effectively, in

     the proc import itself?

 

Sample data from excel:

 

1  Master Card              A               B              C              D              E    

2  Calendar Days          <                          Blank Row                     >          

3   Months                    Jan-17     Feb-17     Mar-17    Apr-17      May-17

4  Loss                      <      ------       Numbers      -------                  >

5  Recovery               <     -------  More Numbers        ---                 >

6

7

 

Thank you.


Accepted Solutions
Solution
‎09-06-2017 06:26 PM
Super User
Super User
Posts: 8,290

Re: How to import an excel file with selected rows as column name?

[ Edited ]

You might want to tranpose the data to make it easier to deal with. Storing data (MONTH) in metadata (variable name) is generally a poor design.

 

The biggest problem that you have is that your column headers in Excel are NOT names, but they are actual DATE values. So when SAS reads them it pulls over the actual numeric value of the number of days that Excel thinks represents that date.  So 2017-01-17 becomes 42752 instead of the "Jan-17" that you see in Excel because of the format that is attached to the cell.  Now SAS will first convert the number to the string '42752' and then add an underscore at the beginning since SAS variable names cannot start with a digit.

 

You can read your XLSX file and post-process it to change the variable names if you want.

First step is to read in the data starting from the third row.  Use the RANGE= option instead of the SHEET= option so SAS knows where to find the column headers.

options validvarname=v7 ;
proc import out= want replace
  datafile="&path/&fname"
  dbms=xlsx 
;
 RANGE="Data_have$A3:0"
run;

image.png

Now use PROC TRANPOSE (or other methods) to get the list of variable names.

proc transpose data=want(obs=0) out=names ;
  var _all_;
run;

Then calculate the new names you want to use for these variables. I don't think that Months is a very good name for the first varaible, so I will change that also. PROC IMPORT also sets the value into the _LABEL_.  You can make new labels or remove the one's that it has.  Let's make the new variable name using the MONYY format. But let's also put the actual full date form the column header into the label so we can see it if we want.

data names ;
  column + 1;
  set names ;
  length newname $32 ;
  if _n_ = 1 then do; 
    newname='Type';
    _label_ = 'Data Type';
  end;
  else do;
    date=input(substr(_name_,2),32.)+'01JAN1900'd -2 ;
    format date yymmdd10. ;
    newname=put(date,monyy7.);
    _label_ = 'Data for Month containing ' || put(date,yymmdd10.);
  end;
run;

image.png

Now store the old=new pairs into macro variables and use them in PROC DATASETS to change the dataset's metadata. If you change the labels then change them before you rename the variables.

proc sql noprint ;
  select catx('=',_name_,newname)
       , catx('=',_name_,quote(trim(_label_)))
    into :rename separated by ' '
       , :label separated by ' '
    from names
  ;
quit;

proc datasets nolist lib=work ;
  modify want ;
  label &label ;
  rename &rename ;
run;
quit;

image.png

 

 

 

 

 

View solution in original post


All Replies
Super User
Posts: 24,028

Re: How to import an excel file with selected rows as column name?

Variable names cannot be dates, so I suggest using a different naming structure and keeping the column names as is, with the colunm LABELS showing your dates.

 

XLS is about 15 years old now, so I'm surprised that's the format they need to stick with, only a decade plus out of date now. 

 

I think you're going to need to attach a sample workbook. Yes, I know you probably can't share your data, so your best bet is to mock up some fake workbook that looks similar to what you need and then show what you need from the file as well.

Contributor
Posts: 39

Re: How to import an excel file with selected rows as column name?

Hello Reeza,

Here is the file attached.

As stated earlier, I would like to have row 3, i.e. different months as variable names, so I would be able to select any specific month as per the requirement. With the code I had posted, I simply could not get dates as column names. 'Data_Have' and 'Data_want' tabs show the difference.

Solution
‎09-06-2017 06:26 PM
Super User
Super User
Posts: 8,290

Re: How to import an excel file with selected rows as column name?

[ Edited ]

You might want to tranpose the data to make it easier to deal with. Storing data (MONTH) in metadata (variable name) is generally a poor design.

 

The biggest problem that you have is that your column headers in Excel are NOT names, but they are actual DATE values. So when SAS reads them it pulls over the actual numeric value of the number of days that Excel thinks represents that date.  So 2017-01-17 becomes 42752 instead of the "Jan-17" that you see in Excel because of the format that is attached to the cell.  Now SAS will first convert the number to the string '42752' and then add an underscore at the beginning since SAS variable names cannot start with a digit.

 

You can read your XLSX file and post-process it to change the variable names if you want.

First step is to read in the data starting from the third row.  Use the RANGE= option instead of the SHEET= option so SAS knows where to find the column headers.

options validvarname=v7 ;
proc import out= want replace
  datafile="&path/&fname"
  dbms=xlsx 
;
 RANGE="Data_have$A3:0"
run;

image.png

Now use PROC TRANPOSE (or other methods) to get the list of variable names.

proc transpose data=want(obs=0) out=names ;
  var _all_;
run;

Then calculate the new names you want to use for these variables. I don't think that Months is a very good name for the first varaible, so I will change that also. PROC IMPORT also sets the value into the _LABEL_.  You can make new labels or remove the one's that it has.  Let's make the new variable name using the MONYY format. But let's also put the actual full date form the column header into the label so we can see it if we want.

data names ;
  column + 1;
  set names ;
  length newname $32 ;
  if _n_ = 1 then do; 
    newname='Type';
    _label_ = 'Data Type';
  end;
  else do;
    date=input(substr(_name_,2),32.)+'01JAN1900'd -2 ;
    format date yymmdd10. ;
    newname=put(date,monyy7.);
    _label_ = 'Data for Month containing ' || put(date,yymmdd10.);
  end;
run;

image.png

Now store the old=new pairs into macro variables and use them in PROC DATASETS to change the dataset's metadata. If you change the labels then change them before you rename the variables.

proc sql noprint ;
  select catx('=',_name_,newname)
       , catx('=',_name_,quote(trim(_label_)))
    into :rename separated by ' '
       , :label separated by ' '
    from names
  ;
quit;

proc datasets nolist lib=work ;
  modify want ;
  label &label ;
  rename &rename ;
run;
quit;

image.png

 

 

 

 

 

Contributor
Posts: 39

Re: How to import an excel file with selected rows as column name?

Sorry for replying late, your solution worked out nicely. Thank you, Tom.

 

Another lesson I learnt was not to use sheet with Range, I somehow overlooked that you didnot use sheet.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 1303 views
  • 0 likes
  • 3 in conversation