BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
zoomzoom
Obsidian | Level 7

Hello members,

 

Need some help on an issue I am facing.

 

I have to import 6 different sheets from 2 excel (xls) files. The structure of all the sheets are same. Now, in every sheet, the 3th row from top has a date value in MMM-YY(Jan-17) format in excel, and rest of the rows from 4 and below have numbers. I have to read last months data depending on which month I am executing my program, and I need the values from row 50 till the last. However, I am unable to make the date values as my column name in the program. There is still no way I change the format of the excel sheets, as those come from another team who need to maintain the structure. I have read some of the posts on this topic, nothing seems to be working for me. Posting the code I am using, and associated questions.

 

SAS Version: 9.4 on 64 bit windows

Excel: 97-2003 Worksheet (.xls)

 

Here is the code I'm using:

 

options validvarname=any;

proc import out= out_file

        datafile="C:\location\excel_file.xls"

        dbms=xls replace;

        sheet="Name of sheet";

        getnames=no;

        datarow=4;

        mixed=yes;

run;

 

Questions:

 

1. When I say 'getnames=no', instead of the date as my column name, I get A,B,C, D and so on as column names. How do I fix these?

2. When I read the data from 1st row, I get the date values in numbers in 3rd row. While I can format numbers to date in a column,

     how do I format numbers in a row?

3. To select values between rows 50 and last, I can create a counter and read till last. Any other way to get this done effectively, in

     the proc import itself?

 

Sample data from excel:

 

1  Master Card              A               B              C              D              E    

2  Calendar Days          <                          Blank Row                     >          

3   Months                    Jan-17     Feb-17     Mar-17    Apr-17      May-17

4  Loss                      <      ------       Numbers      -------                  >

5  Recovery               <     -------  More Numbers        ---                 >

6

7

 

Thank you.

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

You might want to tranpose the data to make it easier to deal with. Storing data (MONTH) in metadata (variable name) is generally a poor design.

 

The biggest problem that you have is that your column headers in Excel are NOT names, but they are actual DATE values. So when SAS reads them it pulls over the actual numeric value of the number of days that Excel thinks represents that date.  So 2017-01-17 becomes 42752 instead of the "Jan-17" that you see in Excel because of the format that is attached to the cell.  Now SAS will first convert the number to the string '42752' and then add an underscore at the beginning since SAS variable names cannot start with a digit.

 

You can read your XLSX file and post-process it to change the variable names if you want.

First step is to read in the data starting from the third row.  Use the RANGE= option instead of the SHEET= option so SAS knows where to find the column headers.

options validvarname=v7 ;
proc import out= want replace
  datafile="&path/&fname"
  dbms=xlsx 
;
 RANGE="Data_have$A3:0"
run;

image.png

Now use PROC TRANPOSE (or other methods) to get the list of variable names.

proc transpose data=want(obs=0) out=names ;
  var _all_;
run;

Then calculate the new names you want to use for these variables. I don't think that Months is a very good name for the first varaible, so I will change that also. PROC IMPORT also sets the value into the _LABEL_.  You can make new labels or remove the one's that it has.  Let's make the new variable name using the MONYY format. But let's also put the actual full date form the column header into the label so we can see it if we want.

data names ;
  column + 1;
  set names ;
  length newname $32 ;
  if _n_ = 1 then do; 
    newname='Type';
    _label_ = 'Data Type';
  end;
  else do;
    date=input(substr(_name_,2),32.)+'01JAN1900'd -2 ;
    format date yymmdd10. ;
    newname=put(date,monyy7.);
    _label_ = 'Data for Month containing ' || put(date,yymmdd10.);
  end;
run;

image.png

Now store the old=new pairs into macro variables and use them in PROC DATASETS to change the dataset's metadata. If you change the labels then change them before you rename the variables.

proc sql noprint ;
  select catx('=',_name_,newname)
       , catx('=',_name_,quote(trim(_label_)))
    into :rename separated by ' '
       , :label separated by ' '
    from names
  ;
quit;

proc datasets nolist lib=work ;
  modify want ;
  label &label ;
  rename &rename ;
run;
quit;

image.png

 

 

 

 

 

View solution in original post

4 REPLIES 4
Reeza
Super User

Variable names cannot be dates, so I suggest using a different naming structure and keeping the column names as is, with the colunm LABELS showing your dates.

 

XLS is about 15 years old now, so I'm surprised that's the format they need to stick with, only a decade plus out of date now. 

 

I think you're going to need to attach a sample workbook. Yes, I know you probably can't share your data, so your best bet is to mock up some fake workbook that looks similar to what you need and then show what you need from the file as well.

zoomzoom
Obsidian | Level 7

Hello Reeza,

Here is the file attached.

As stated earlier, I would like to have row 3, i.e. different months as variable names, so I would be able to select any specific month as per the requirement. With the code I had posted, I simply could not get dates as column names. 'Data_Have' and 'Data_want' tabs show the difference.

Tom
Super User Tom
Super User

You might want to tranpose the data to make it easier to deal with. Storing data (MONTH) in metadata (variable name) is generally a poor design.

 

The biggest problem that you have is that your column headers in Excel are NOT names, but they are actual DATE values. So when SAS reads them it pulls over the actual numeric value of the number of days that Excel thinks represents that date.  So 2017-01-17 becomes 42752 instead of the "Jan-17" that you see in Excel because of the format that is attached to the cell.  Now SAS will first convert the number to the string '42752' and then add an underscore at the beginning since SAS variable names cannot start with a digit.

 

You can read your XLSX file and post-process it to change the variable names if you want.

First step is to read in the data starting from the third row.  Use the RANGE= option instead of the SHEET= option so SAS knows where to find the column headers.

options validvarname=v7 ;
proc import out= want replace
  datafile="&path/&fname"
  dbms=xlsx 
;
 RANGE="Data_have$A3:0"
run;

image.png

Now use PROC TRANPOSE (or other methods) to get the list of variable names.

proc transpose data=want(obs=0) out=names ;
  var _all_;
run;

Then calculate the new names you want to use for these variables. I don't think that Months is a very good name for the first varaible, so I will change that also. PROC IMPORT also sets the value into the _LABEL_.  You can make new labels or remove the one's that it has.  Let's make the new variable name using the MONYY format. But let's also put the actual full date form the column header into the label so we can see it if we want.

data names ;
  column + 1;
  set names ;
  length newname $32 ;
  if _n_ = 1 then do; 
    newname='Type';
    _label_ = 'Data Type';
  end;
  else do;
    date=input(substr(_name_,2),32.)+'01JAN1900'd -2 ;
    format date yymmdd10. ;
    newname=put(date,monyy7.);
    _label_ = 'Data for Month containing ' || put(date,yymmdd10.);
  end;
run;

image.png

Now store the old=new pairs into macro variables and use them in PROC DATASETS to change the dataset's metadata. If you change the labels then change them before you rename the variables.

proc sql noprint ;
  select catx('=',_name_,newname)
       , catx('=',_name_,quote(trim(_label_)))
    into :rename separated by ' '
       , :label separated by ' '
    from names
  ;
quit;

proc datasets nolist lib=work ;
  modify want ;
  label &label ;
  rename &rename ;
run;
quit;

image.png

 

 

 

 

 

zoomzoom
Obsidian | Level 7

Sorry for replying late, your solution worked out nicely. Thank you, Tom.

 

Another lesson I learnt was not to use sheet with Range, I somehow overlooked that you didnot use sheet.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 9830 views
  • 0 likes
  • 3 in conversation