BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
EarlyCode
Fluorite | Level 6

So I'm working from a source file that has a horizontal Table. (Random fictional data below)

 

Customer           00234     00236

Sales                       23           45

Service Hours     01:15      02:30

Billable              $28.75  $112.50

 

If I just imported as stored, SAS auto decides how to format, and since all the numbers different types data was lost. (Particularily since SAS and Excel stores time formats in different ways) Also, actual report is more involved than this simplified version, but that's off topic.

 

So my original proc import solution was to import with this script, which brought everything in as text, that I was then able to Proc Transpose with the first column as the heading labels, and then convert formats as needed.

PROC IMPORT DATAFILE= "Customer_Tracker.xls" OUT= WORK.TEMP_REPORT

DBMS=EXCEL REPLACE;

SHEET="SHEET1 $";

RANGE="A8:C11";

GETNAMES=No;

MIXED=YES;

SCANTEXT=NO;

USEDATE=NO;

SCANTIME=NO;

RUN;

 

However, after getting it to run locally, they wanted it to be automated on a UNIX environment. DBMS=EXCEL doesn't work in UNIX.

 

So the UNIX friendly proc import is

PROC IMPORT DATAFILE= "Customer_Tracker.xlsx" OUT= WORK.TEMP_REPORT

DBMS=XLSX REPLACE;

RANGE="SHEET1 $A8:C11";

GETNAMES=No;

RUN;

 

The Mixed, Scantext, Usedates, and Scantime our out of syntax for DBMS=XLSX. So I'm back to square one where SAS is auto formating the columns into values that are no longer useable for the end product.

 

So the question remains, how can I get SAS to import using DBMS=XLSX and force it to just read everything in as text so I can control how I want columns formatted after running the proc transpose?

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

When I create a sheet will a single column of variable names and multiple columns of numbers where each row is in a different format, like in your example, SAS happily reads them into numbers.  It seems to like to pick the $DOLLAR format to permanently attach to the variables (columns in the original file) even though only one of the rows used that.  Not sure why it picked that FORMAT to attach,  It is not the one associate with either the first row or the last row.

proc import file='test/book2.xsls' dbms=xlsx out=book2 replace;
  getnames=no;
run;

But the FORMAT attached to a variable doesn't change the value stored in the variable.  So just remove the formats.  

proc transpose data=book2 out=middle (drop=_name_ _label_);
  id A ;
  var b--c;
  format _all_;
run;
proc print;
run;

Now for the DATES and TIMES you will need to do some coding to convert them from Excel's internal numeric representation to SAS's internal representation, but that is not hard.

 

data want ;
 set middle ;
 date = date + '01Jan1900'd - 2;
 Service_hours = 24*60*60*Service_hours;
 format date date9. Service_hours time5. ;
run;
proc print;
run;

 

View solution in original post

20 REPLIES 20
ballardw
Super User

If the data is TAB delimited do not worry about Excel. Import the data using a data step and an infile statement that uses a tab delimiter and points to the raw tab delimited file..

You can get basic code by running the import wizard and tell it to use a tab delimiter. The code generated will appear in the log and can be copied and pasted into the editor and modified to use correct variable types, informats, assign lengths, labels and formats.

 

Then the program can be updated to point to different input files and output SAS datasets.

 

You can search for issues related to CSV files on the forum as they behave basically the same just a different delimiter.

 

And Excel won't mess with your values.

EarlyCode
Fluorite | Level 6

No, it's not in a TAB delimited file. It's in an excel file. There are no CSV or TXT files involved.

The import wizard uses DBMS=EXCEL which does not work in a unix environment.

 

Also, it's not excel messing with the values, it's SAS since SAS requires all values in a column to be the same format. And assigns based on the first value. Which it might interpret the first column as requiring a different format than the second. Which then when proc transposed to switch it to vertical table, can again affect the variable formats a second time. This causes multiple issues with dropping some values and other value getting transformed into meaninglessness.

 

All columns need to be forced to be read as text until after the table is transposed so correct formats can be applied.

 

I need to use DBMS=XSLX and I need SAS to apply text string formats to all columns so the raw data can be converted into useable data tables.

ballardw
Super User

Sorry, I misread your heading.

Save the Excel file as CSV or Tab delimited and import that.

Use a data step and infile and read properly from the begining. You may be much happier in the long run.

 

EarlyCode
Fluorite | Level 6
Adding a manual step of opening an excel file to re-save it in a different file format would run counter intuitive of the need to automate the task.
ballardw
Super User

Experience has taught many of us that use of Excel in a data transfer process is contraindicated for automating anything. The varying lengths of data in columns, content type mixed numeric and character, odd extra characters, blank rows treated as containing data and the ever popular entering values manually into Excel and having the type change from text to date, formula or who know what is a constant challenge.

 

If the data is exported from some other application you may find that CSV is an option that will work better in the long run.

 

An added "bonus" of using CSV or other text format is that with very little additional coding a single SAS datastep will read all of the files with like structure at one pass into a single set. Which reduces the steps of combining sets later. CSV will also handle much larger sets than Excel native files.

 

Also Excel file structures change and a process that works today could be broken when Excel 2016 or whatever the next version is released.

EarlyCode
Fluorite | Level 6
I agree with all that. Unfortunately I have no control over the source data, and must use it as stored.

Which is why the only solution I was able to work with DBMS=EXCEL is to set all the options to no. That way it doesn't try to predict the format and just brings it in as text.

And if I could just set it to run manually on my machine with Windows scheduler, it would be done.

But they upped priority. Which means running it on the UNIX server. So I have to use DBMS=XLSX, which doesn't have the same options.

I cannot manipulate the file or data until it is pulled into the SAS program. And if I let SAS decide how to format it, some values become 0, others become null.

For all the reasons that you are telling me and I'm fully aware of.

Isn't there any means of telling DBMS=XLSX to pull everything as a text string? Or another means of importing a file in a UNIX environment that will pull it as a text string?
Reeza
Super User

The problem is storing information in a non structured format, almost any program would have issues importing data in this structure. 

 

Can you use the libname method with XLSX, this depends a bit on your version of SAS?

 

If so, you can try the DBSASTYPE method

 

https://books.google.ca/books?id=nYm5BwAAQBAJ&pg=PA64&lpg=PA64&dq=sas+dbsastype+excel+example&source...

EarlyCode
Fluorite | Level 6

Reeza, unfortunately that link doesn't work for me. It's giving a "This page doesn't exist or you've reached the view limit" message.

Kurt_Bremser
Super User

What the others said.

Using Excel files as transfer medium and such a data structure is stupidity squared, period.

 

Save to a text file (if automation is needed, use the scriptable OpenOffice/LibreOffice on UNIX or VBA on Windows) and read from that.

 

In a BI environment, the best you can do with Excel is removing it from the harddisks.

EarlyCode
Fluorite | Level 6

Yes Kurt, thank you for repeating the problem that I'm being forced to find a work around for.

If using excel as a transfer medium didn't have so many inherant problems, then I wouldn't be having this issue.

 

Unfortunately, there is nothing I can do about any of that. I don't have control. I have to work with what's provided.

 

Yes, the client sends data in excel.

Yes, the structured format is a horizontal table instead of the verticle table SAS looks for.

No, I can't alter the file.

No, I can't change the file type.

Yes, I'm fully aware of all the problems that this causes.

 

I'll even agree that it's "stupidity squared". But I can't fix stupid. I can only work around it.

 

So unless you have a SAS process that can somehow change an excel file into a txt file without requiring it to be manually opened and resaved. It's not a solution.

 

Can we please start focusing on finding a work around that will work in a unix environment?

Because constantly having the problem restated is very frustrating not helping.

Tom
Super User Tom
Super User

Are you getting an XLS file or an XLSX file? Those are different formats.

Did you try the XLSX libname method in addition to the PROC IMPORT method?

 

What actual problems are you having? I would expect that SAS would import your mixed columns as text?  That what it did when I tried to re-create your data. What does SAS do with your data? 

 

Is it possible that all of your columns (after the row lable column) contain numeric data?  So that SAS will think they should all become numbers.  Even if did do that what problems does that cause?  Do you lose any data?  or do you just need to add a step to convert the data types based on which row it was on in the original file.

 

Have you looked into utilities that you can use to convert the file to a CSV (or other delimited) file that you could read with SAS?

 

Also are the files you receive consistent? Is the first row always the ID and the third row always time?

 

Did you try reading the data row by row so that there is no mixed columns?

proc import ... out=Customer replace

   ...

  RANGE='A8:C8' 

run;

 

EarlyCode
Fluorite | Level 6
SAS is assigning formats based on what's in the first row.

The result is some data is changed to whole numerical numbers dropping decimals when they should be maintained. Or a set number is replaced with a 0. Other values are dropped entirely becoming null values. etc.

Even bringing in row by row then doing proc transpose and merging the data sets won't provide a solution if it converts column B to a numeric, but column AG to character, than column AG might get dropped for not matching type.

Also, once broken into individual rows, there's a risk order could change and data associations would be lost.

As for other utilities, there are programs available on the UNIX server that can be activated as a macro within the SAS code, but I don't have authorization to write new ones, and setting a project for that group to create something for this limited scope wouldn't get the priority to be developed. So I'm mostly limited to what can be accomplished within the SAS program itself.

The Libname link provided above isn't opening for me. I've pulled from other reference material to attempt that. However, my time got redirected for the afternoon, so I haven't been able to resolve if that method will work or not.
Tom
Super User Tom
Super User

When I create a sheet will a single column of variable names and multiple columns of numbers where each row is in a different format, like in your example, SAS happily reads them into numbers.  It seems to like to pick the $DOLLAR format to permanently attach to the variables (columns in the original file) even though only one of the rows used that.  Not sure why it picked that FORMAT to attach,  It is not the one associate with either the first row or the last row.

proc import file='test/book2.xsls' dbms=xlsx out=book2 replace;
  getnames=no;
run;

But the FORMAT attached to a variable doesn't change the value stored in the variable.  So just remove the formats.  

proc transpose data=book2 out=middle (drop=_name_ _label_);
  id A ;
  var b--c;
  format _all_;
run;
proc print;
run;

Now for the DATES and TIMES you will need to do some coding to convert them from Excel's internal numeric representation to SAS's internal representation, but that is not hard.

 

data want ;
 set middle ;
 date = date + '01Jan1900'd - 2;
 Service_hours = 24*60*60*Service_hours;
 format date date9. Service_hours time5. ;
run;
proc print;
run;

 

EarlyCode
Fluorite | Level 6

Tom, Which DBMS did you use for your proc import when you tried to recreate the results?

 

When I use DBMS=EXCEL I can force it to read in mixed columns as text.But that doesn't work on the UNIX server. But when DBMS=XLSX which does work on the UNIX server is used, the options to tell SAS it will be mixed data and not to scan for file formats aren't valid.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 20 replies
  • 6310 views
  • 2 likes
  • 5 in conversation