DATA Step, Macro, Functions and more

Proc Import Issues in Unix Environment

Accepted Solution Solved
Reply
Contributor
Posts: 30
Accepted Solution

Proc Import Issues in Unix Environment

So I'm working from a source file that has a horizontal Table. (Random fictional data below)

 

Customer           00234     00236

Sales                       23           45

Service Hours     01:15      02:30

Billable              $28.75  $112.50

 

If I just imported as stored, SAS auto decides how to format, and since all the numbers different types data was lost. (Particularily since SAS and Excel stores time formats in different ways) Also, actual report is more involved than this simplified version, but that's off topic.

 

So my original proc import solution was to import with this script, which brought everything in as text, that I was then able to Proc Transpose with the first column as the heading labels, and then convert formats as needed.

PROC IMPORT DATAFILE= "Customer_Tracker.xls" OUT= WORK.TEMP_REPORT

DBMS=EXCEL REPLACE;

SHEET="SHEET1 $";

RANGE="A8:C11";

GETNAMES=No;

MIXED=YES;

SCANTEXT=NO;

USEDATE=NO;

SCANTIME=NO;

RUN;

 

However, after getting it to run locally, they wanted it to be automated on a UNIX environment. DBMS=EXCEL doesn't work in UNIX.

 

So the UNIX friendly proc import is

PROC IMPORT DATAFILE= "Customer_Tracker.xlsx" OUT= WORK.TEMP_REPORT

DBMS=XLSX REPLACE;

RANGE="SHEET1 $A8:C11";

GETNAMES=No;

RUN;

 

The Mixed, Scantext, Usedates, and Scantime our out of syntax for DBMS=XLSX. So I'm back to square one where SAS is auto formating the columns into values that are no longer useable for the end product.

 

So the question remains, how can I get SAS to import using DBMS=XLSX and force it to just read everything in as text so I can control how I want columns formatted after running the proc transpose?

 


Accepted Solutions
Solution
‎06-01-2016 06:13 PM
Super User
Super User
Posts: 6,500

Re: Proc Import Issues in Unix Environment

[ Edited ]

When I create a sheet will a single column of variable names and multiple columns of numbers where each row is in a different format, like in your example, SAS happily reads them into numbers.  It seems to like to pick the $DOLLAR format to permanently attach to the variables (columns in the original file) even though only one of the rows used that.  Not sure why it picked that FORMAT to attach,  It is not the one associate with either the first row or the last row.

proc import file='test/book2.xsls' dbms=xlsx out=book2 replace;
  getnames=no;
run;

But the FORMAT attached to a variable doesn't change the value stored in the variable.  So just remove the formats.  

proc transpose data=book2 out=middle (drop=_name_ _label_);
  id A ;
  var b--c;
  format _all_;
run;
proc print;
run;

Now for the DATES and TIMES you will need to do some coding to convert them from Excel's internal numeric representation to SAS's internal representation, but that is not hard.

 

data want ;
 set middle ;
 date = date + '01Jan1900'd - 2;
 Service_hours = 24*60*60*Service_hours;
 format date date9. Service_hours time5. ;
run;
proc print;
run;

 

View solution in original post


All Replies
Super User
Posts: 10,500

Re: Proc Import Issues in Unix Environment

If the data is TAB delimited do not worry about Excel. Import the data using a data step and an infile statement that uses a tab delimiter and points to the raw tab delimited file..

You can get basic code by running the import wizard and tell it to use a tab delimiter. The code generated will appear in the log and can be copied and pasted into the editor and modified to use correct variable types, informats, assign lengths, labels and formats.

 

Then the program can be updated to point to different input files and output SAS datasets.

 

You can search for issues related to CSV files on the forum as they behave basically the same just a different delimiter.

 

And Excel won't mess with your values.

Contributor
Posts: 30

Re: Proc Import Issues in Unix Environment

[ Edited ]

No, it's not in a TAB delimited file. It's in an excel file. There are no CSV or TXT files involved.

The import wizard uses DBMS=EXCEL which does not work in a unix environment.

 

Also, it's not excel messing with the values, it's SAS since SAS requires all values in a column to be the same format. And assigns based on the first value. Which it might interpret the first column as requiring a different format than the second. Which then when proc transposed to switch it to vertical table, can again affect the variable formats a second time. This causes multiple issues with dropping some values and other value getting transformed into meaninglessness.

 

All columns need to be forced to be read as text until after the table is transposed so correct formats can be applied.

 

I need to use DBMS=XSLX and I need SAS to apply text string formats to all columns so the raw data can be converted into useable data tables.

Super User
Posts: 10,500

Re: Proc Import Issues in Unix Environment

Sorry, I misread your heading.

Save the Excel file as CSV or Tab delimited and import that.

Use a data step and infile and read properly from the begining. You may be much happier in the long run.

 

Contributor
Posts: 30

Re: Proc Import Issues in Unix Environment

Adding a manual step of opening an excel file to re-save it in a different file format would run counter intuitive of the need to automate the task.
Super User
Posts: 10,500

Re: Proc Import Issues in Unix Environment

Experience has taught many of us that use of Excel in a data transfer process is contraindicated for automating anything. The varying lengths of data in columns, content type mixed numeric and character, odd extra characters, blank rows treated as containing data and the ever popular entering values manually into Excel and having the type change from text to date, formula or who know what is a constant challenge.

 

If the data is exported from some other application you may find that CSV is an option that will work better in the long run.

 

An added "bonus" of using CSV or other text format is that with very little additional coding a single SAS datastep will read all of the files with like structure at one pass into a single set. Which reduces the steps of combining sets later. CSV will also handle much larger sets than Excel native files.

 

Also Excel file structures change and a process that works today could be broken when Excel 2016 or whatever the next version is released.

Contributor
Posts: 30

Re: Proc Import Issues in Unix Environment

I agree with all that. Unfortunately I have no control over the source data, and must use it as stored.

Which is why the only solution I was able to work with DBMS=EXCEL is to set all the options to no. That way it doesn't try to predict the format and just brings it in as text.

And if I could just set it to run manually on my machine with Windows scheduler, it would be done.

But they upped priority. Which means running it on the UNIX server. So I have to use DBMS=XLSX, which doesn't have the same options.

I cannot manipulate the file or data until it is pulled into the SAS program. And if I let SAS decide how to format it, some values become 0, others become null.

For all the reasons that you are telling me and I'm fully aware of.

Isn't there any means of telling DBMS=XLSX to pull everything as a text string? Or another means of importing a file in a UNIX environment that will pull it as a text string?
Super User
Posts: 17,829

Re: Proc Import Issues in Unix Environment

The problem is storing information in a non structured format, almost any program would have issues importing data in this structure. 

 

Can you use the libname method with XLSX, this depends a bit on your version of SAS?

 

If so, you can try the DBSASTYPE method

 

https://books.google.ca/books?id=nYm5BwAAQBAJ&pg=PA64&lpg=PA64&dq=sas+dbsastype+excel+example&source...

Contributor
Posts: 30

Re: Proc Import Issues in Unix Environment

Reeza, unfortunately that link doesn't work for me. It's giving a "This page doesn't exist or you've reached the view limit" message.

Super User
Posts: 6,938

Re: Proc Import Issues in Unix Environment

What the others said.

Using Excel files as transfer medium and such a data structure is stupidity squared, period.

 

Save to a text file (if automation is needed, use the scriptable OpenOffice/LibreOffice on UNIX or VBA on Windows) and read from that.

 

In a BI environment, the best you can do with Excel is removing it from the harddisks.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Contributor
Posts: 30

Re: Proc Import Issues in Unix Environment

Yes Kurt, thank you for repeating the problem that I'm being forced to find a work around for.

If using excel as a transfer medium didn't have so many inherant problems, then I wouldn't be having this issue.

 

Unfortunately, there is nothing I can do about any of that. I don't have control. I have to work with what's provided.

 

Yes, the client sends data in excel.

Yes, the structured format is a horizontal table instead of the verticle table SAS looks for.

No, I can't alter the file.

No, I can't change the file type.

Yes, I'm fully aware of all the problems that this causes.

 

I'll even agree that it's "stupidity squared". But I can't fix stupid. I can only work around it.

 

So unless you have a SAS process that can somehow change an excel file into a txt file without requiring it to be manually opened and resaved. It's not a solution.

 

Can we please start focusing on finding a work around that will work in a unix environment?

Because constantly having the problem restated is very frustrating not helping.

Super User
Super User
Posts: 6,500

Re: Proc Import Issues in Unix Environment

Are you getting an XLS file or an XLSX file? Those are different formats.

Did you try the XLSX libname method in addition to the PROC IMPORT method?

 

What actual problems are you having? I would expect that SAS would import your mixed columns as text?  That what it did when I tried to re-create your data. What does SAS do with your data? 

 

Is it possible that all of your columns (after the row lable column) contain numeric data?  So that SAS will think they should all become numbers.  Even if did do that what problems does that cause?  Do you lose any data?  or do you just need to add a step to convert the data types based on which row it was on in the original file.

 

Have you looked into utilities that you can use to convert the file to a CSV (or other delimited) file that you could read with SAS?

 

Also are the files you receive consistent? Is the first row always the ID and the third row always time?

 

Did you try reading the data row by row so that there is no mixed columns?

proc import ... out=Customer replace

   ...

  RANGE='A8:C8' 

run;

 

Contributor
Posts: 30

Re: Proc Import Issues in Unix Environment

SAS is assigning formats based on what's in the first row.

The result is some data is changed to whole numerical numbers dropping decimals when they should be maintained. Or a set number is replaced with a 0. Other values are dropped entirely becoming null values. etc.

Even bringing in row by row then doing proc transpose and merging the data sets won't provide a solution if it converts column B to a numeric, but column AG to character, than column AG might get dropped for not matching type.

Also, once broken into individual rows, there's a risk order could change and data associations would be lost.

As for other utilities, there are programs available on the UNIX server that can be activated as a macro within the SAS code, but I don't have authorization to write new ones, and setting a project for that group to create something for this limited scope wouldn't get the priority to be developed. So I'm mostly limited to what can be accomplished within the SAS program itself.

The Libname link provided above isn't opening for me. I've pulled from other reference material to attempt that. However, my time got redirected for the afternoon, so I haven't been able to resolve if that method will work or not.
Solution
‎06-01-2016 06:13 PM
Super User
Super User
Posts: 6,500

Re: Proc Import Issues in Unix Environment

[ Edited ]

When I create a sheet will a single column of variable names and multiple columns of numbers where each row is in a different format, like in your example, SAS happily reads them into numbers.  It seems to like to pick the $DOLLAR format to permanently attach to the variables (columns in the original file) even though only one of the rows used that.  Not sure why it picked that FORMAT to attach,  It is not the one associate with either the first row or the last row.

proc import file='test/book2.xsls' dbms=xlsx out=book2 replace;
  getnames=no;
run;

But the FORMAT attached to a variable doesn't change the value stored in the variable.  So just remove the formats.  

proc transpose data=book2 out=middle (drop=_name_ _label_);
  id A ;
  var b--c;
  format _all_;
run;
proc print;
run;

Now for the DATES and TIMES you will need to do some coding to convert them from Excel's internal numeric representation to SAS's internal representation, but that is not hard.

 

data want ;
 set middle ;
 date = date + '01Jan1900'd - 2;
 Service_hours = 24*60*60*Service_hours;
 format date date9. Service_hours time5. ;
run;
proc print;
run;

 

Contributor
Posts: 30

Re: Proc Import Issues in Unix Environment

Tom, Which DBMS did you use for your proc import when you tried to recreate the results?

 

When I use DBMS=EXCEL I can force it to read in mixed columns as text.But that doesn't work on the UNIX server. But when DBMS=XLSX which does work on the UNIX server is used, the options to tell SAS it will be mixed data and not to scan for file formats aren't valid.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 20 replies
  • 1284 views
  • 2 likes
  • 5 in conversation