BookmarkSubscribeRSS Feed

We all know that SAS develops its software in separate teams, but it can be really annoying when it becomes apparent that several associated teams haven't planned together how a SAS procedure will work.

 

I'm going to take as an example PROC IMPORT, which is part of Base SAS, but is also included in SAS/ACCESS. When you run the following program all the variables created begin with VAR, i.e. VAR1, VAR2, VAR3, VAR4, VAR5, etc., and this would also be true for DBMS=TAB and DLM:

PROC IMPORT FILE = "test.csv" DBMS = CSV REPLACE;
  GETNAMES = NO;
RUN;

However, using similar PROC IMPORT code for DBMS=EXCEL, in SAS/ACCESS for PC Files, will create variables beginning with F, i.e. F1, F2, F3, F4, F5, etc.:

PROC IMPORT FILE = "test.xls" DBMS = EXCEL REPLACE;
  GETNAMES = NO;
RUN;

More shocking though is using PROC IMPORT code for DBMS=XLS or XLSX in UNIX or Windows, in SAS/ACCESS for PC Files, as this will create variables with no prefix at all, i.e. A, B, C, D, E, etc.:

PROC IMPORT FILE = "test.xls" DBMS = XLS REPLACE;
  GETNAMES = NO;
RUN;

This inconsistency even extends to using GETNAMES = YES too when there are multiple columns with the same label.

 

If you want to import a CSV file, instead of an Excel file, or indeed import an Excel file in UNIX, then the subsequent processing step will have to be updated to use the new variable names (annoying!). Why can't the procedure be consistent, or, at least, have a parameter, like PREFIX=, that allows users to choose the prefix?

5 Comments
ballardw
Super User

You don't show the EXCEL file you imported. Did you by any chance examime what you get if your second column contains TEXT values?

 

Since EXCEL files are subject to random (and fairly often undocumented) behaviors at Microsoft whim I don't expect consistency. For instance XLXS is ZIPPED XML but XLS is a proprietary binary format.

 

CSV and any other delimited file behaves the same.And if I have to import the same structured file then I use a data step so I control the variable names, labels, formats, informats. IMNSHO the best use for proc import is to generate base code to read delimited files so that I can edit it to behave correctly. Why trust any guesses to determine my data.

As a side effect of controlling my informats I don't get things like variable type mismatches or length variance issues when I combine data.

jklaverstijn
Rhodochrosite | Level 12

I feel your pain Phil. And I do support your ballot idea.

 

One thing all theses scenarios for automatic column naming have in common though is they all yield equally meaningless and useless results. You will have to deal with them in one way or another. 😉

 

- Jan.

hollandnumerics
Pyrite | Level 9

ballardw,

 

It actually doesn't matter what the contents of the Excel file look like, as they are given the same variable name whether character or numeric. The point is that I would like to have some control, and therefore some consistency, in the variable names generated by PROC IMPORT, so I don't get any unpleasant surprises.

 

Way back in the dim and distant past I was a FORTRAN programmer, and so I tend to use i, j and k for DO loop variables. While using DBMS=EXCEL or CSV I am fairly safe, but switching from DBMS=EXCEL to DBMS=XLSX could cause me considerable problems if my spreadsheet has more than 8 columns, and I have a Data Step containing a DO loop using i !!

 

...........Phil

ballardw
Super User

WATFOR, WATFIV, FORTRAN77 or other dialect? I also started with FORTRAN and that is one of the reasons I don't like Excel in just about any flavor. I programmed for something like 20 years before working with an actual spreadsheet. The fact that they do not have variables or type restrictions in columns means that spreadsheets are just about so much garbase. (Merged cells are the devil's playground)

 

Which is why I turn them into CSV, import with a datastep so I know what I actually have.

Peter_C
Rhodochrosite | Level 12
In a word of support for Base SAS - I believe that the code generated by the IMPORT procedure was established outside the dev team for Base SAS - probably it was a brainchild of the interfaces team responsible for upgrades to DMS. For example, 1 generating an automatic conversion of a constant from numeric to char (very much discouraged among those who dislike such NOTEs); and 2, overloading the INPUT statement with redundant $ symbols. I could go on....
The generated code as a template does not lead to better understanding of data steps (imo).
At least we have to support Phil's suggestion!🤓Perhaps a reviewer at SAS might consider not only how straight forwardly Phil's proposal could be implemented, but also, how good generated code should become a teaching aid for those who are still learning about the amazing functionality available in the languages of SAS.