BookmarkSubscribeRSS Feed
EvoluZion3
Obsidian | Level 7

Hi, I'm trying to use PC FILES (in a PROC IMPORT) to import an Excel spreadsheet which has a Date field and a DateTime field next to eachother. They are different data items.

Unfortunately it appears I can only import either one or the other successfully, resulting in either:

i) using USEDATE=YES I get a correct Date but a DateTime of 1960 plus a few hours.

ii) using USEDATE=NO I get a correct DateTime but a "****" Date, due to it being projected into the year 3 billion.

 

I have to use PC FILES as the spreadsheet is on a Windows file system but our SAS is installed on Unix, and I'm using SAS 9.4.

 

I realise of course that I can do USEDATE=NO to get all the data, and then use DATEPART() on just the Date fields to correct it, however my import process is intended to be a shared macro where other users pass in a field list, a format list and a destination dataset name and the import is done for them (this is to help with consistent code across teams), and so I'm trying to avoid the users having to DATEPART() some of their fields after the import has taken place.

 

If there's nothing native that I can do, then in my macro I may have to do something clever like parse through the field format variable provided by my users into the macro, identify which are just a DATE9, and then do the DATEPART() in an intermediate table, which sounds really heavy going.

 

Does anyone have any tips?

 

PROC IMPORT

dbms=excelcs

OUT=WORK.IMPORT

DATAFILE= 'MYEXCEL.XLSX' REPLACE;

server="my_pcfiles_server";

port=1234;

range='Sheet1$';

SERVERUSER="me";

SERVERPASS="foobar";

RUN;

 

19 REPLIES 19
Kurt_Bremser
Super User

Save your Excel data to a csv file, inspect that file, and use the proper informats in a data step to read the data into SAS.

 

Do NEVER use Excel files if you expect working, reliable data transfer. Never.

EvoluZion3
Obsidian | Level 7
Thanks Kurt. Unfortunately we have no choice over the file 😞
Kurt_Bremser
Super User

Of course you have a choice. Opening and saving to csv with Excel can be scripted.

The important thing is to prevent the guessing of proc import, and that is only possible with a text-based format.

EvoluZion3
Obsidian | Level 7
Hi Kurt, to elaborate on my problem more: the import macro I'm trying to write will be called by various processes which will be scheduled on the SAS Scheduler, and so no Windows/Office-based activity can take place, it all has to be SAS and Unix. Being a shared macro, different team members will be importing lots of different spreadsheets as part of their jobs and so we can't do anything specific to one spreadsheet. Just to add to the problem, our installation of PC Files can't import or export TXT/CSV files. I've no idea why, I blame our I.T. But I know where you're coming from.
RW9
Diamond | Level 26 RW9
Diamond | Level 26

"different team members will be importing lots of different spreadsheets" - I would suggest you need to do some sort of system diagnosis.  If you have lots of Excel files, lots of users, lots of processes, then you need to have a top down approach.  Start by having IT provide an accessible data store, preferably in a cross platform storage method (database, flat text files etc.).  Then you would approach from your end to interact with that data source, and the other users interact with the data source their end.  Attempting to handle this only at your end will fail.  There is a lot of "can't" in the information provide, to which the simple answer is, you either put a whole lot of effort yourself in to try to fix this - not just once, but each and every time, or you "can't" solve the problem because the problem lies at another level.

 

As for your exact question, no you will need to read it in, then post process it.  I will tell you now that every time you run it however it will fail.  Excel is a bad data format (no structure or control), and proc import is a guessing procedure.  

EvoluZion3
Obsidian | Level 7
Hi RW9, thanks for your reply. I agree with everything you say.

Unfortunately we're a very large company split into different areas who all deal with data, across different methodologies (SAS, Hadoop, Oracle, SQL Server, DB2, Office, IBM Mainframe, etc. etc.) and sometimes it's easier (read: far, far cheaper) to just get hold of an Excel feed from somewhere and ingest it into an intermediary data warehouse which we manage for our area, but unfortunately sometimes when we sniff out a data source we want we can't dictate the nature of the file. This has been manageable up until now when we've hit a couple of spreadsheets which have both Dates and DateTimes, and they've caused us to come unstuck when we've tried to ingest to SAS via PC Files.
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Mmm, maybe try a convertor or third party tool like Open Office:

https://unix.stackexchange.com/questions/23726/convert-a-xlsx-ms-excel-file-to-csv-on-command-line-w...

 

Could be an option.  Afraid the only other option I have for you is to uncompress the xlsx file (they are only ZIP files with a different extension), and then read the plain XML contained therein.  Does require knowledge of MS Open Document format to understand the XML (as they make it as complex as possible).

EvoluZion3
Obsidian | Level 7

Thanks, that link was quite interesting.

In my case I don't think it'd help as unfortunately I don't think I can get SAS Scheduler to kick that off, and our PC Files installation can't import CSV files, and I can't transfer a file from Windows to Unix via the SAS Scheduler either. Can you feel my pain yet?

I'll pursue modifying my macro to be clever and do the DATEPART() based on the variables passed in by users of my import macro. If I get it working I'll post the code here for your delight.

ballardw
Super User

@EvoluZion3 wrote:
Hi RW9, thanks for your reply. I agree with everything you say.

Unfortunately we're a very large company split into different areas who all deal with data, across different methodologies (SAS, Hadoop, Oracle, SQL Server, DB2, Office, IBM Mainframe, etc. etc.) and sometimes it's easier (read: far, far cheaper) to just get hold of an Excel feed from somewhere and ingest it into an intermediary data warehouse which we manage for our area, but unfortunately sometimes when we sniff out a data source we want we can't dictate the nature of the file. This has been manageable up until now when we've hit a couple of spreadsheets which have both Dates and DateTimes, and they've caused us to come unstuck when we've tried to ingest to SAS via PC Files.

SAS/Access has modules for accessing Oracle, SQL Server and DB2. If these Excel files are coming from those sources then 1) perhaps SAS/Access may be of interest or 2) each of those will write text files, just talk to whoever exports to Excel and work out an agreement to generate CSV, other delimited or fixed column files which would have such documentation as to variable order, type. lengths and other properties. Which makes writing data steps to read the file formats practically trivial, allows consistent use of proper SAS informat for dates, times or datetime values as needed, assigning variable labels so the data sets help document themselves.

 

Even Excel and Access will create csv files.

Kurt_Bremser
Super User

@EvoluZion3 wrote:
... it all has to be SAS and Unix.

Then there's no place for Excel at all. Excel is a tool for people, it does not lend itself well for automation. On top of that, all tools that SAS provides for Excel interaction rely on guessing, and are therefore not suitable for consistent, repeatable results AT ALL.

 

Forget Excel, unless you are being paid by your employer for wasting time (and money) and causing problems.

ErikLund_Jensen
Rhodochrosite | Level 12

Hi @EvoluZion3 

If you have a SAS PC Files server on windows, I think you must have a license to SAS Access to PC Files on your Linux platform as well. We are pestered by the same problem, and we have solved it in the following way:

 

1. Mount the windows source folder containing the xlsx files on Linux. In my example using this entry in Linux fstab file:

//sas-adm-p.odknet.dk/DWHadmin /mnt/dwhadmin_prod cifs  sec=krb5,multiuser,file_mode=0770,dir_mode=0770,nounix,vers=3.0,noserverino

 

2. Allocate the spreadsheet using the xlsx engine:

libname wbook xlsx "/mnt/dwhadmin_prod/Work/pensionmangler.xlsx";

 

3. This makes the excel workbook available to your SAS program as a library with the individual sheets as members:

data ind; set wbook.sheetname; run;

 

Depending on your sheet names, you might have to supply the name as a SAS literal, like: set wbook.'$name'n;

 

I don't know if your special problem with date/datetime will disappear, but in my experience the xlsx libname is generally more reliable, when it comes to interpreting excel content.

EvoluZion3
Obsidian | Level 7
Hi ErikLund, thanks for your reply. I've tried that (fortunately we are able to LIBNAME from our Unix SAS installation into a Windows Excel file already) and unfortunately it hasn't worked - it classed all Date and DateTime fields as a Date, and therefore truncated the Time from the DateTime.
ErikLund_Jensen
Rhodochrosite | Level 12

Hi 

 

I forgot to mention that you will probably need assistance from your server staff to get a windows folder mounted on Linux in a way that makes it persistent and visible for other users.

 

And if they do that, they might be persuaded to install a command line file conversion tool like xlsx2csv too. Then you could run the conversion from your SAS program and get a CSV file as input to a data step. 

 

I think the best approach depends on the work flow. If we are talking ad-hoc tasks involving different spreadsheets, the xlsx engine is the easy solution. But reading CSV is much better in a production flow, where spreadsheets are used for regular data transfers, because it gives you full control over column definitions and names.

 

As RW9 and ballardw and other esteemed advisers points out every time this topic comes up, Excel and SAS is a bad combination. Everything else is better, flat files, csv, ftp.. And twisting arms on external partners could be cheaper, if all the hours wasted on reading spreadsheets are counted too.

 

Tom
Super User Tom
Super User

From the description it sounds like you have mixed date and datetime values IN THE SAME COLUMN of your Excel sheet.

Can you clarify with an example file that behaves in the way you say?

 

Are you asking how to distinguish between date (number of days) and datetime (number of seconds) values in a SAS variable?

If your range of possible dates is restricted enough then you might be able to tell just from the magnitude of the number.

data test;
  do year=1960 to 2020 by 10 ;
    date =mdy(1,1,year);
    datetime = dhms(date,0,0,0);
    output;
  end;
  format date datetime comma15. ;
run;
Obs    year               date           datetime

 1     1960                  0                  0
 2     1970              3,653        315,619,200
 3     1980              7,305        631,152,000
 4     1990             10,958        946,771,200
 5     2000             14,610      1,262,304,000
 6     2010             18,263      1,577,923,200
 7     2020             21,915      1,893,456,000

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 19 replies
  • 2362 views
  • 0 likes
  • 7 in conversation