BookmarkSubscribeRSS Feed
0 Likes

I'd like to be able to do these 2 things::

 

1. Instead of defaulting to date/time, default something that looks like a date to date. A lot of times, text files has a useless timestamp portion of an incoming field (00:00:00) that can be ignored.

 

2. Unless an incoming column has 'dollar' or 'amount' or 'amt' in the name, default it to string.  There are so many 'codes' and 'numbers' like zip code, social security number, employee ID and so on that just have numbers in them but nobody ever performs numeric functions on them, so import them as strings (especially important for leading zeroes).

 

These could be 'check boxes' on the import wizard screen or options for EG.

2 Comments
ballardw
Super User

@tomrvincent wrote:

I'd like to be able to do these 2 things::

 

1. Instead of defaulting to date/time, default something that looks like a date to date. A lot of times, text files has a useless timestamp portion of an incoming field (00:00:00) that can be ignored.

 


I deal with this by not letting a guessing procedure guess what I want with the data. I also have periodic discussions with data suppliers about letting some anonymous "default" in their software treat dates as times. If you rely on import without a data sharing agreement and documented contents for variable types, lengths and ranges of values you will eventually have problems with data sets that should have character values get read as numeric or vice versa causing a plethora of cascading problems for combining data or valid variable types for analysis.

 

Except for very trivial exercises my import processes will use proc import one time build the skeleton of a data step to read a file format and then I adjust as needed. The frequent datetimes that should be dates are handled by changing the informat and formats assigned. Then use a code node with that generated and approved code to handle the data.

 

Considering the data sources I have seen I would not trust column headings to supply information about whether a data column should be string or numeric. Maybe your organization is sufficiently organized that would work for you but others not so much. Not to mention the number of languages that would have to be considered. I have dealt with a large number of data points that are numeric but would not meet your "rule" for import type such a biologic measurements, soil temperatures and moisture, weather data, a plethora of counts such as actions taken, cigarettes smoked.

tomrvincent
Rhodochrosite | Level 12

Dates:  I deal with a lot of data coming in from Access (and other) text files, for example, where dates have a timestamp whether it needs it or not.  That's just one example.  A simple check box would makes things easier.  You wouldn't have to use it if it didn't apply to you.

 

Also with 'numeric' column headingsI find that I'd like to change the default from numeric to string.  I could always change any columns back to numeric.  It's just tedious when I have 50 fields and almost none of them will have any mathematical functions applied to them...but SAS thinks they are numbers.  Again, if it doesn't apply to you, you wouldn't have to use it.  

 

They're called options for a reason.