Thank you so much for your very detailed summary of your experience in dealing with real-world datasets! I think your response deserves much more than a mere like, but unfortunately, that is all I can offer. Still, now that you have invested so much of your time in composing these words, I strongly suggest that you go a step further and turn them into a paper that may eventually appear at a SAS User Group or in an academic journal such as the Journal of Statistical Software. Your experience is invaluable to SAS users and beyond.
@ballardw wrote:
@Season wrote:
Does importing the file with the DATA step necessitate specifying the name and informat of each and every variable in the CSV? That would be a formidable job, as I have possibly thousands of columns in all.
Check on the assigned informat for your problem variables. If they were read as character but should be dates that is an indication that you may need to create new variables by parsing the values. Check on your national language settings (NLS) to see what order dates are read. OR if you see lots of invalid data messages involving those variables it is one indicator that the order may be different than your NLS and override to read as character and parse.
Still, I would like to ask a little more about NLS. I searched the web and found a Microsoft page on NLS, but with a slightly different expansion: National Language Support. I am not sure whether the two NLSs are the same thing, but in any case, could you please briefly explain what national language settings are and what impact they have on importing datasets into statistical software like SAS? I used to think that a mere difference in the language of the SAS interface and log had no real impact on core capabilities such as loading and editing datasets.
@ballardw wrote:
Any data source that may have "thousands of columns" and doesn't provide documentation as to content of the file, such as expected lengths of character variables and layouts of date, time or datetime values needs to be considered with great suspicion. Without documentation how do you know what anything represents?
Finally, I would like to make a clarification. Your reminder is of great value and I thank you for it. However, the datasets I use do have documentation. In fact, it is huge, as it includes a description of every variable. Still, it documents only what each variable means, not the intricacies of how its values are recorded. In other words, for a given column, I know only that it holds a date with a particular meaning, not that its values can appear in multiple layouts such as "12/16/15" and "12 16 15". Only when I imported the data into SAS did I become aware of this issue.
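To make the mixed-layout problem concrete, here is a minimal sketch of the "read as character, then parse" idea. It is written in Python rather than SAS purely for illustration, and it is not the approach anyone in this thread has confirmed. The function name `parse_mixed_date` and the list `CANDIDATE_FORMATS` are hypothetical, and I am assuming month-day-year order and the two layouts mentioned above; real data may well contain others.

```python
from datetime import date, datetime
from typing import Optional

# Assumed layouts: only the two seen so far ("12/16/15" and "12 16 15"),
# both taken to be month-day-year. Extend this tuple if more turn up.
CANDIDATE_FORMATS = ("%m/%d/%y", "%m %d %y")


def parse_mixed_date(raw: str) -> Optional[date]:
    """Try each assumed layout in turn; return None if none of them fits."""
    text = raw.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            # This layout did not match; fall through to the next one.
            continue
    return None


if __name__ == "__main__":
    for value in ("12/16/15", "12 16 15", "not a date"):
        print(repr(value), "->", parse_mixed_date(value))
```

Returning `None` instead of raising an error keeps the unparseable values visible, so they can be counted and inspected before deciding whether another layout needs to be added to the list.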