03-24-2017 02:43 PM
Let's say I have a dataset as below. I want to read the data but I am stuck with column names separated by space "Date Of Birth"
I can easily replace space with "_" but what if I have a large number of variables? or a bracket or + sign in the column name.
Also with the informat I can read the date variable for 1 particular date type but what about others?
|Name||Marks( secd)||Age+||Date of Birth|
03-24-2017 02:54 PM
Is this a text file?
If so, please post a small sample. It's important to post the original file (or a very close replica) to see if there are special characters.
03-24-2017 03:00 PM
Then the best answer is send the file back and ask for a better formatted file.
90% of the time, even if the field is date of birth there is another delimiter, ie a comma which makes the file a CSV file
Otherwise the work around tends to be to read the data but skip the first line so you manually write your input step and bypass the first line entirely. If you absolutely need it automated, you can read the first line in a second step.
There are posts on here that show how using this approach by Ksharp.
03-24-2017 03:30 PM
And if you have reason to believe that your date column has differing formats of the original content then behaps the ANYDTDTE informat will work. Please see the documentation.
03-24-2017 04:03 PM
When reading the file you normally skip the header line. So spaces in the variable names shouldn't matter.
data want ; infile 'myfile.dat' firstobs=2 truncover ; length name $20 Marks Age 8 DOBchar $12 ; input name -- DOBchar ; run;
If you are using PROC IMPORT or some other tool that will try to guess at what your file contains then it will have algorithms for converting the invalid characters into valid variable names.
As to converting the gibberish in the DOB field you probably will need to read it as character and write your own algorithm to see if you can convert it to a real date.