DATA Step, Macro, Functions and more

Reading Messed up date and column names

Reply
New Contributor
Posts: 3

Reading Messed up date and column names

Hi

Let's say I have a dataset as below. I want to read the data but I am stuck with column names separated by space "Date Of Birth"

I can easily replace space with "_" but what if I have a large number of variables?  or a bracket or + sign in the column name.

 

Also with the informat  I can read the date variable for 1 particular date type but what about others?

 

NameMarks( secd)Age+Date of Birth
A833228-May-90
B842513-Feb-1999
C302626-10-1990
D1006316-Mar-00
Super User
Posts: 19,822

Re: Reading Messed up date and column names

Is this a text file?

If so, please post a small sample. It's important to post the original file (or a very close replica) to see if there are special characters.

New Contributor
Posts: 3

Re: Reading Messed up date and column names

Sorry, its just a general question which came to me while studying. There are no such files.

Let's assume its a text or a CSV file. 

Super User
Posts: 19,822

Re: Reading Messed up date and column names

Then the best answer is send the file back and ask for a better formatted file.

 

90% of the time, even if the field is date of birth there is another delimiter, ie a comma which makes the file a CSV file

Otherwise the work around tends to be to read the data but skip the first line so you manually write your input step and bypass the first line entirely. If you absolutely need it automated, you can read the first line in a second step. 

There are posts on here that show how using this approach by Ksharp.

Super User
Posts: 11,343

Re: Reading Messed up date and column names

And if you have reason to believe that your date column has differing formats of the original content then behaps the ANYDTDTE informat will work. Please see the documentation.

Super User
Super User
Posts: 7,060

Re: Reading Messed up date and column names

When reading the file you normally skip the header line. So spaces in the variable names shouldn't matter.

data want ;
  infile 'myfile.dat' firstobs=2 truncover ;
  length name $20 Marks Age 8 DOBchar $12 ;
  input name -- DOBchar ;
run;

If you are using PROC IMPORT or some other tool that will try to guess at what your file contains then it will have algorithms for converting the invalid characters into valid variable names.

 

As to converting the gibberish in the DOB field you probably will need to read it as character and write your own algorithm to see if you can convert it to a real date.

Ask a Question
Discussion stats
  • 5 replies
  • 127 views
  • 0 likes
  • 4 in conversation