
unable to read messy raw data

Contributor
Posts: 20


libname Lesson08 '/home/coccus030/sasuser.v94/';
Data Lesson08.college09ds;
Infile '/home/coccus030/sasuser.v94/institutiondata.txt' firstobs=2;
Input GenderType & $5. Year 7-12 @ 'music:' Asofdate mmddyy10.;
run;

 

I am using column-style input. The sample below is copied and pasted from the text file.

 

Variables: GenderType Year AsofDate

icc-M     1970       music:  03/14/2017          ipp-e     1989       music:  03/19/2018

iff-e       1880       music:  03/15/2017           ioo-p     1996       music:  03/10/2015

odd-w    2001       music: 03/16/2011           odd-p    1965       music: 03/01/2011

egg-o     2000       music: 03/17/2010           eee-w   1947       music: 03/04/2016

PROC Star
Posts: 1,287

Re: unable to read messy raw data

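data have;
/* @'music:' moves the pointer to just past that string; +2 then skips two more columns. */
/* The trailing @@ holds the line so that both groups on each record are read. */
Input GenderType $ Year @'music:'+2 Asofdate mmddyy10. @@;
format Asofdate mmddyy10. ;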
datalines;
icc-M     1970       music:  03/14/2017          ipp-e     1989       music:  03/19/2018
iff-e       1880       music:  03/15/2017           ioo-p     1996       music:  03/10/2015
odd-w    2001       music: 03/16/2011           odd-p    1965       music: 03/01/2011
egg-o     2000       music: 03/17/2010           eee-w   1947       music: 03/04/2016
;
run;
Super User
Posts: 12,994

Re: unable to read messy raw data

Do not expect text data pasted into a message window on this forum to match your actual text data, because the forum software reformats it.

It is better to paste text data into a code box opened with the forum's {I} menu icon.

 

I suspect that your actual data has TAB characters in it. Therefore YEAR is likely not consistently in columns 7-12.

When I pasted your example data into a text editor I got this:

icc-M     1970       music:  03/14/2017          ipp-e     1989       music:  03/19/2018
iff-e       1880       music:  03/15/2017           ioo-p     1996       music:  03/10/2015
odd-w    2001       music: 03/16/2011           odd-p    1965       music: 03/01/2011
egg-o     2000       music: 03/17/2010           eee-w   1947       music: 03/04/2016

This shows the year appearing in different columns, which is what happens when tabs are replaced with spaces.
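
One way to check for tabs yourself is to dump a few raw records to the log. This is only a sketch: it reuses the file path from the original post, and whether that file really contains tabs is still unconfirmed.

```sas
data _null_;
  /* Path copied from the original post; OBS=5 limits the check to the first five records */
  infile '/home/coccus030/sasuser.v94/institutiondata.txt' obs=5;
  input;   /* load the record into the input buffer without creating any variables */
  list;    /* write the record to the log; hex lines are added when unprintable characters are present */
run;
```

If tabs are present, the log should show hex lines under each record, and a tab appears there as hex 09.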

 

It might help to actually attach the data file as a TXT file so we can verify the actual contents.

 

BTW, that does NOT qualify as messy data by a long shot. But it does require understanding what happens when you read 1) from a fixed column range that does not align with your actual data, and 2) with the @'music:' column pointer: where the pointer actually sits afterwards, and how that interacts with an informat specified on the INPUT statement.
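
To illustrate both points, here is a minimal sketch that avoids fixed columns altogether. It assumes the fields are separated by blanks or tabs and that two groups sit on each line, as in the pasted sample. The data set name work.college09ds is just a placeholder, and the two data lines are copied from the post.

```sas
data work.college09ds;
  /* EXPANDTABS converts any tab characters to blanks before the line is read */
  infile datalines expandtabs;
  /* List input skips runs of blanks, so GenderType and Year need no column positions. */
  /* @'music:' leaves the pointer just past the colon. The colon modifier on the date  */
  /* scans to the next nonblank character before applying the informat, so it does not */
  /* matter whether one or two blanks follow "music:".                                  */
  /* The trailing @@ holds the line so the second group on it is read as well.         */
  input GenderType :$5. Year @'music:' Asofdate :mmddyy10. @@;
  format Asofdate mmddyy10.;
datalines;
icc-M     1970       music:  03/14/2017          ipp-e     1989       music:  03/19/2018
odd-w    2001       music: 03/16/2011           odd-p    1965       music: 03/01/2011
;
run;
```

For the real file, the INFILE statement would point at the text file instead of DATALINES and keep FIRSTOBS=2 to skip the header line, along with EXPANDTABS if the file does turn out to contain tabs.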

 

You should also post, in a code box, your code from the log together with the invalid data messages it very likely contains.
