I have very large CSV files with 100+ variables that look something like this:
There are empty columns, and often, fields with long string such as addresses, business names, etc will have a line break interspersed throughout the files. Aside from deleting these problem columns, I can't figure out a good way to import these files completely. I've tried adding TERMSTR=CRLF but it doesn't help and causes even worse formatting.
Is there a better way to import and/or to clean up these line breaks/spaces?
id,EntryDate,FirstName,LastName,Status1,Status2,Location,OtherAddress"8","5/31/2013","First Name 1","Last Name 1","Below Status1 Level","","ABC D clinic","" "9","6/22/2014","First Name 2","Last Name 2","Below Status1 Level","","AB CD Residency Clinic","""33","9/3/2015","First Name 3","Last Name 3","Below Status1 Level","","","","100 Random St City, State","
For example:
data WORK.WEIRDFORMAT ; %let _EFIERR_ = 0; /* set the ERROR detection macro variable */ infile 'weirdformat.csv' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ; informat id $24. ; informat EntryDate $11. ; informat FirstName $14. ; informat LastName $13. ; informat Status1 $21. ; informat Status2 $2. ; informat Location $2. ; informat OtherAddress $2. ; format id $24. ; format EntryDate $11. ; format FirstName $14. ; format LastName $13. ; format Status1 $21. ; format Status2 $2. ; format Location $2. ; format OtherAddress $2. ; input id $ EntryDate $ FirstName $ LastName $ Status1 $ Status2 $ Location $ OtherAddress $ ; if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */ run;
... View more