- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Guys,
Need some help please!
I am trying to import a .dat file into SAS using infile.
The data is pipe(|) delimited. However some of the data is like the following: a|b|c|d|"Mr John|Smith"|e|f|. As you can see the delimiter is embedded in between two data columns but in between double quotes (there are some with single quotes too). How can I get around this so that the delimiter is recognised and the data is imported correctly?
Example of my data step:
data library.file;
infile "filepath" DLM = '|' DSD MISSOVER firstobs=3;
format
var1 $99.
var2 $99.
var3 $99.
var4 $99.
firstname $99.
surname $99.
var5 $99.
var6 $99.
;
input
var1 $
var2 $
var3 $
var4 $
firstname $
surname $
var5 $
var6 $
;
run;
Thank you!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If you want want to treat the adjacent delimiters as NOT meaning one word boundary (like spaces used to align columns) then you will need to modify the line in some way. Either remove (or replace) the quotes
infile xx dsd dlm='|' ;
input @;
_infile_ = translate(_infile_,' ','"');
input ...;
or introduce something between the delimiters.
infile xx dlm='|' ;
input @;
_infile_ = tranwrd(_infile_,'||','| |');
input ...;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The purpose of quotes in delimited files it to protect the delimiters when it appears in the the value of one of the fields. The DSD option is that is enabling the dataset to parse the line.
So in a normal interpretation of the string:
a|b|c|d|"Mr John|Smith"|e|f|.
There are 8 values there:
a b c d Mr John|Smith e f .
How do you want that line to be parsed instead?
There a some combinations of strings that can confuse SAS's parsing of lines. Since SAS supports either single or double quotes around string the normal input process does also. So these two lines are the same to SAS.
a|'hi|there'|c a|"hi|there"|c
If you have messy data where there exists on the same line a value that starts with a single quote and another that ends in a single quote you could have problems. Like this:
a|b|c|d a|'hi|c|x'
Is that your situation?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes, you are correct.
I would like my data to be parsed like this:
a
b
c
d
Mr John
Smith
e
f
I would ideally like that the double or single quotes don't hinder the delimiter.
Would you know how to go around this please?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Remove the DSD option.
But the quotes will be part of the values.
a b c d "Mr John Smith" e f
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yeah, I did try that, but some of my data is empty/missing in between delimiters so it would simply ignore it which causes further issues 😞
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Is it possible to fix the problem at the source? The values with the quotes should have been quoted.
a|b|c|d|"""Mr John"|"Smith"""|e|f|