Hi Experts,
i'd like to understand the difference between DSD and DLM=',' options in the infile statement. Although, there is enough documentation, long sentences are sometimes not precise in understanding the functionality. What i exactly want to know is that, it seems reading comma separated values with a DSD can incorrectly produce all missing values when reading a numeric variable or in other words when not used correctly. Also, where and when you would choose to use either dlm= or dsd. And most importantly, i have also noticed some using both in the same infile statement. I wanna know on what circumstances would DSD read incorrectly, and the same is read correctly by dlm=.
Please pardon me for asking basic question but if you don't mind the time to offer an easy answer with a simple/medium complex example, i'd most appreciate it. Thanks!!
As basic as this is, it's something I also happen to fall over from time to time. As a rule of thumb I would say either assign the variable a permanent informat (using a informat or attrib statement) or use the column modifier.
From SAS(R) 9.4 Statements: Reference, Third Edition
Reading above for formatted input "input cust_id ... ddmmyy10. " should only be reading max. 10 columns, so even if "The pointer moves the length that is specified in the informat" supersedes the delimiter definition, the length of "10" should actually stop before the next delimiter of "comma" is in the string.
I haven't found a really satisfying explanation in the docu which explains such behavior in a way that I feel I fully understand it.
data jansales1;
infile datalines truncover dsd;
input item $10. amount comma5.;
datalines;
trucks 1,382
vans 1,235
sedans 2,391
;
run;
data jansales2;
infile datalines truncover dsd;
input item : $10. amount comma5.;
datalines;
trucks 1,382
vans 1,235
sedans 2,391
;
run;
data jansales3;
infile datalines truncover dsd;
informat item $10. amount comma5.;
input item amount;
datalines;
trucks 1,382
vans 1,235
sedans 2,391
;
run;
DSD does 3 things:
1. dlm=','
2. if there are two comma in a row, it sees a missing value.
3. if there are quotes around data elements, it removes them.
dlm is just dlm. When using dlm= and dsd together, the value defined by dlm= overrides comma.
4. If there is a delimiter ( comma ) in the quote , SAS will ignore it . E.X.
"xx,yy",zz
SAS will take it as two variables not three.
So as far as I know they are not different . You can remove delimiter=',' when you are using DSD .
Xia Keshan
DLM or the delimiter to specify what separates fields is a key piece in reading many data sets, not just comma separated. So the option lets you lead files where fields are separated by tabs, | , : and just about any other character you may want. You can also used multiple delimiters for a single file such as dlm = ',|' which would tread any comma or | not as the start of a new field.
DSD is critical when you have text entered that may include commas such as "This is a comment, that contains a comma"
without the DSD option this could be treated as two fields containing "This is a comment and that contains a comma"
It sounds like you have a concern about specific data, possibly numeric values with commas such as 123,345,789 if so there are other things that may be needed.
Ok here is an example, that is getting me rather confused:
/*this code reads trxn numeric values correctly*/
data one;
infile datalines dlm=',' ; /*using dlm','*/
input cust_id date ddmmyy10. trxn nat $;
cards;
1,10/01/2015,5000,dr
1,13/01/2015,2000,cr
1,15/01/2015,2000,dr
1,18/01/2015,2000,dr
1,21/01/2015,2000,cr
1,23/01/2015,2000,cr
1,24/01/2015,2000,cr
1,26/01/2015,2000,dr
1,28/01/2015,2000,dr
1,31/01/2015,2000,dr
;
run;
whereas the following code below with DSD incorrectly reads trxn and outputs all missing values
data one;
infile datalines dsd;
input cust_id date ddmmyy10. trxn nat $;
cards;
1,10/01/2015,5000,dr
1,13/01/2015,2000,cr
1,15/01/2015,2000,dr
1,18/01/2015,2000,dr
1,21/01/2015,2000,cr
1,23/01/2015,2000,cr
1,24/01/2015,2000,cr
1,26/01/2015,2000,dr
1,28/01/2015,2000,dr
1,31/01/2015,2000,dr
;
run;
So, where did i go wrong in my understanding?Please help. I am using SAS EG 6.1
If you're using a proper INformat then things will work (note the colon in front of the date informat).
data one;
infile datalines dsd;
input cust_id date :ddmmyy10. trxn nat $;
format date ddmmyy10.;
cards;
1,10/01/2015,5000,dr
1,13/01/2015,2000,cr
1,15/01/2015,2000,dr
1,18/01/2015,2000,dr
1,21/01/2015,2000,cr
1,23/01/2015,2000,cr
1,24/01/2015,2000,cr
1,26/01/2015,2000,dr
1,28/01/2015,2000,dr
1,31/01/2015,2000,dr
;
run;
My oh My!, missing out that one : in front of the informat was the issue and made the difference? dear o dear, jeez, what does : do here and it's functionality in general? Does that mean I will need to specify : before an informat every single time I use dsd option in the infile statement?
As basic as this is, it's something I also happen to fall over from time to time. As a rule of thumb I would say either assign the variable a permanent informat (using a informat or attrib statement) or use the column modifier.
From SAS(R) 9.4 Statements: Reference, Third Edition
Reading above for formatted input "input cust_id ... ddmmyy10. " should only be reading max. 10 columns, so even if "The pointer moves the length that is specified in the informat" supersedes the delimiter definition, the length of "10" should actually stop before the next delimiter of "comma" is in the string.
I haven't found a really satisfying explanation in the docu which explains such behavior in a way that I feel I fully understand it.
data jansales1;
infile datalines truncover dsd;
input item $10. amount comma5.;
datalines;
trucks 1,382
vans 1,235
sedans 2,391
;
run;
data jansales2;
infile datalines truncover dsd;
input item : $10. amount comma5.;
datalines;
trucks 1,382
vans 1,235
sedans 2,391
;
run;
data jansales3;
infile datalines truncover dsd;
informat item $10. amount comma5.;
input item amount;
datalines;
trucks 1,382
vans 1,235
sedans 2,391
;
run;
The difference between DSD and non-DSD that is causing the trouble with your example program is how it moves the cursor after reading a field. Without DSD it moves past all of the delimiters that it sees before reading the next field. With DSD it does not. I assume this is what allows it to treat adjacent delimiters as indicating a missing value.
See this example:
data one;
infile cards dlm=',' TRUNCOVER ; /*using dlm','*/
input cust_id date ddmmyy10. A $ B $ C $;
cards;
1,10/01/2015,5000,dr
;
run;
data two;
infile cards dsd TRUNCOVER ;
input cust_id date ddmmyy10. A $ B $ C $;
cards;
1,10/01/2015,5000,dr
;
run;
proc compare data=one compare=two;
run;
Hi Tom,
I am here to learn. I tried your code and get table one with varaible C as missing, while table two with variable A as mssing, but still not understand why.
Could you explain how the cursor move between input date and A in these dlm=',' and dsd method?
Thanks.
The issue is when it starts to read A the cursor is pointing at the comma (column 13) in both cases.
With the DSD option SAS takes this to mean that there is a missing value before that comma. Without the DSD option it will skip over the comma (the same way that normal list input where space is the delimiter would skip over extra spaces in a line) and so the character after the comma is read as the value of A.
You can see the same behaviour by reading a line that starts with the delimiter.
data one ;
infile cards dlm=',' ;
input x $ ;
put x= ;
cards;
,AAA
;;;;;
data two;
infile cards dsd dlm=',';
input x $ ;
put x=;
cards;
,AAA
;;;;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.