Please explain DSD and DLM=',' differences?

Accepted Solution Solved
Reply
Contributor
Posts: 56
Accepted Solution

Please explain DSD and DLM=',' differences?

Hi Experts,

i'd like to understand the difference between DSD and DLM=',' options in the infile statement. Although, there is enough documentation, long sentences are sometimes not precise in understanding the functionality. What i exactly want to know is that, it seems reading comma separated values with a DSD can incorrectly produce all missing values when reading a numeric variable or in other words when not used correctly.  Also, where and when you would choose to use either dlm= or dsd. And most importantly, i have also noticed some using both in the same infile statement. I wanna know on what circumstances would DSD read incorrectly, and the same is read correctly by dlm=.

Please pardon me for asking basic question but if you don't mind the time to offer an easy answer with a simple/medium complex example, i'd most appreciate it. Thanks!!


Accepted Solutions
Solution
‎01-31-2015 08:07 PM
Respected Advisor
Posts: 3,837

Re: Please explain DSD and DLM=',' differences?

As basic as this is, it's something I also happen to fall over from time to time. As a rule of thumb I would say either assign the variable a permanent informat (using a informat or attrib statement) or use the column modifier.

From SAS(R) 9.4 Statements: Reference, Third Edition

Comparisons

How Modified List Input and Formatted Input Differ

Modified list input has a scanning feature that can use informats to read data which are not aligned in columns. Formatted input causes the pointer to move like that of column input to read a variable value. The pointer moves the length that is specified in the informat and stops at the next column.

Reading above for formatted input "input cust_id ... ddmmyy10. " should only be reading max. 10 columns, so even if "The pointer moves the length that is specified in the informat" supersedes the delimiter definition, the length of "10" should actually stop before the next delimiter of "comma" is in the string.


I haven't found a really satisfying explanation in the docu which explains such behavior in a way that I feel I fully understand it.


data jansales1;

  infile datalines truncover dsd;

  input item  $10. amount comma5.;

  datalines;

trucks 1,382

vans 1,235

sedans 2,391

;

run;

data jansales2;

  infile datalines truncover dsd;

  input item : $10. amount comma5.;

  datalines;

trucks 1,382

vans 1,235

sedans 2,391

;

run;

data jansales3;

  infile datalines truncover dsd;

  informat item $10. amount comma5.;

  input item  amount;

  datalines;

trucks 1,382

vans 1,235

sedans 2,391

;

run;

View solution in original post


All Replies
Respected Advisor
Posts: 3,124

Re: Please explain DSD and DLM=',' differences?

DSD does 3 things:

1. dlm=','

2. if there are two comma in a row, it sees a missing value.

3. if there are quotes around data elements, it removes them.

dlm is just dlm. When using dlm= and dsd together, the value defined by dlm= overrides comma.

Grand Advisor
Posts: 9,584

Re: Please explain DSD and DLM=',' differences?

4. If there is a delimiter ( comma )  in the quote , SAS will ignore it . E.X.

"xx,yy",zz 

SAS will take it as two variables not three.

So as far as I know they are not different . You can remove delimiter=','  when you are using DSD .

Xia Keshan

Grand Advisor
Posts: 10,223

Re: Please explain DSD and DLM=',' differences?

DLM or the delimiter to specify what separates fields is a key piece in reading many data sets, not just comma separated. So the option lets you lead files where fields are separated by tabs, | , : and just about any other character you may want. You can also used multiple delimiters for a single file such as dlm = ',|' which would tread any comma or | not  as the start of a new field.

DSD is critical when you have text entered that may include commas such as "This is a comment, that contains a comma"

without the DSD option this could be treated as two fields containing "This is a comment and that contains a comma"

It sounds like you have a concern about specific data, possibly numeric values with commas such as 123,345,789 if so there are other things that may be needed.

Contributor
Posts: 56

Re: Please explain DSD and DLM=',' differences?

Ok here is an example, that is getting me rather confused:

/*this code reads trxn numeric values correctly*/

data one;

infile datalines dlm=',' ; /*using dlm','*/

input cust_id   date ddmmyy10. trxn nat $;

cards;

1,10/01/2015,5000,dr

1,13/01/2015,2000,cr

1,15/01/2015,2000,dr

1,18/01/2015,2000,dr

1,21/01/2015,2000,cr

1,23/01/2015,2000,cr

1,24/01/2015,2000,cr

1,26/01/2015,2000,dr

1,28/01/2015,2000,dr

1,31/01/2015,2000,dr

;

run;

whereas the following code below with DSD incorrectly reads trxn and outputs all missing values

data one;

infile datalines dsd;

input cust_id   date ddmmyy10. trxn nat $;

cards;

1,10/01/2015,5000,dr

1,13/01/2015,2000,cr

1,15/01/2015,2000,dr

1,18/01/2015,2000,dr

1,21/01/2015,2000,cr

1,23/01/2015,2000,cr

1,24/01/2015,2000,cr

1,26/01/2015,2000,dr

1,28/01/2015,2000,dr

1,31/01/2015,2000,dr

;

run;

So, where did i go wrong in my understanding?Please help. I am using SAS EG 6.1

Respected Advisor
Posts: 3,837

Re: Please explain DSD and DLM=',' differences?

If you're using a proper INformat then things will work (note the colon in front of the date informat).

data one;

  infile datalines dsd;

  input cust_id date :ddmmyy10. trxn nat $;

  format date ddmmyy10.;

  cards;

1,10/01/2015,5000,dr

1,13/01/2015,2000,cr

1,15/01/2015,2000,dr

1,18/01/2015,2000,dr

1,21/01/2015,2000,cr

1,23/01/2015,2000,cr

1,24/01/2015,2000,cr

1,26/01/2015,2000,dr

1,28/01/2015,2000,dr

1,31/01/2015,2000,dr

;

run;

Contributor
Posts: 56

Re: Please explain DSD and DLM=',' differences?

My oh My!, missing out that one : in front of the informat was the issue and made the difference? dear o dear, jeez, what does : do here and it's functionality in general? Does that mean I will need to specify : before an informat every single time I use dsd option in the infile statement?

Solution
‎01-31-2015 08:07 PM
Respected Advisor
Posts: 3,837

Re: Please explain DSD and DLM=',' differences?

As basic as this is, it's something I also happen to fall over from time to time. As a rule of thumb I would say either assign the variable a permanent informat (using a informat or attrib statement) or use the column modifier.

From SAS(R) 9.4 Statements: Reference, Third Edition

Comparisons

How Modified List Input and Formatted Input Differ

Modified list input has a scanning feature that can use informats to read data which are not aligned in columns. Formatted input causes the pointer to move like that of column input to read a variable value. The pointer moves the length that is specified in the informat and stops at the next column.

Reading above for formatted input "input cust_id ... ddmmyy10. " should only be reading max. 10 columns, so even if "The pointer moves the length that is specified in the informat" supersedes the delimiter definition, the length of "10" should actually stop before the next delimiter of "comma" is in the string.


I haven't found a really satisfying explanation in the docu which explains such behavior in a way that I feel I fully understand it.


data jansales1;

  infile datalines truncover dsd;

  input item  $10. amount comma5.;

  datalines;

trucks 1,382

vans 1,235

sedans 2,391

;

run;

data jansales2;

  infile datalines truncover dsd;

  input item : $10. amount comma5.;

  datalines;

trucks 1,382

vans 1,235

sedans 2,391

;

run;

data jansales3;

  infile datalines truncover dsd;

  informat item $10. amount comma5.;

  input item  amount;

  datalines;

trucks 1,382

vans 1,235

sedans 2,391

;

run;

Super User
Super User
Posts: 6,364

Re: Please explain DSD and DLM=',' differences?

The difference between DSD and non-DSD that is causing the trouble with your example program is how it moves the cursor after reading a field.  Without DSD it moves past all of the delimiters that it sees before reading the next field.  With DSD it does not.  I assume this is what allows it to treat adjacent delimiters as indicating a missing value.

See this example:

data one;

infile cards dlm=',' TRUNCOVER ; /*using dlm','*/

input cust_id   date ddmmyy10. A $ B $ C $;

cards;

1,10/01/2015,5000,dr

;

run;

data two;

infile cards dsd TRUNCOVER ;

input cust_id   date ddmmyy10. A $ B $ C $;

cards;

1,10/01/2015,5000,dr

;

run;


proc compare data=one compare=two;

run;

Occasional Contributor
Posts: 14

Re: Please explain DSD and DLM=',' differences?

Hi Tom,

I am here to learn. I tried your code and get table one with varaible C as missing, while table two with variable A as mssing, but still not understand why.

Could you explain how the cursor move between input date and A in these dlm=',' and dsd method?

Thanks.

Super User
Super User
Posts: 6,364

Re: Please explain DSD and DLM=',' differences?

[ Edited ]

The issue is when it starts to read A the cursor is pointing at the comma (column 13) in both cases.

With the DSD option SAS takes this to mean that there is a missing value before that comma. Without the DSD option it will skip over the comma (the same way that normal list input where space is the delimiter would skip over extra spaces in a line) and so the character after the comma is read as the value of A.

You can see the same behaviour by reading a line that starts with the delimiter.

 

data one ;
   infile cards dlm=',' ;
   input x $ ;
   put x= ;
cards;
,AAA
;;;;;
data two;
   infile cards dsd dlm=',';
   input x $ ;
   put x=;
cards;
,AAA
;;;;

 

 

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 20514 views
  • 13 likes
  • 7 in conversation