BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Andygray
Quartz | Level 8

Hi Experts,

i'd like to understand the difference between DSD and DLM=',' options in the infile statement. Although, there is enough documentation, long sentences are sometimes not precise in understanding the functionality. What i exactly want to know is that, it seems reading comma separated values with a DSD can incorrectly produce all missing values when reading a numeric variable or in other words when not used correctly.  Also, where and when you would choose to use either dlm= or dsd. And most importantly, i have also noticed some using both in the same infile statement. I wanna know on what circumstances would DSD read incorrectly, and the same is read correctly by dlm=.

Please pardon me for asking basic question but if you don't mind the time to offer an easy answer with a simple/medium complex example, i'd most appreciate it. Thanks!!

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

As basic as this is, it's something I also happen to fall over from time to time. As a rule of thumb I would say either assign the variable a permanent informat (using a informat or attrib statement) or use the column modifier.

From SAS(R) 9.4 Statements: Reference, Third Edition

Comparisons

How Modified List Input and Formatted Input Differ

Modified list input has a scanning feature that can use informats to read data which are not aligned in columns. Formatted input causes the pointer to move like that of column input to read a variable value. The pointer moves the length that is specified in the informat and stops at the next column.

Reading above for formatted input "input cust_id ... ddmmyy10. " should only be reading max. 10 columns, so even if "The pointer moves the length that is specified in the informat" supersedes the delimiter definition, the length of "10" should actually stop before the next delimiter of "comma" is in the string.


I haven't found a really satisfying explanation in the docu which explains such behavior in a way that I feel I fully understand it.


data jansales1;

  infile datalines truncover dsd;

  input item  $10. amount comma5.;

  datalines;

trucks 1,382

vans 1,235

sedans 2,391

;

run;

data jansales2;

  infile datalines truncover dsd;

  input item : $10. amount comma5.;

  datalines;

trucks 1,382

vans 1,235

sedans 2,391

;

run;

data jansales3;

  infile datalines truncover dsd;

  informat item $10. amount comma5.;

  input item  amount;

  datalines;

trucks 1,382

vans 1,235

sedans 2,391

;

run;

View solution in original post

10 REPLIES 10
Haikuo
Onyx | Level 15

DSD does 3 things:

1. dlm=','

2. if there are two comma in a row, it sees a missing value.

3. if there are quotes around data elements, it removes them.

dlm is just dlm. When using dlm= and dsd together, the value defined by dlm= overrides comma.

Ksharp
Super User

4. If there is a delimiter ( comma )  in the quote , SAS will ignore it . E.X.

"xx,yy",zz 

SAS will take it as two variables not three.

So as far as I know they are not different . You can remove delimiter=','  when you are using DSD .

Xia Keshan

ballardw
Super User

DLM or the delimiter to specify what separates fields is a key piece in reading many data sets, not just comma separated. So the option lets you lead files where fields are separated by tabs, | , : and just about any other character you may want. You can also used multiple delimiters for a single file such as dlm = ',|' which would tread any comma or | not  as the start of a new field.

DSD is critical when you have text entered that may include commas such as "This is a comment, that contains a comma"

without the DSD option this could be treated as two fields containing "This is a comment and that contains a comma"

It sounds like you have a concern about specific data, possibly numeric values with commas such as 123,345,789 if so there are other things that may be needed.

Andygray
Quartz | Level 8

Ok here is an example, that is getting me rather confused:

/*this code reads trxn numeric values correctly*/

data one;

infile datalines dlm=',' ; /*using dlm','*/

input cust_id   date ddmmyy10. trxn nat $;

cards;

1,10/01/2015,5000,dr

1,13/01/2015,2000,cr

1,15/01/2015,2000,dr

1,18/01/2015,2000,dr

1,21/01/2015,2000,cr

1,23/01/2015,2000,cr

1,24/01/2015,2000,cr

1,26/01/2015,2000,dr

1,28/01/2015,2000,dr

1,31/01/2015,2000,dr

;

run;

whereas the following code below with DSD incorrectly reads trxn and outputs all missing values

data one;

infile datalines dsd;

input cust_id   date ddmmyy10. trxn nat $;

cards;

1,10/01/2015,5000,dr

1,13/01/2015,2000,cr

1,15/01/2015,2000,dr

1,18/01/2015,2000,dr

1,21/01/2015,2000,cr

1,23/01/2015,2000,cr

1,24/01/2015,2000,cr

1,26/01/2015,2000,dr

1,28/01/2015,2000,dr

1,31/01/2015,2000,dr

;

run;

So, where did i go wrong in my understanding?Please help. I am using SAS EG 6.1

Patrick
Opal | Level 21

If you're using a proper INformat then things will work (note the colon in front of the date informat).

data one;

  infile datalines dsd;

  input cust_id date :ddmmyy10. trxn nat $;

  format date ddmmyy10.;

  cards;

1,10/01/2015,5000,dr

1,13/01/2015,2000,cr

1,15/01/2015,2000,dr

1,18/01/2015,2000,dr

1,21/01/2015,2000,cr

1,23/01/2015,2000,cr

1,24/01/2015,2000,cr

1,26/01/2015,2000,dr

1,28/01/2015,2000,dr

1,31/01/2015,2000,dr

;

run;

Andygray
Quartz | Level 8

My oh My!, missing out that one : in front of the informat was the issue and made the difference? dear o dear, jeez, what does : do here and it's functionality in general? Does that mean I will need to specify : before an informat every single time I use dsd option in the infile statement?

Patrick
Opal | Level 21

As basic as this is, it's something I also happen to fall over from time to time. As a rule of thumb I would say either assign the variable a permanent informat (using a informat or attrib statement) or use the column modifier.

From SAS(R) 9.4 Statements: Reference, Third Edition

Comparisons

How Modified List Input and Formatted Input Differ

Modified list input has a scanning feature that can use informats to read data which are not aligned in columns. Formatted input causes the pointer to move like that of column input to read a variable value. The pointer moves the length that is specified in the informat and stops at the next column.

Reading above for formatted input "input cust_id ... ddmmyy10. " should only be reading max. 10 columns, so even if "The pointer moves the length that is specified in the informat" supersedes the delimiter definition, the length of "10" should actually stop before the next delimiter of "comma" is in the string.


I haven't found a really satisfying explanation in the docu which explains such behavior in a way that I feel I fully understand it.


data jansales1;

  infile datalines truncover dsd;

  input item  $10. amount comma5.;

  datalines;

trucks 1,382

vans 1,235

sedans 2,391

;

run;

data jansales2;

  infile datalines truncover dsd;

  input item : $10. amount comma5.;

  datalines;

trucks 1,382

vans 1,235

sedans 2,391

;

run;

data jansales3;

  infile datalines truncover dsd;

  informat item $10. amount comma5.;

  input item  amount;

  datalines;

trucks 1,382

vans 1,235

sedans 2,391

;

run;

Tom
Super User Tom
Super User

The difference between DSD and non-DSD that is causing the trouble with your example program is how it moves the cursor after reading a field.  Without DSD it moves past all of the delimiters that it sees before reading the next field.  With DSD it does not.  I assume this is what allows it to treat adjacent delimiters as indicating a missing value.

See this example:

data one;

infile cards dlm=',' TRUNCOVER ; /*using dlm','*/

input cust_id   date ddmmyy10. A $ B $ C $;

cards;

1,10/01/2015,5000,dr

;

run;

data two;

infile cards dsd TRUNCOVER ;

input cust_id   date ddmmyy10. A $ B $ C $;

cards;

1,10/01/2015,5000,dr

;

run;


proc compare data=one compare=two;

run;

Lin_Clare
Calcite | Level 5

Hi Tom,

I am here to learn. I tried your code and get table one with varaible C as missing, while table two with variable A as mssing, but still not understand why.

Could you explain how the cursor move between input date and A in these dlm=',' and dsd method?

Thanks.

Tom
Super User Tom
Super User

The issue is when it starts to read A the cursor is pointing at the comma (column 13) in both cases.

With the DSD option SAS takes this to mean that there is a missing value before that comma. Without the DSD option it will skip over the comma (the same way that normal list input where space is the delimiter would skip over extra spaces in a line) and so the character after the comma is read as the value of A.

You can see the same behaviour by reading a line that starts with the delimiter.

 

data one ;
   infile cards dlm=',' ;
   input x $ ;
   put x= ;
cards;
,AAA
;;;;;
data two;
   infile cards dsd dlm=',';
   input x $ ;
   put x=;
cards;
,AAA
;;;;

 

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 79012 views
  • 26 likes
  • 7 in conversation