DATA Step, Macro, Functions and more

CSV Input when a Variable Contains a Delimiter

New User
Posts: 1

CSV Input when a Variable Contains a Delimiter

Good Afternoon,


I've been working my way thru Cody's Learning SAS by Example.  Problem 15 in Chapter 3 is giving me some trouble.



The input file contains:


123,"Harold Wilson", Acct,01/15/1989,$78,123.
128,"Julia Child",Food, 08/29/1988,$89,123
007,"James Bond",Security,02/01/2000,$82,100
828,"Roger Doger",Acct,08/15/1999,$39,100
900,"Earl Davenport",Food,09/09/1989,$45,399
906,"James Swindler",Acct,12/21/1978,$78,200



The input is comma delimited, but the last field, SALARY, contains a comma. 


When I run the following script, SALARY is truncated to the portion before its embedded comma.  Section 3.14 in the book discusses using the ampersand "&" modifier, but that involves altering the input.


I could read SALARY in pieces (i.e., before and after the comma), do some arithmetic, and re-assemble.  I could alter the input so that SALARY is enclosed in quotes.  Seems like there should be a more straight forward solution.



Any ideas?  Thanks in advance  SAS release: 9.04.01M4P11092016


data EMPLOY ;
    infile '/home/mytest/sasuser.v94/cody/Employee1.csv'  dsd missover;
    informat EmpID  $3.
                EmpName $20.
                Dept        $8.
                DateHire  mmddyy10.
                Salary    dollar8.;
    input EmpID          
            Salary      ;


Trusted Advisor
Posts: 1,848

Re: CSV Input when a Variable Contains a Delimiter

Read the whole line as one variable and use scan function to put values into first 4 variables,

then use index function to point at the $ signe, substring the rest of line and input value to salary 

using best informat or comma informat.


Try make your code and come back in case of any issue.

Super User
Super User
Posts: 8,279

Re: CSV Input when a Variable Contains a Delimiter

[ Edited ]

Your CSV file is malformed. If the value has a delimiter (or a quote) in it then it should be quoted. If this is a real data file and they are sending you more of them tell them to fix their process so that it generates a validly formatted file.


Fortunately the problem is in the last field in the line, so you can work around the error.   Since it is the last field you can use list mode input to read the beginning columns and then use formatted mode for the last column. 


NOTE: This is an example where using the obsolete MISSOVER option on the INFILE statement instead of the newer (it is probably only 30 years old) TRUNCOVER option could cause a real problem.  If you tell SAS to read 8 characters for SALARY and the line only has 7 characters left then with MISSOVER you will get a missing value, but with TRUNCOVER SAS will just use the number of characters that are available.


data EMPLOY ;
  infile '/home/mytest/sasuser.v94/cody/Employee1.csv' dsd truncover;
    EmpID $3
    EmpName $20
    Dept $8
    DateHire 8
    Salary 8
  input empid empname dept datehire salary comma32. ;
  informat datehire mmddyy. ;
  format datehire yymmdd10. salary dollar12.;

Also don't define your variables by relying on the side effect of using an INFORMAT statement as the first place you reference the variable. Go ahead and explicitly define them using a LENGTH or ATTRIB statement. Save the INFORMAT statement for when you actually need to attach an INFORMAT to a variable. Note that SAS does not need informats (or formats) for most variable, especially character variables.



Ask a Question
Discussion stats
  • 2 replies
  • 3 in conversation