- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I need to import a text file in csv format with text qualifiers. I tried PROC IMPORT but it didn't recognize the comma(,) as a delimiter in the file.
Can anyone help me with the code for sas import that will take care of both the comma (as delimited) and text qualifier? Please note my data has " " where there is a missing variable. I will highly appreciate any help.
Here is how my data looks like:
KeyInd,Addr1,city,ID1,ID2, Name, Country
Individual,"42nd Street","New York",09000033,04531-027-0031813," ", "United States"
Individual,"9 Glen Echo Crt",Buffalo,04531-079-0035670,04531-027-0031819, "Joe Smith"," "
Individual,"9 Glen Echo Crt",Buffalo,04531-079-0035670,04531-027-0031819, "Joe Smith",Canada
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Why would you use "import" for such a simple data file?
Just write the data step to read it. SAS will automatically treat blanks as missing. Just pick reasonable values for the lengths of your variables. You can always check the data after you have read it and adjust the program and re-run it.
Note that none of your variables look like numbers to me.
data want;
infile datalines dsd firstobs=2 truncover;
length KeyInd $20 Addr1 $100 city $40 ID1 ID2 $20 Name $40 Country $40 ;
input KeyInd--Country;
datalines4;
KeyInd,Addr1,city,ID1,ID2, Name, Country
Individual,"42nd Street","New York",09000033,04531-027-0031813," ", "United States"
Individual,"9 Glen Echo Crt",Buffalo,04531-079-0035670,04531-027-0031819, "Joe Smith"," "
Individual,"9 Glen Echo Crt",Buffalo,04531-079-0035670,04531-027-0031819, "Joe Smith",Canada
;;;;
To read from a file instead replace the DATALINES keyword in the INFILE statement with the reference (simplest is quoted physical name of the file) to your source file and replace the in-line data with a RUN statement.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for your help, but I have 1000+ variables. So I really can't define them in data step, I mean it will be too much. Is there a easier way?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@mlogan wrote:
Hi Tom,
Thanks for your help, but I have 1000+ variables. So I really can't define them in data step, I mean it will be too much. Is there a easier way?
??
Your original post shows just 6 (six) columns.
More than 1000 columns points to a massive failure in process design, BTW.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
More than 1,000 variables? That seems excessive.
Do you have data about the variables (metadata)?
If so then use the metadata to write the program to read the text file.
If not then you will not have much hope of making heads or tails of 1,000+ variables.
Also more than 1,000 variables means that you are in danger of having more lines longer than 32K, which might cause trouble for PROC IMPORT. Especially if the header row is too long such that some of the headers are not read.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You could use the import wizard (SAS EG or Studio) which will generate the data step for you.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
do you have the sample of the SAS code you are using?
dlm=',';
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hope you tried the below code
proc import datafile='path' dbms=csv replace out=want dlm=',';getnames=yes;
run;
Jag
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
dlm=',' gives an error, use instead:
; delimiter = ','
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
With PROC IMPORT you have to use the DELIMITER= statement to change the delimiter.
But if you use DBMS=CSV then the delimiter will already default to a comma so you don't need the DELIMITER= statement.
They were probably thinking of the DLM= option on the INFILE statement for when you write your own data step to read the text file. But again if you use the DSD option on the INFILE statement then the default delimiter is a comma so you don't have to add the DLM= option there either.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Open the program you are using to import the CSV file (such as Microsoft Excel, Google Sheets, or a database program).
Go to the "File" menu and choose "Import" or "Open."
In the import wizard or dialog box, select "CSV" as the file type.
Choose the file you want to import and click "Next" or "Continue."
In the "Text Qualifier" field, enter the character that you want to use as the text qualifier. The default text qualifier is double quotes (").
Preview the data and make sure that the fields are being correctly separated and that the text qualifiers are being used to encapsulate text fields with special characters.
Complete the import process and save the data in your desired format.
Regards,
Rachel Gomez
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Opening and saving a text file with something like Excel will in essence destroy it. The software will have its own ideas (like converting the code 1.1 into a date) about the content.
There's only one way to deal with undocumented text files: open with a text editor, and write the code according to what you see, and what you can make of it out of your experience.
If the file is documented (as it should be), you write the code according to the doc.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I used data step to import with the following option, this resolved the issue with unwanted quotes I got for the last column. Hope this helps. Thanks.
TERMSTR= option—Specifies what end-of-line character to use for a file. This option is specific to ASCII
operating systems and is documented in the SAS 9.2 companion for your operating system. This option is useful
when you want to share data files that are created on one operating system with another operating system. For
example, if you are working in a UNIX environment and you need to read a file that was created under Windows,
use TERMSTR=CRLF. Similarly, if you are in a Windows environment and you need to read a file that was
created under UNIX, use TERMSTR=LF.
The following INFILE statement illustrates reading a MAC file from a PC:
infile 'C:\mydata\test.dat' dsd dlm='|' termstr=cr;