Solved: reading a csv file starting from 4th row using proc import

Tal · Posted 09-29-2015 09:54 AM

i have a csv file and data starts from 4th road (3rd road is variable names).

Is there a way i can import it using proc import?

RW9 · Posted 09-29-2015 10:22 AM

Its not really a CSV file then is it. CSV file has row 1 with column names, row 2 data starts and both are delimited by comma's. You have a CSV "like" file. Step one, infile the datafile, read whole thing and if _n_ > 2 then output to new file. Step 2, read in the new file with just the headers and data.

View solution in original post

arodriguez · Posted 09-29-2015 10:00 AM

Use this statement in the proc import
DATAROW=4;

ballardw · Posted 09-29-2015 10:06 AM

If you want to get the variable names from the file I believe that you'll need to delete or cut and paste the first two rows to another file. Currently proc import expects the variables to be on the first row.

RW9 · Posted 09-29-2015 10:22 AM

Its not really a CSV file then is it. CSV file has row 1 with column names, row 2 data starts and both are delimited by comma's. You have a CSV "like" file. Step one, infile the datafile, read whole thing and if _n_ > 2 then output to new file. Step 2, read in the new file with just the headers and data.

Tal · Posted 09-29-2015 10:38 AM

we are trying to put one process in production so i cant really play with the file (cutting,copying,pasting...) and that file will come in externaly as a CSV one. So far they been sending a xls file where data starts from 4th row but it was easy to import it using the "startrow" and "namerow" within the proc import which does not work when it comes to CSV.

So:

1. i should hope the csv file will always have the headers in the first row and data from the second onwards?

2. infile the csv file and read in from _n_>2 ?

Thank you guys

ballardw · Posted 09-29-2015 10:47 AM

proc import datafile="path/Have.csv"
            dbms=csv
            out=Want
            replace;
       datarow=2;
run;

As ballardw says, you could not get the names with proc import if doesn't start in the first row

ChrisHemedinger · Posted 09-29-2015 10:51 AM

PROC IMPORT will help you with a well-formed CSV file -- glean variable names from the first row, and "guess" the variable types based on the remaining data rows. But as others point out, this depends on having data begin in the second row.

If there are blank rows, or rows that contain information that is not data, then PROC IMPORT might not behave as well. In this case it might be better to use INFILE with a CSV file. But you'll need to define the variable attributes, informats, and exert more control over how the file is read in.

Behind the scenes, PROC IMPORT generates DATA step code to do its work, so you might consider just short-circuiting that and write your own DATA step for full control. More work, but brings you more flexibility over reading a file whose format you don't control.

Chris

SAS For Dummies 3rd Edition! Check out the new edition, covering SAS 9.4, SAS Viya, and all of the modern ways to use SAS!

ballardw · Posted 09-29-2015 11:45 AM

If you know what the structure of the file will be, as in agreed upon by the providers and users, and the data columns remain in the same order and you know the characteristics such as length, format and data type then you use a data step to read the file. The infile statement would use the firstobs=4 or whatever to point to the data row.

You can use Proc import to give you a leg up on writing the code by 1) putting an example file in a nice format such as row 1 = headers, 2) run proc import. The log will have the data step code generated to read the file. Copy that out and save as a separate program file. 3) modify the informats, formats and add label code and 4) change the Firstobs to match the actual data. I would recommend modifying the code to use a filename statement and fileref in the data step for running against later data sets.

With the data step code the only concerns become undocumented changes such as change of order of variables, changed number of header rows and changed lengths of variable contents.

RW9 · Posted 09-29-2015 11:53 AM

Well, you have some technical advice above. I would just like to add that it is all in the documentation. Ensure you have a document which describes the CSV data. This should be signed by both parties and ahered to. The programming is a very simple matter then, merely coding the information in the agreement, but without an agreement you could end up changing things each time. Simple programming advice:

Document

Agree

then

Code.

Tal · Posted 10-16-2015 12:06 PM

yea CSV files indeed come as you said RW9 so i panicked for no reasons but thank you guys for the hints

MikeZdeb · Posted 10-17-2015 11:59 AM

Hi. If you have an XLS file with the first two rows blank, column names in row 3, data starting in row 4 ... try this with the attached XLS file ...

libname x 'z:\test.xls';

data new;
set x.'sheet1$'n ;
run;

libname x clear;

data set NEW ...

Obs x y z date name

1     10    20    30    15OCT2015    HEY
2     99    99     .    01JAN1960    IT
3      1     2     3    11NOV2011    WORKS

reading a csv file starting from 4th row using proc import

Re: reading a csv file starting from 4th row using proc import

Re: reading a csv file starting from 4th row using proc import

Re: reading a csv file starting from 4th row using proc import

Re: reading a csv file starting from 4th row using proc import

Re: reading a csv file starting from 4th row using proc import

Re: reading a csv file starting from 4th row using proc import

Re: reading a csv file starting from 4th row using proc import

Re: reading a csv file starting from 4th row using proc import

Re: reading a csv file starting from 4th row using proc import

Re: reading a csv file starting from 4th row using proc import

Re: reading a csv file starting from 4th row using proc import

Registration is open

SAS Training: Just a Click Away