DATA Step, Macro, Functions and more

Proc Import to Read CSV File Variable Names Start on Row 2

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 10
Accepted Solution

Proc Import to Read CSV File Variable Names Start on Row 2

Hello,

 

I am trying to import a large CSV file into SAS. The variable names start on row 2 and data starts on row 3. How can I get SAS to start reading at row 2? I cannot open the file and delete the first row, because the file is to large for excel. I have also tried namerow=, but it doesn't work. Any help would be appreciated.


Accepted Solutions
Solution
‎02-01-2016 10:13 AM
Super User
Posts: 7,827

Re: Proc Import to Read CSV File Variable Names Start on Row 2

My preferred method:

Never let SAS dynamically guess variable names and parameters from the input file; instead rely on documentation of the file structure that MUST be delivered with the data.

In your case, I would

- write a data step that corresponds to the documented file structure

- use firstobs=3 in the infile statement to discard the first two lines completely

- make sure that the code exits with an error condition if the data does not conform to the expected structure

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers

View solution in original post


All Replies
Super User
Posts: 11,343

Re: Proc Import to Read CSV File Variable Names Start on Row 2

[ Edited ]

If you must use proc import, try opening in another application such as WORDPAD to delete the first row if Excel is having problems.

 

In fact, if the data did not originate in EXCEL and is large enough, opening in Excel and deleting a row open you up to having Excel lose rows and change the values of some of your variables: leading zeroes disappear, some values like "1-5" might get turned into dates and other undesired side effects.

 

I recommend looking closely at the log after using Proc import to see the code generated an ensure that variables have expected data types and formats. If you need to change something, not uncommon as Proc Import is guessing, recall the code from the log and edit and use that program to read the data.

 

And a kludge: read the csv file and write all of the file except the first file out to a different file. make sure the the LRECL parameter is long enough to handle your longest line. Import the newly created file.

 

data _null_;
   infile "input filename" lrecl=4000;
   file "modified filename" lrecl=4000;
   input ;
   if _n_>1 then put _infile_;
run;

 

 

Super User
Posts: 19,822

Re: Proc Import to Read CSV File Variable Names Start on Row 2

Can you post a sample of the file and your current code?

What OS are you on?

 

Assuming Windows, you can extract a sample of your file to work with using PowerShell 

https://jamesmccaffrey.wordpress.com/2011/08/25/copying-the-first-n-lines-of-a-text-file-using-power...

 

 

Occasional Contributor
Posts: 10

Re: Proc Import to Read CSV File Variable Names Start on Row 2

I am using Windows 12. The file has 34 variables, and I am unsure how many rows.

 

proc import datafile="D:\projects\Billing History\ResBillHis2015.csv"

out=ResBillHis2015

dbms=csv

replace;

namerow=2;

datarow=3;

 

run;

Super User
Posts: 10,041

Re: Proc Import to Read CSV File Variable Names Start on Row 2

These code could delete the blank row , but not efficient .

 


data _null_;
file "/folders/myfolders/new_edu.csv";
infile "/folders/myfolders/edu.csv" lrecl=32767;
input;
if missing(_infile_) then delete;
 else put _infile_;
run;

proc import datafile="/folders/myfolders/new_edu.csv"
out=x
dbms=csv
replace;
run;
Super User
Posts: 3,256

Re: Proc Import to Read CSV File Variable Names Start on Row 2

[ Edited ]

I don't think NAMEROW =  is a valid statement.

Try adding a DATAROW statement: datarow=3;

 

Alternatively use a text editor that can cope with the size of the file to delete the first line, possibly Wordpad (comes with Windows). There are also a lot of free text editors available on the internet.

Super User
Posts: 10,041

Re: Proc Import to Read CSV File Variable Names Start on Row 2

Very Interesting question. Two thing I can thing.

1) use

proc import datafile="/folders/myfolders/edu.csv"
out=x
dbms=csv
replace;
datarow=3;
run;

 

import firstly then after that rename it .

 

2) copy the code genereated by proc import  from LOG and modify it by hand .

use 

options obs=10;

when you run proc import to make it faster .

Solution
‎02-01-2016 10:13 AM
Super User
Posts: 7,827

Re: Proc Import to Read CSV File Variable Names Start on Row 2

My preferred method:

Never let SAS dynamically guess variable names and parameters from the input file; instead rely on documentation of the file structure that MUST be delivered with the data.

In your case, I would

- write a data step that corresponds to the documented file structure

- use firstobs=3 in the infile statement to discard the first two lines completely

- make sure that the code exits with an error condition if the data does not conform to the expected structure

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 2221 views
  • 0 likes
  • 6 in conversation