Hello,
I am trying to import a large CSV file into SAS. The variable names start on row 2 and data starts on row 3. How can I get SAS to start reading at row 2? I cannot open the file and delete the first row, because the file is to large for excel. I have also tried namerow=, but it doesn't work. Any help would be appreciated.
My preferred method:
Never let SAS dynamically guess variable names and parameters from the input file; instead rely on documentation of the file structure that MUST be delivered with the data.
In your case, I would
- write a data step that corresponds to the documented file structure
- use firstobs=3 in the infile statement to discard the first two lines completely
- make sure that the code exits with an error condition if the data does not conform to the expected structure
If you must use proc import, try opening in another application such as WORDPAD to delete the first row if Excel is having problems.
In fact, if the data did not originate in EXCEL and is large enough, opening in Excel and deleting a row open you up to having Excel lose rows and change the values of some of your variables: leading zeroes disappear, some values like "1-5" might get turned into dates and other undesired side effects.
I recommend looking closely at the log after using Proc import to see the code generated an ensure that variables have expected data types and formats. If you need to change something, not uncommon as Proc Import is guessing, recall the code from the log and edit and use that program to read the data.
And a kludge: read the csv file and write all of the file except the first file out to a different file. make sure the the LRECL parameter is long enough to handle your longest line. Import the newly created file.
data _null_;
infile "input filename" lrecl=4000;
file "modified filename" lrecl=4000;
input ;
if _n_>1 then put _infile_;
run;
Can you post a sample of the file and your current code?
What OS are you on?
Assuming Windows, you can extract a sample of your file to work with using PowerShell
I am using Windows 12. The file has 34 variables, and I am unsure how many rows.
proc import datafile="D:\projects\Billing History\ResBillHis2015.csv"
out=ResBillHis2015
dbms=csv
replace;
namerow=2;
datarow=3;
run;
These code could delete the blank row , but not efficient .
data _null_;
file "/folders/myfolders/new_edu.csv";
infile "/folders/myfolders/edu.csv" lrecl=32767;
input;
if missing(_infile_) then delete;
else put _infile_;
run;
proc import datafile="/folders/myfolders/new_edu.csv"
out=x
dbms=csv
replace;
run;
I don't think NAMEROW = is a valid statement.
Try adding a DATAROW statement: datarow=3;
Alternatively use a text editor that can cope with the size of the file to delete the first line, possibly Wordpad (comes with Windows). There are also a lot of free text editors available on the internet.
Very Interesting question. Two thing I can thing.
1) use
proc import datafile="/folders/myfolders/edu.csv"
out=x
dbms=csv
replace;
datarow=3;
run;
import firstly then after that rename it .
2) copy the code genereated by proc import from LOG and modify it by hand .
use
options obs=10;
when you run proc import to make it faster .
My preferred method:
Never let SAS dynamically guess variable names and parameters from the input file; instead rely on documentation of the file structure that MUST be delivered with the data.
In your case, I would
- write a data step that corresponds to the documented file structure
- use firstobs=3 in the infile statement to discard the first two lines completely
- make sure that the code exits with an error condition if the data does not conform to the expected structure
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.