i have a csv file and data starts from 4th road (3rd road is variable names).
Is there a way i can import it using proc import?
Its not really a CSV file then is it. CSV file has row 1 with column names, row 2 data starts and both are delimited by comma's. You have a CSV "like" file. Step one, infile the datafile, read whole thing and if _n_ > 2 then output to new file. Step 2, read in the new file with just the headers and data.
If you want to get the variable names from the file I believe that you'll need to delete or cut and paste the first two rows to another file. Currently proc import expects the variables to be on the first row.
Its not really a CSV file then is it. CSV file has row 1 with column names, row 2 data starts and both are delimited by comma's. You have a CSV "like" file. Step one, infile the datafile, read whole thing and if _n_ > 2 then output to new file. Step 2, read in the new file with just the headers and data.
we are trying to put one process in production so i cant really play with the file (cutting,copying,pasting...) and that file will come in externaly as a CSV one. So far they been sending a xls file where data starts from 4th row but it was easy to import it using the "startrow" and "namerow" within the proc import which does not work when it comes to CSV.
So:
1. i should hope the csv file will always have the headers in the first row and data from the second onwards?
2. infile the csv file and read in from _n_>2 ?
Thank you guys
proc import datafile="path/Have.csv"
dbms=csv
out=Want
replace;
datarow=2;
run;
As ballardw says, you could not get the names with proc import if doesn't start in the first row
PROC IMPORT will help you with a well-formed CSV file -- glean variable names from the first row, and "guess" the variable types based on the remaining data rows. But as others point out, this depends on having data begin in the second row.
If there are blank rows, or rows that contain information that is not data, then PROC IMPORT might not behave as well. In this case it might be better to use INFILE with a CSV file. But you'll need to define the variable attributes, informats, and exert more control over how the file is read in.
Behind the scenes, PROC IMPORT generates DATA step code to do its work, so you might consider just short-circuiting that and write your own DATA step for full control. More work, but brings you more flexibility over reading a file whose format you don't control.
Chris
If you know what the structure of the file will be, as in agreed upon by the providers and users, and the data columns remain in the same order and you know the characteristics such as length, format and data type then you use a data step to read the file. The infile statement would use the firstobs=4 or whatever to point to the data row.
You can use Proc import to give you a leg up on writing the code by 1) putting an example file in a nice format such as row 1 = headers, 2) run proc import. The log will have the data step code generated to read the file. Copy that out and save as a separate program file. 3) modify the informats, formats and add label code and 4) change the Firstobs to match the actual data. I would recommend modifying the code to use a filename statement and fileref in the data step for running against later data sets.
With the data step code the only concerns become undocumented changes such as change of order of variables, changed number of header rows and changed lengths of variable contents.
Well, you have some technical advice above. I would just like to add that it is all in the documentation. Ensure you have a document which describes the CSV data. This should be signed by both parties and ahered to. The programming is a very simple matter then, merely coding the information in the agreement, but without an agreement you could end up changing things each time. Simple programming advice:
Document
Agree
then
Code.
yea CSV files indeed come as you said RW9 so i panicked for no reasons but thank you guys for the hints
Hi. If you have an XLS file with the first two rows blank, column names in row 3, data starting in row 4 ... try this with the attached XLS file ...
libname x 'z:\test.xls';
data new;
set x.'sheet1$'n ;
run;
libname x clear;
data set NEW ...
Obs x y z date name
1 10 20 30 15OCT2015 HEY
2 99 99 . 01JAN1960 IT
3 1 2 3 11NOV2011 WORKS
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.