I have written SAS code to import three .csv files into a SAS dataset. These csv files contain the exact same column names.
I have coded my program so the first row is read as the column name (firstobs=2). However when the program goes to the second and third csv files the first row of these are being read in as records.
Here is the code:
data work.firm_count;
format Ref_Date $7. GEO $16. 'Geographical Classification'n $2. SIZE $21. Description $110. Vector $9. Coordinate $8. Value comma9. ;
infile "c:\user\homareau\PredictiveModels\FirmCounts\Firm_count_NM_*.csv" firstobs=2 dlm=',' Missover DSD;
informat Ref_Date $7. GEO $16. 'Geographical Classification'n $2. SIZE $21. Description $110. Vector $9. Coordinate $8. Value comma9. ;
input Ref_Date GEO 'Geographical Classification'n SIZE Description Vector Coordinate Value;
label Ref_Date = 'Date'
GEO = 'Prov_Name'
'Geographical Classification'n = 'Prov_Code'
SIZE = 'Firm_Size'
Description = 'NAICS_Description'
Vector = 'CANSIM_V_Code'
Coordinate = 'Coordinate'
Value = 'Count';
drop Vector Coordinate;
run;
My question: What modificaiton do I need to make to this code so that don't end up with column names ending up in my dataset? I thought "firstobs" would have done the trick?
Thanks
Jack
Hi Jack,
Last month I saw a clever solution of this problem (using a combination of FIRSTOBS= and EOV= options) in this forum in a post written by ballardw.
Sorry, not seeing why you are mixing things up:
input Ref_Date GEO 'Geographical Classification'n SIZE Description Vector Coordinate Value;
Why do you specify 'Geographicl Classification'n ? This doesn't make sense in this context. The ''n is for select long names from a database for instance, your specifying a name for an input statement?
Also check your file, I can't see the file, so make sure the data does start on row two. This in cludes having a return at the end of each row - if they all appear in one row firstobs will not work.
data work.firm_count; infile "c:\user\homareau\PredictiveModels\FirmCounts\Firm_count_NM_*.csv" firstobs=2 dlm=',' missover dsd; format ref_date $7. geo $16. classification $2. size $21. description $110. vector $9. coordinate $8. value comma9.; informat ref_date $7. geo $16. classification $2. size $21. description $110. vector $9. coordinate $8. value comma9.; input ref_date $ geo $ classification $ size $ description $ vector $ coordinate $ value; label ref_date='Date' geo='Prov_Name' classification='Prov_Code' size='Firm_Size' description='NAICS_Description' vector='CANSIM_V_Code' coordinate='Coordinate' value='Count'; drop vector coordinate; run;
Here is a simplified version of the technique that was used in the linked article.
Then the rest of the data step is the same.
data want ;
infile "....*.csv" firstobs=2 dsd truncover eov=eov ;
input @;
if eov then input;
eov=0;
input .... ;
run;
Note also the use of the TRUNCOVER option instead of MISSOVER option. This will make writing your INPUT statement easier since you do not need to worry about reading past the end of the line. With MISSOVER if you ask SAS to read 10 characters and only 5 more exist on the line then you get a missing value, but with TRUNCOVER you get the 5 characters that do exist.
Note this will work with wildcards in the INFILE statement or when using a FILENAME statement that references multiple files.
filename indata
('file1.txt'
,'file2.txt'
,'file3.txt'
);
data want ;
infile indata eov=eov ....
Hi Tom,
I made the modification you suggested and I couldn't get it to work.
Here is my original SAS code:
data work.firm_count;
format Ref_Date $7. GEO $16. 'Geographical Classification'n $2. SIZE $21. Description $110. Vector $9. Coordinate $8. Value comma9. ;
infile "c:\user\homareau\PredictiveModels\FirmCounts\Firm_count_NM_*.csv" firstobs=2 dlm=',' missover dsd ;
informat Ref_Date $7. GEO $16. 'Geographical Classification'n $2. SIZE $21. Description $110. Vector $9. Coordinate $8. Value comma9. ;
input Ref_Date GEO 'Geographical Classification'n SIZE Description Vector Coordinate Value;
label Ref_Date = 'Date'
GEO = 'Prov_Name'
'Geographical Classification'n = 'Prov_Code'
SIZE = 'Firm_Size'
Description = 'NAICS_Description'
Vector = 'CANSIM_V_Code'
Coordinate = 'Coordinate'
Value = 'Count';
drop Vector Coordinate;
run;
How would the code above need adjusting to make it work so its not reading in headers from other sheets?
Thanks..
Jack
Doesn't look like you made any modifications?
Any way try this.
data work.firm_count;
format Ref_Date $7. GEO $16. 'Geographical Classification'n $2. SIZE $21. Description $110. Vector $9. Coordinate $8. Value comma9. ;
informat Ref_Date $7. GEO $16. 'Geographical Classification'n $2. SIZE $21. Description $110. Vector $9. Coordinate $8. Value comma9. ;
infile "c:\user\homareau\PredictiveModels\FirmCounts\Firm_count_NM_*.csv" firstobs=2 dsd truncover eov=eov ;
input @;
if eov then input;
eov=0;
input Ref_Date GEO 'Geographical Classification'n SIZE Description Vector Coordinate Value;
label Ref_Date = 'Date'
GEO = 'Prov_Name'
'Geographical Classification'n = 'Prov_Code'
SIZE = 'Firm_Size'
Description = 'NAICS_Description'
Vector = 'CANSIM_V_Code'
Coordinate = 'Coordinate'
Value = 'Count';
drop Vector Coordinate;
run;
OK Tom, I didn't enter all the code correctly. I ran it as you suggested and it works great! Thanks so much...
Jack
SAS is headed back to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team.
Interested in speaking? Content from our attendees is one of the reasons that makes SAS Innovate such a special event!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.