Help using Base SAS procedures

Reading in multiple csv files...column names appearing in SAS dataset

Reply
Contributor
Posts: 54

Reading in multiple csv files...column names appearing in SAS dataset

I have written SAS code to import three .csv files into a SAS dataset.  These csv files contain the exact same column names.  

 

I have coded my program so the first row is read as the column name (firstobs=2).  However when the program goes to the second and third csv files the first row of these are being read in as records.  

 

Here is the code:

 

data work.firm_count;
format Ref_Date $7. GEO $16. 'Geographical Classification'n $2. SIZE $21. Description $110. Vector $9. Coordinate $8. Value comma9. ;


infile "c:\user\homareau\PredictiveModels\FirmCounts\Firm_count_NM_*.csv" firstobs=2 dlm=',' Missover DSD;


informat Ref_Date $7. GEO $16. 'Geographical Classification'n $2. SIZE $21. Description $110. Vector $9. Coordinate $8. Value comma9. ;


input Ref_Date GEO 'Geographical Classification'n SIZE Description Vector Coordinate Value;
label Ref_Date = 'Date'
GEO = 'Prov_Name'
'Geographical Classification'n = 'Prov_Code'
SIZE = 'Firm_Size'
Description = 'NAICS_Description'
Vector = 'CANSIM_V_Code'
Coordinate = 'Coordinate'
Value = 'Count';
drop Vector Coordinate;
run;

 

My question: What modificaiton do I need to make to this code so that  don't end up with column names ending up in my dataset?  I thought "firstobs" would have done the trick?

 

Thanks

Jack

Trusted Advisor
Posts: 1,117

Re: Reading in multiple csv files...column names appearing in SAS dataset

Hi Jack,

 

Last month I saw a clever solution of this problem (using a combination of FIRSTOBS= and EOV= options) in this forum in a post written by ballardw.

Super User
Super User
Posts: 7,970

Re: Reading in multiple csv files...column names appearing in SAS dataset

Sorry, not seeing why you are mixing things up:

input Ref_Date GEO 'Geographical Classification'n SIZE Description Vector Coordinate Value;

Why do you specify 'Geographicl Classification'n ? This doesn't make sense in this context.  The ''n is for select long names from a database for instance, your specifying a name for an input statement?

Also check your file, I can't see the file, so make sure the data does start on row two.  This in cludes having a return at the end of each row - if they all appear in one row firstobs will not work.

data work.firm_count;
  infile  "c:\user\homareau\PredictiveModels\FirmCounts\Firm_count_NM_*.csv" firstobs=2 dlm=',' missover dsd;
  format    ref_date $7. 
            geo $16. 
            classification $2. 
            size $21. 
            description $110. 
            vector $9. 
            coordinate $8. 
            value comma9.;
  informat  ref_date $7. 
            geo $16. 
            classification $2. 
            size $21. 
            description $110. 
            vector $9. 
            coordinate $8. 
            value comma9.;
  input     ref_date $
            geo $
            classification $
            size $
            description $
            vector $
            coordinate $
            value;
  label     ref_date='Date'
            geo='Prov_Name'
            classification='Prov_Code'
            size='Firm_Size'
            description='NAICS_Description'
            vector='CANSIM_V_Code'
            coordinate='Coordinate'
            value='Count';
  drop vector coordinate;
run;
 
Super User
Super User
Posts: 7,054

Re: Reading in multiple csv files...column names appearing in SAS dataset

[ Edited ]

Here is a simplified version of the technique that was used in the linked article.

 

  • Use the EOV option to name a flag variable that will let you detect when a new file starts.
  • Use the FIRSTOBS=2 option to skip the first record since the EOV flag will not be set until you start a NEW file.  
  • Read the line and hold it with the trailing @ to allow SAS to be able to set the EOV flag.
  • When it is the beginning of a new file then use another INPUT statement to throw away the line.
  • Reset the EOV variable to false because SAS will not reset it for you.

 

Then the rest of the data step is the same.

 

 

data want ;
  infile "....*.csv" firstobs=2 dsd truncover eov=eov ;
  input @;
  if eov then input;
  eov=0;
  input .... ;
run;

 

Note also the use of the TRUNCOVER option instead of MISSOVER option.  This will make writing your INPUT statement easier since you do not need to worry about reading past the end of the line. With MISSOVER if you ask SAS to read 10 characters and only 5 more exist on the line then you get a missing value, but with TRUNCOVER you get the 5 characters that do exist.

 

Note this will work with wildcards in the INFILE statement or when using a FILENAME statement that references multiple files.

 

filename indata
('file1.txt'
,'file2.txt'
,'file3.txt'
);

data want ;
   infile indata eov=eov ....

 

Contributor
Posts: 54

Re: Reading in multiple csv files...column names appearing in SAS dataset

Hi Tom,

 

I made the modification you suggested and I couldn't get it to work.

 

Here is my original SAS code:

 

data work.firm_count;
format Ref_Date $7. GEO $16. 'Geographical Classification'n $2. SIZE $21. Description $110. Vector $9. Coordinate $8. Value comma9. ;
infile "c:\user\homareau\PredictiveModels\FirmCounts\Firm_count_NM_*.csv" firstobs=2 dlm=',' missover dsd ;
informat Ref_Date $7. GEO $16. 'Geographical Classification'n $2. SIZE $21. Description $110. Vector $9. Coordinate $8. Value comma9. ;
input Ref_Date GEO 'Geographical Classification'n SIZE Description Vector Coordinate Value;
label Ref_Date = 'Date'
      GEO = 'Prov_Name'
      'Geographical Classification'n = 'Prov_Code'
      SIZE = 'Firm_Size'
      Description = 'NAICS_Description'
      Vector = 'CANSIM_V_Code'
      Coordinate = 'Coordinate'
      Value = 'Count';
drop Vector Coordinate;
run;

How would the code above need adjusting to make it work so its not reading in headers from other sheets?  

 

Thanks..

Jack

 

 

Super User
Super User
Posts: 7,054

Re: Reading in multiple csv files...column names appearing in SAS dataset

Doesn't look like you made any modifications?

Any way try this.

data work.firm_count;
format Ref_Date $7. GEO $16. 'Geographical Classification'n $2. SIZE $21. Description $110. Vector $9. Coordinate $8. Value comma9. ;
informat Ref_Date $7. GEO $16. 'Geographical Classification'n $2. SIZE $21. Description $110. Vector $9. Coordinate $8. Value comma9. ;

infile "c:\user\homareau\PredictiveModels\FirmCounts\Firm_count_NM_*.csv" firstobs=2 dsd truncover eov=eov  ;

input @;
if eov then input;
eov=0;

input Ref_Date GEO 'Geographical Classification'n SIZE Description Vector Coordinate Value;
label Ref_Date = 'Date'
      GEO = 'Prov_Name'
      'Geographical Classification'n = 'Prov_Code'
      SIZE = 'Firm_Size'
      Description = 'NAICS_Description'
      Vector = 'CANSIM_V_Code'
      Coordinate = 'Coordinate'
      Value = 'Count';
drop Vector Coordinate;
run; 

Contributor
Posts: 54

Re: Reading in multiple csv files...column names appearing in SAS dataset

OK Tom, I didn't enter all the code correctly.  I ran it as you suggested and it works great!  Thanks so much...

 

Jack

Ask a Question
Discussion stats
  • 6 replies
  • 304 views
  • 2 likes
  • 4 in conversation