BookmarkSubscribeRSS Feed
Jack1
Obsidian | Level 7

I have written SAS code to import three .csv files into a SAS dataset.  These csv files contain the exact same column names.  

 

I have coded my program so the first row is read as the column name (firstobs=2).  However when the program goes to the second and third csv files the first row of these are being read in as records.  

 

Here is the code:

 

data work.firm_count;
format Ref_Date $7. GEO $16. 'Geographical Classification'n $2. SIZE $21. Description $110. Vector $9. Coordinate $8. Value comma9. ;


infile "c:\user\homareau\PredictiveModels\FirmCounts\Firm_count_NM_*.csv" firstobs=2 dlm=',' Missover DSD;


informat Ref_Date $7. GEO $16. 'Geographical Classification'n $2. SIZE $21. Description $110. Vector $9. Coordinate $8. Value comma9. ;


input Ref_Date GEO 'Geographical Classification'n SIZE Description Vector Coordinate Value;
label Ref_Date = 'Date'
GEO = 'Prov_Name'
'Geographical Classification'n = 'Prov_Code'
SIZE = 'Firm_Size'
Description = 'NAICS_Description'
Vector = 'CANSIM_V_Code'
Coordinate = 'Coordinate'
Value = 'Count';
drop Vector Coordinate;
run;

 

My question: What modificaiton do I need to make to this code so that  don't end up with column names ending up in my dataset?  I thought "firstobs" would have done the trick?

 

Thanks

Jack

6 REPLIES 6
FreelanceReinh
Jade | Level 19

Hi Jack,

 

Last month I saw a clever solution of this problem (using a combination of FIRSTOBS= and EOV= options) in this forum in a post written by ballardw.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Sorry, not seeing why you are mixing things up:

input Ref_Date GEO 'Geographical Classification'n SIZE Description Vector Coordinate Value;

Why do you specify 'Geographicl Classification'n ? This doesn't make sense in this context.  The ''n is for select long names from a database for instance, your specifying a name for an input statement?

Also check your file, I can't see the file, so make sure the data does start on row two.  This in cludes having a return at the end of each row - if they all appear in one row firstobs will not work.

data work.firm_count;
  infile  "c:\user\homareau\PredictiveModels\FirmCounts\Firm_count_NM_*.csv" firstobs=2 dlm=',' missover dsd;
  format    ref_date $7. 
            geo $16. 
            classification $2. 
            size $21. 
            description $110. 
            vector $9. 
            coordinate $8. 
            value comma9.;
  informat  ref_date $7. 
            geo $16. 
            classification $2. 
            size $21. 
            description $110. 
            vector $9. 
            coordinate $8. 
            value comma9.;
  input     ref_date $
            geo $
            classification $
            size $
            description $
            vector $
            coordinate $
            value;
  label     ref_date='Date'
            geo='Prov_Name'
            classification='Prov_Code'
            size='Firm_Size'
            description='NAICS_Description'
            vector='CANSIM_V_Code'
            coordinate='Coordinate'
            value='Count';
  drop vector coordinate;
run;
 
Tom
Super User Tom
Super User

Here is a simplified version of the technique that was used in the linked article.

 

  • Use the EOV option to name a flag variable that will let you detect when a new file starts.
  • Use the FIRSTOBS=2 option to skip the first record since the EOV flag will not be set until you start a NEW file.  
  • Read the line and hold it with the trailing @ to allow SAS to be able to set the EOV flag.
  • When it is the beginning of a new file then use another INPUT statement to throw away the line.
  • Reset the EOV variable to false because SAS will not reset it for you.

 

Then the rest of the data step is the same.

 

 

data want ;
  infile "....*.csv" firstobs=2 dsd truncover eov=eov ;
  input @;
  if eov then input;
  eov=0;
  input .... ;
run;

 

Note also the use of the TRUNCOVER option instead of MISSOVER option.  This will make writing your INPUT statement easier since you do not need to worry about reading past the end of the line. With MISSOVER if you ask SAS to read 10 characters and only 5 more exist on the line then you get a missing value, but with TRUNCOVER you get the 5 characters that do exist.

 

Note this will work with wildcards in the INFILE statement or when using a FILENAME statement that references multiple files.

 

filename indata
('file1.txt'
,'file2.txt'
,'file3.txt'
);

data want ;
   infile indata eov=eov ....

 

Jack1
Obsidian | Level 7

Hi Tom,

 

I made the modification you suggested and I couldn't get it to work.

 

Here is my original SAS code:

 

data work.firm_count;
format Ref_Date $7. GEO $16. 'Geographical Classification'n $2. SIZE $21. Description $110. Vector $9. Coordinate $8. Value comma9. ;
infile "c:\user\homareau\PredictiveModels\FirmCounts\Firm_count_NM_*.csv" firstobs=2 dlm=',' missover dsd ;
informat Ref_Date $7. GEO $16. 'Geographical Classification'n $2. SIZE $21. Description $110. Vector $9. Coordinate $8. Value comma9. ;
input Ref_Date GEO 'Geographical Classification'n SIZE Description Vector Coordinate Value;
label Ref_Date = 'Date'
      GEO = 'Prov_Name'
      'Geographical Classification'n = 'Prov_Code'
      SIZE = 'Firm_Size'
      Description = 'NAICS_Description'
      Vector = 'CANSIM_V_Code'
      Coordinate = 'Coordinate'
      Value = 'Count';
drop Vector Coordinate;
run;

How would the code above need adjusting to make it work so its not reading in headers from other sheets?  

 

Thanks..

Jack

 

 

Tom
Super User Tom
Super User

Doesn't look like you made any modifications?

Any way try this.

data work.firm_count;
format Ref_Date $7. GEO $16. 'Geographical Classification'n $2. SIZE $21. Description $110. Vector $9. Coordinate $8. Value comma9. ;
informat Ref_Date $7. GEO $16. 'Geographical Classification'n $2. SIZE $21. Description $110. Vector $9. Coordinate $8. Value comma9. ;

infile "c:\user\homareau\PredictiveModels\FirmCounts\Firm_count_NM_*.csv" firstobs=2 dsd truncover eov=eov  ;

input @;
if eov then input;
eov=0;

input Ref_Date GEO 'Geographical Classification'n SIZE Description Vector Coordinate Value;
label Ref_Date = 'Date'
      GEO = 'Prov_Name'
      'Geographical Classification'n = 'Prov_Code'
      SIZE = 'Firm_Size'
      Description = 'NAICS_Description'
      Vector = 'CANSIM_V_Code'
      Coordinate = 'Coordinate'
      Value = 'Count';
drop Vector Coordinate;
run; 

Jack1
Obsidian | Level 7

OK Tom, I didn't enter all the code correctly.  I ran it as you suggested and it works great!  Thanks so much...

 

Jack

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1784 views
  • 2 likes
  • 4 in conversation