BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
rmacarthur
Pyrite | Level 9

Hi SAS Friends, 

Need to import and analyze hundreds of .csv files that contain environmental data.

a sample has been uploaded here 

Because there are so many, individually saving them as a .xlsx file for import seems impractical, which is how many SAS programmers handle this.

The first 13 lines of data are not needed. 

line 14 has the column header information, and then each line thereafter has data in rows, and the data structure is straight forward.

I've attached a sample file. 

Question is , how can this be imported as a CSV file, after which the first 13 lines are deleted, and keeping headers and data structure that follow.   

Here's a sample of what it looks like:

 

Key Name:Suffix Trend Definitions Used
Point_1: HOSP_1FL_PHARM_FCU01:ROOM TEMP 15 minutes
Point_2: HOS.1FL.ANTE.RM.DIFF.PRESS 15 minutes
Point_3: HOS.1FL.ANTE.RM.HUMID 15 minutes
Point_4: HOS.1FL.ANTE.RM.TEMP 15 minutes
Point_5: HOS.1FL.CLEAN.RM.DIFF.PRESS 15 minutes
Point_6: HOS.1FL.STORAGE.DIFF.PRESS 15 minutes
Point_7: pCOWeb7700_A003 Error: Point not found in database.
Point_8: pCOWeb7700_A004 Error: Point not found in database.
Time Interval: 15 Minutes
Date Range: 12/31/2019 06:55:11 - 1/1/2020 06:55:11
Report Timings: All Hours

<>Date Time Point_1 Point_2 Point_3 Point_4 Point_5 Point_6 Point_7 Point_8
12/31/2019 6:55:11 72 0.04 26.81 73.53 -0.09 -0.06 No Data No Data
12/31/2019 7:10:11 72.25 0.04 26.81 73.57 -0.09 -0.06 No Data No Data
12/31/2019 7:25:11 72.25 0.05 26.81 73.62 -0.09 -0.06 No Data No Data

////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

Any advice is greatly appreciated, 

Thanks !

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

That files does not look very complicated.

%let path=C:\downloads;
%let fname=Temp_ex.csv ;

proc format ;
  invalue  nodata
    'TIME (-3)'=.t
    'No Data'=.d 
     other=_same_
  ; 
run;

data want;
  infile "&path/&fname" dsd truncover firstobs=15;
  input @;
  if index(_infile_,'End of Report') then delete;
  input date :mmddyy. time :time. (point1-point8) (:nodata.);
  format date yymmdd10. time time8. ;
run;

proc print;
run;

Results:

Obs         date       time   point1   point2   point3   point4   point5   point6   point7   point8

  1   2019-12-31    6:55:11    72.00    0.04     26.81    73.53    -0.09    -0.06      D        D
  2   2019-12-31    7:10:11    72.25    0.04     26.81    73.57    -0.09    -0.06      D        D
  3   2019-12-31    7:25:11    72.25    0.05     26.81    73.62    -0.09    -0.06      D        D
  4   2019-12-31    7:40:11    72.25    0.04     26.35    73.67    -0.09    -0.06      D        D
  5   2019-12-31    7:55:11    72.25    0.05     26.20    73.75    -0.09    -0.06      D        D
  6   2019-12-31    8:10:11    72.25    0.04     26.10    73.80    -0.09    -0.06      D        D
  7   2019-12-31    8:25:11    72.25    0.04     26.15    73.84    -0.09    -0.06      D        D
  8   2019-12-31    8:40:11    72.25    0.04     26.15    73.93    -0.09    -0.06      D        D
  9   2019-12-31    8:55:11    72.25    0.05     26.05    73.98    -0.09    -0.06      D        D
 10   2019-12-31    9:10:11    72.25    0.04     25.90    74.02    -0.09    -0.06      D        D
 11   2019-12-31    9:25:11    72.25    0.04     26.30    74.02    -0.09    -0.06      D        D
 12   2019-12-31    9:40:11    72.25    0.05     26.15    74.07    -0.09    -0.06      D        D
 13   2019-12-31    9:55:11    72.25    0.06     26.35    74.03    -0.09    -0.06      D        D
 14   2019-12-31   10:10:11    72.25    0.06     26.30    74.07    -0.09    -0.06      D        D
 15   2019-12-31   10:25:11    72.50    0.06     26.25    74.07    -0.09    -0.06      D        D
 16   2019-12-31   10:40:11    72.50    0.05     26.35    74.07    -0.09    -0.06      D        D
 17   2019-12-31   10:55:11    72.50    0.05     26.15    74.12    -0.09    -0.06      D        D
 18   2019-12-31   11:10:11    72.50    0.05     26.10    74.21    -0.10    -0.06      D        D
 19   2019-12-31   11:25:11    72.50    0.06     26.00    74.25    -0.10    -0.06      D        D
 20   2019-12-31   11:40:11    72.50    0.06     26.00    74.30    -0.10    -0.06      D        D
 21   2019-12-31   11:55:11    72.50    0.04     26.05    74.43    -0.09    -0.06      D        D
 22   2019-12-31   12:10:11    72.50    0.06     25.84    74.47    -0.10    -0.06      D        D
 23   2019-12-31   12:25:11    72.50    0.05     25.95    74.57    -0.10    -0.06      D        D
 24   2019-12-31   12:40:11    72.50    0.06     26.00    74.59    -0.10    -0.06      D        D
 25   2019-12-31   12:55:11    72.75    0.06     25.90    74.61    -0.10    -0.06      D        D
 26   2019-12-31   13:10:11    72.50    0.05     26.20    74.66    -0.09    -0.06      D        D
 27   2019-12-31   13:25:11    72.50    0.06     26.20    74.66    -0.10    -0.06      D        D
 28   2019-12-31   13:40:11    72.50    0.06     26.30    74.66    -0.10    -0.06      D        D
 29   2019-12-31   13:55:11    72.50    0.04     26.40    74.66    -0.09    -0.06      D        D
 30   2019-12-31   14:10:11    72.50    0.05     26.41    74.70    -0.09    -0.06      D        D
 31   2019-12-31   14:25:11    72.50    0.06     26.05    74.66    -0.10    -0.06      D        D
 32   2019-12-31   14:40:11    72.50    0.05     26.00    74.66    -0.09    -0.06      D        D
 33   2019-12-31   14:55:11    72.50    0.06     25.85    74.66    -0.10    -0.06      D        D
 34   2019-12-31   15:10:11    72.50    0.06     25.19    74.70    -0.10    -0.06      D        D
 35   2019-12-31   15:25:11    72.50    0.05     26.45    74.66    -0.09    -0.06      D        D
 36   2019-12-31   15:40:11    72.50    0.06     27.37    74.66    -0.10    -0.06      D        D
 37   2019-12-31   15:55:11    72.50    0.06     27.21    74.70    -0.09    -0.06      D        D
 38   2019-12-31   16:10:11    72.50    0.04     26.20    74.70    -0.09    -0.06      D        D
 39   2019-12-31   16:25:11    72.50    0.04     25.76    74.71    -0.09    -0.06      D        D
 40   2019-12-31   16:40:11    72.50    0.05     25.44    74.70    -0.09    -0.06      D        D
 41   2019-12-31   16:55:11    72.50    0.06     25.40    74.71    -0.10    -0.06      D        D
 42   2019-12-31   17:10:11    72.50    0.06     25.20    74.66    -0.10    -0.06      D        D
 43   2019-12-31   17:25:11    72.50    0.04     25.04    74.66    -0.09    -0.06      D        D
 44   2019-12-31   17:40:11    72.50    0.05     25.30    74.66    -0.09    -0.06      D        D
 45   2019-12-31   17:55:11    72.50    0.06     25.29    74.70    -0.10    -0.06      D        D
 46   2019-12-31   18:10:11    72.50    0.05     25.39    74.75    -0.10    -0.06      D        D
 47   2019-12-31   18:25:11    72.50    0.06     25.59    74.70    -0.10    -0.06      D        D
 48   2019-12-31   18:40:11    72.50    0.06     25.59    74.75    -0.10    -0.06      D        D
 49   2019-12-31   18:55:11    72.50    0.05     26.00    74.75    -0.09    -0.06      D        D
 50   2019-12-31   19:10:11    72.25    0.06     25.65    74.61    -0.10    -0.06      D        D
 51   2019-12-31   19:25:11    72.00    0.05     27.06    74.39    -0.10    -0.06      D        D
 52   2019-12-31   19:40:11    71.75    0.06     27.97    74.21    -0.10    -0.06      D        D
 53   2019-12-31   19:55:11    71.50    0.05     26.05    74.07    -0.10    -0.06      D        D
 54   2019-12-31   20:10:11    71.50    0.05     26.21    73.93    -0.09    -0.06      D        D
 55   2019-12-31   20:25:11    71.25    0.06     26.41    73.76    -0.10    -0.06      D        D
 56   2019-12-31   20:40:11    71.25    0.05     26.41    73.71    -0.09    -0.06      D        D
 57   2019-12-31   20:55:11    71.75    0.06     26.00    73.71    -0.10    -0.06      D        D
 58   2019-12-31   21:10:11    72.00    0.06     27.06    73.80    -0.09    -0.06      D        D
 59   2019-12-31   21:25:11    72.25    0.05     27.62    72.99    -0.09    -0.06      D        D
 60   2019-12-31   21:40:11    72.50    0.06     27.72    72.12    -0.10    -0.06      D        D
 61   2019-12-31   21:55:11    72.50    0.06     25.90    72.35    -0.10    -0.06      D        D
 62   2019-12-31   22:10:11    72.50    0.05     25.24    72.84    -0.09    -0.06      D        D
 63   2019-12-31   22:25:11    72.75    0.05     24.79    73.25    -0.09    -0.06      D        D
 64   2019-12-31   22:40:11    72.50    0.04     24.43    73.06    -0.08    -0.06      D        D
 65   2019-12-31   22:55:11    72.00    0.05     23.43    72.52    -0.09    -0.06      D        D
 66   2019-12-31   23:10:11    71.50    0.05     22.82    72.75    -0.09    -0.06      D        D
 67   2019-12-31   23:25:11    71.50    0.04     22.32    73.02    -0.09    -0.06      D        D
 68   2019-12-31   23:40:11    71.25    0.04     21.66    73.20    -0.09    -0.06      D        D
 69   2019-12-31   23:55:11    71.50    0.05     21.30    73.38    -0.09    -0.06      D        D
 70   2020-01-01    0:10:11    71.75    0.04     20.75    73.57    -0.09    -0.06      D        D
 71   2020-01-01    0:25:11    72.25    0.04     20.15    73.75    -0.09    -0.06      D        D
 72   2020-01-01    0:40:11    72.25    0.04     19.60    73.89    -0.09    -0.06      D        D
 73   2020-01-01    0:55:11    72.50    0.04     19.44    73.98    -0.09    -0.06      D        D
 74   2020-01-01    1:10:11    72.50    0.04     19.19    74.07    -0.09    -0.06      D        D
 75   2020-01-01    1:25:11    72.50    0.05     19.35    74.12    -0.09    -0.06      D        D
 76   2020-01-01    1:40:11    72.50    0.04     19.79    74.16    -0.09    -0.06      D        D
 77   2020-01-01    1:55:11    72.50    0.04     19.90    74.16    -0.08    -0.06      D        D
 78   2020-01-01    2:10:11    72.50    0.05     20.45    73.29    -0.09    -0.06      D        D
 79   2020-01-01    2:25:11    72.50    0.04     20.36    73.35    -0.09    -0.06      D        D
 80   2020-01-01    2:40:11    72.50    0.04     19.90    73.57    -0.09    -0.06      D        D
 81   2020-01-01    2:55:11    72.75    0.04     19.49    73.80    -0.09    -0.06      D        D
 82   2020-01-01    3:10:11      T       T         T        T        T        T        D        D
 83   2020-01-01    3:25:11    72.50    0.04     19.34    74.16    -0.09    -0.06      D        D
 84   2020-01-01    3:40:11    72.50    0.05     18.74    74.29    -0.09    -0.06      D        D
 85   2020-01-01    3:55:11    72.50    0.04     18.44    74.31    -0.09    -0.06      D        D
 86   2020-01-01    4:10:11    72.50    0.04     18.39    74.35    -0.09    -0.06      D        D
 87   2020-01-01    4:25:11    72.50    0.05     18.43    74.34    -0.09    -0.06      D        D
 88   2020-01-01    4:40:11    72.50    0.05     18.34    74.39    -0.09    -0.06      D        D
 89   2020-01-01    4:55:11    72.50    0.04     18.33    74.34    -0.09    -0.06      D        D
 90   2020-01-01    5:10:11    72.50    0.04     18.13    74.35    -0.09    -0.06      D        D
 91   2020-01-01    5:25:11    72.50    0.05     17.83    74.39    -0.09    -0.06      D        D
 92   2020-01-01    5:40:11    72.50    0.04     17.53    74.30    -0.09    -0.06      D        D
 93   2020-01-01    5:55:11    72.50    0.05     17.43    74.30    -0.09    -0.06      D        D
 94   2020-01-01    6:10:11    72.50    0.04     17.63    74.25    -0.09    -0.06      D        D
 95   2020-01-01    6:25:11    72.25    0.04     18.38    74.25    -0.09    -0.06      D        D

View solution in original post

16 REPLIES 16
RichardDeVen
Barite | Level 11

Are the columns always DATE, TIME, POINT_1 - POINT_8 ?

 

If so, the data from all the hundreds of data files can be placed in one data set.

If not, what is your strategy for naming the data set for a given data file ?

Are you sure you want to discard the auxiliary information about each POINT_<n> column ?

rmacarthur
Pyrite | Level 9

Hi Richard,

Thanks for the comments !

 

Are the columns always DATE, TIME, POINT_1 - POINT_8 ?

yes

 

If not, what is your strategy for naming the data set for a given data file ?

was planning to keep each the original filename, and just remove the troublesome characters (see below).  However that may not be the best approach

 

Are you sure you want to discard the auxiliary information about each POINT_<n> column ?

Yes, those lines are all the same file to file and just define each variable.

 

If so, the data from all the hundreds of data files can be placed in one data set.

Yes, that's the goal.  A macro or do loop is needed for that, right? That can read the consecutive CSV file names from the directory, and import them.  I don't know how to do that, too.

Because the file names have "._-" in them, I've modified the code from Tom to get rid of those, which I've pasted below.  So how can I take that further to import each uniquely named data set, one at a time, to create one large file?

 

/*************************************/

%let fname=Temp_01-01-20_06-55.Hum.Press-DA.csv ;  /*THis is what the actual file names look like*/

data e ;
fn2 = symget("fname") ; 
fn1 = compress(fn2, "-._ ") ;
call symput('fn', fn1) ;
run ;


proc format ;
invalue nodata
'TIME (-3)'=.t
'No Data'=.d 
other=_same_
; 
run;

data &fn;
infile "&RenoD/&fname" dsd truncover firstobs=15;
input @;
if index(_infile_,'End of Report') then delete;
input date :mmddyy. time :time. (point1-point8) (:nodata.);
format date yymmdd10. time time8. ;
run;

proc print;
run;
/*************************************/

 

Tom
Super User Tom
Super User

If they are all in one directory should be able to just use a wildcard in INFILE statement.  

 

%let renod=your path here;

data want;
  length filename $256 fname $50 ;
  infile "&RenoD/*.csv" dsd truncover filevar=filename;
  input @;
  fname = scan(filename,-1,'/\');
  if lag(filename) ne filename then row=0;
  row+1;
  if row <15 or index(_infile_,'End of Report') then delete;
  input date :mmddyy. time :time. (point1-point8) (:nodata.);
  format date yymmdd10. time time8. ;
run;

If you they are scattered or mixed then make a dataset with the list of files instead. Then use that list to drive reading the files. So if you have dataset name FILELIST with a variable named FILENAME you could use this data step to read each file starting at line 15 and skipping the last row.

data want;
  set filelist ;
  filename2 = filename;
  infile csv dsd truncover filename=filename2 end=eof firstobs=15;
  do while (not eof);
    input @;
    if not eof then do;
      input date :mmddyy. time :time. (point1-point8) (:nodata.);
      output;
    end;
  end;
  format date yymmdd10. time time8. ;
run;

 

 

rmacarthur
Pyrite | Level 9

Hi Tom,

 

Thank you very much.  The files are all in one place, so I ran the first code and got the following ,

 

" A Physical file reference (i.e. "PHYSICAL FILE REFERENCE" ) or an aggregate file storage reference (i.e. AGGREGATE(MEMBER) ) reference cannot be used with the FILEVAR= option."

 

However changing "FILEVAR" to "FILENAME" allowed the code to work like a charm.  That modification was recommended in some other SAS community postings.


The output data look great, much appreciated.

 

Robert

Reeza
Super User

Are the files complex because of the header rows?

Do all files have the same variables and layout beyond that? 

Is it always 13 lines in the header?

rmacarthur
Pyrite | Level 9
Yes, the files are all consistent , they're machine generated, thanks !
Tom
Super User Tom
Super User

That files does not look very complicated.

%let path=C:\downloads;
%let fname=Temp_ex.csv ;

proc format ;
  invalue  nodata
    'TIME (-3)'=.t
    'No Data'=.d 
     other=_same_
  ; 
run;

data want;
  infile "&path/&fname" dsd truncover firstobs=15;
  input @;
  if index(_infile_,'End of Report') then delete;
  input date :mmddyy. time :time. (point1-point8) (:nodata.);
  format date yymmdd10. time time8. ;
run;

proc print;
run;

Results:

Obs         date       time   point1   point2   point3   point4   point5   point6   point7   point8

  1   2019-12-31    6:55:11    72.00    0.04     26.81    73.53    -0.09    -0.06      D        D
  2   2019-12-31    7:10:11    72.25    0.04     26.81    73.57    -0.09    -0.06      D        D
  3   2019-12-31    7:25:11    72.25    0.05     26.81    73.62    -0.09    -0.06      D        D
  4   2019-12-31    7:40:11    72.25    0.04     26.35    73.67    -0.09    -0.06      D        D
  5   2019-12-31    7:55:11    72.25    0.05     26.20    73.75    -0.09    -0.06      D        D
  6   2019-12-31    8:10:11    72.25    0.04     26.10    73.80    -0.09    -0.06      D        D
  7   2019-12-31    8:25:11    72.25    0.04     26.15    73.84    -0.09    -0.06      D        D
  8   2019-12-31    8:40:11    72.25    0.04     26.15    73.93    -0.09    -0.06      D        D
  9   2019-12-31    8:55:11    72.25    0.05     26.05    73.98    -0.09    -0.06      D        D
 10   2019-12-31    9:10:11    72.25    0.04     25.90    74.02    -0.09    -0.06      D        D
 11   2019-12-31    9:25:11    72.25    0.04     26.30    74.02    -0.09    -0.06      D        D
 12   2019-12-31    9:40:11    72.25    0.05     26.15    74.07    -0.09    -0.06      D        D
 13   2019-12-31    9:55:11    72.25    0.06     26.35    74.03    -0.09    -0.06      D        D
 14   2019-12-31   10:10:11    72.25    0.06     26.30    74.07    -0.09    -0.06      D        D
 15   2019-12-31   10:25:11    72.50    0.06     26.25    74.07    -0.09    -0.06      D        D
 16   2019-12-31   10:40:11    72.50    0.05     26.35    74.07    -0.09    -0.06      D        D
 17   2019-12-31   10:55:11    72.50    0.05     26.15    74.12    -0.09    -0.06      D        D
 18   2019-12-31   11:10:11    72.50    0.05     26.10    74.21    -0.10    -0.06      D        D
 19   2019-12-31   11:25:11    72.50    0.06     26.00    74.25    -0.10    -0.06      D        D
 20   2019-12-31   11:40:11    72.50    0.06     26.00    74.30    -0.10    -0.06      D        D
 21   2019-12-31   11:55:11    72.50    0.04     26.05    74.43    -0.09    -0.06      D        D
 22   2019-12-31   12:10:11    72.50    0.06     25.84    74.47    -0.10    -0.06      D        D
 23   2019-12-31   12:25:11    72.50    0.05     25.95    74.57    -0.10    -0.06      D        D
 24   2019-12-31   12:40:11    72.50    0.06     26.00    74.59    -0.10    -0.06      D        D
 25   2019-12-31   12:55:11    72.75    0.06     25.90    74.61    -0.10    -0.06      D        D
 26   2019-12-31   13:10:11    72.50    0.05     26.20    74.66    -0.09    -0.06      D        D
 27   2019-12-31   13:25:11    72.50    0.06     26.20    74.66    -0.10    -0.06      D        D
 28   2019-12-31   13:40:11    72.50    0.06     26.30    74.66    -0.10    -0.06      D        D
 29   2019-12-31   13:55:11    72.50    0.04     26.40    74.66    -0.09    -0.06      D        D
 30   2019-12-31   14:10:11    72.50    0.05     26.41    74.70    -0.09    -0.06      D        D
 31   2019-12-31   14:25:11    72.50    0.06     26.05    74.66    -0.10    -0.06      D        D
 32   2019-12-31   14:40:11    72.50    0.05     26.00    74.66    -0.09    -0.06      D        D
 33   2019-12-31   14:55:11    72.50    0.06     25.85    74.66    -0.10    -0.06      D        D
 34   2019-12-31   15:10:11    72.50    0.06     25.19    74.70    -0.10    -0.06      D        D
 35   2019-12-31   15:25:11    72.50    0.05     26.45    74.66    -0.09    -0.06      D        D
 36   2019-12-31   15:40:11    72.50    0.06     27.37    74.66    -0.10    -0.06      D        D
 37   2019-12-31   15:55:11    72.50    0.06     27.21    74.70    -0.09    -0.06      D        D
 38   2019-12-31   16:10:11    72.50    0.04     26.20    74.70    -0.09    -0.06      D        D
 39   2019-12-31   16:25:11    72.50    0.04     25.76    74.71    -0.09    -0.06      D        D
 40   2019-12-31   16:40:11    72.50    0.05     25.44    74.70    -0.09    -0.06      D        D
 41   2019-12-31   16:55:11    72.50    0.06     25.40    74.71    -0.10    -0.06      D        D
 42   2019-12-31   17:10:11    72.50    0.06     25.20    74.66    -0.10    -0.06      D        D
 43   2019-12-31   17:25:11    72.50    0.04     25.04    74.66    -0.09    -0.06      D        D
 44   2019-12-31   17:40:11    72.50    0.05     25.30    74.66    -0.09    -0.06      D        D
 45   2019-12-31   17:55:11    72.50    0.06     25.29    74.70    -0.10    -0.06      D        D
 46   2019-12-31   18:10:11    72.50    0.05     25.39    74.75    -0.10    -0.06      D        D
 47   2019-12-31   18:25:11    72.50    0.06     25.59    74.70    -0.10    -0.06      D        D
 48   2019-12-31   18:40:11    72.50    0.06     25.59    74.75    -0.10    -0.06      D        D
 49   2019-12-31   18:55:11    72.50    0.05     26.00    74.75    -0.09    -0.06      D        D
 50   2019-12-31   19:10:11    72.25    0.06     25.65    74.61    -0.10    -0.06      D        D
 51   2019-12-31   19:25:11    72.00    0.05     27.06    74.39    -0.10    -0.06      D        D
 52   2019-12-31   19:40:11    71.75    0.06     27.97    74.21    -0.10    -0.06      D        D
 53   2019-12-31   19:55:11    71.50    0.05     26.05    74.07    -0.10    -0.06      D        D
 54   2019-12-31   20:10:11    71.50    0.05     26.21    73.93    -0.09    -0.06      D        D
 55   2019-12-31   20:25:11    71.25    0.06     26.41    73.76    -0.10    -0.06      D        D
 56   2019-12-31   20:40:11    71.25    0.05     26.41    73.71    -0.09    -0.06      D        D
 57   2019-12-31   20:55:11    71.75    0.06     26.00    73.71    -0.10    -0.06      D        D
 58   2019-12-31   21:10:11    72.00    0.06     27.06    73.80    -0.09    -0.06      D        D
 59   2019-12-31   21:25:11    72.25    0.05     27.62    72.99    -0.09    -0.06      D        D
 60   2019-12-31   21:40:11    72.50    0.06     27.72    72.12    -0.10    -0.06      D        D
 61   2019-12-31   21:55:11    72.50    0.06     25.90    72.35    -0.10    -0.06      D        D
 62   2019-12-31   22:10:11    72.50    0.05     25.24    72.84    -0.09    -0.06      D        D
 63   2019-12-31   22:25:11    72.75    0.05     24.79    73.25    -0.09    -0.06      D        D
 64   2019-12-31   22:40:11    72.50    0.04     24.43    73.06    -0.08    -0.06      D        D
 65   2019-12-31   22:55:11    72.00    0.05     23.43    72.52    -0.09    -0.06      D        D
 66   2019-12-31   23:10:11    71.50    0.05     22.82    72.75    -0.09    -0.06      D        D
 67   2019-12-31   23:25:11    71.50    0.04     22.32    73.02    -0.09    -0.06      D        D
 68   2019-12-31   23:40:11    71.25    0.04     21.66    73.20    -0.09    -0.06      D        D
 69   2019-12-31   23:55:11    71.50    0.05     21.30    73.38    -0.09    -0.06      D        D
 70   2020-01-01    0:10:11    71.75    0.04     20.75    73.57    -0.09    -0.06      D        D
 71   2020-01-01    0:25:11    72.25    0.04     20.15    73.75    -0.09    -0.06      D        D
 72   2020-01-01    0:40:11    72.25    0.04     19.60    73.89    -0.09    -0.06      D        D
 73   2020-01-01    0:55:11    72.50    0.04     19.44    73.98    -0.09    -0.06      D        D
 74   2020-01-01    1:10:11    72.50    0.04     19.19    74.07    -0.09    -0.06      D        D
 75   2020-01-01    1:25:11    72.50    0.05     19.35    74.12    -0.09    -0.06      D        D
 76   2020-01-01    1:40:11    72.50    0.04     19.79    74.16    -0.09    -0.06      D        D
 77   2020-01-01    1:55:11    72.50    0.04     19.90    74.16    -0.08    -0.06      D        D
 78   2020-01-01    2:10:11    72.50    0.05     20.45    73.29    -0.09    -0.06      D        D
 79   2020-01-01    2:25:11    72.50    0.04     20.36    73.35    -0.09    -0.06      D        D
 80   2020-01-01    2:40:11    72.50    0.04     19.90    73.57    -0.09    -0.06      D        D
 81   2020-01-01    2:55:11    72.75    0.04     19.49    73.80    -0.09    -0.06      D        D
 82   2020-01-01    3:10:11      T       T         T        T        T        T        D        D
 83   2020-01-01    3:25:11    72.50    0.04     19.34    74.16    -0.09    -0.06      D        D
 84   2020-01-01    3:40:11    72.50    0.05     18.74    74.29    -0.09    -0.06      D        D
 85   2020-01-01    3:55:11    72.50    0.04     18.44    74.31    -0.09    -0.06      D        D
 86   2020-01-01    4:10:11    72.50    0.04     18.39    74.35    -0.09    -0.06      D        D
 87   2020-01-01    4:25:11    72.50    0.05     18.43    74.34    -0.09    -0.06      D        D
 88   2020-01-01    4:40:11    72.50    0.05     18.34    74.39    -0.09    -0.06      D        D
 89   2020-01-01    4:55:11    72.50    0.04     18.33    74.34    -0.09    -0.06      D        D
 90   2020-01-01    5:10:11    72.50    0.04     18.13    74.35    -0.09    -0.06      D        D
 91   2020-01-01    5:25:11    72.50    0.05     17.83    74.39    -0.09    -0.06      D        D
 92   2020-01-01    5:40:11    72.50    0.04     17.53    74.30    -0.09    -0.06      D        D
 93   2020-01-01    5:55:11    72.50    0.05     17.43    74.30    -0.09    -0.06      D        D
 94   2020-01-01    6:10:11    72.50    0.04     17.63    74.25    -0.09    -0.06      D        D
 95   2020-01-01    6:25:11    72.25    0.04     18.38    74.25    -0.09    -0.06      D        D
rmacarthur
Pyrite | Level 9
Hi Tom,
Thank you very much, I"ve learned alot from the code you've provided, The file was complex because the data did not start until line 15, and this is a part of SAS coding I'm not very familiar with. Much appreciated, !
R.
ballardw
Super User

@rmacarthur wrote:

Hi SAS Friends, 

 

Because there are so many, individually saving them as a .xlsx file for import seems impractical, which is how many SAS programmers handle this.

 


Because Import does not deal well with data with more than a single header row to get variable names from this would be a bad idea quickly. Second, each time you invoke Proc Import the procedure makes a potentially quite different set of guesses as to variable type and length it is often difficult to combine data imported from xlsx because some variables end up with numeric and others with character for the same column. Which SAS won't allow and you can spend more time "fixing" things than you might have saved with import.

 

Moral of the story: If the files have the same structure read them with a data step.

rmacarthur
Pyrite | Level 9
Thanks, very helpful, we've got that sorted out and sample code is above. Now the trick is how to import each individually named file, using data steps. I image that would involve a macro or do loop.
RichardDeVen
Barite | Level 11

@Tom showed you how to read one file.  The concept can be extended because a single data step can read multiple files that match a wildcarded operating system filename.  Because all the data is read as if from one sequential stream you can not rely on the FIRSTOBS= infile option.  Instead you will want to examine a held input of the raw line and examine it for landmarks pertinent to subsequent statements in the data step.  Also, use FILEVAR= to capture the source filename into a automatic variable that can be assigned to an output data set variable.  NOTE: Automatic variables are part of the program data vector (PDV) but are always automatically dropped and thus can not be directly part of an output data set.

 

Example (untested):

 

data want(label="Input from many csv");
length source_filename $200;   infile "&path.\*.csv" filevar=in_filename;   retain input_now_flag 0 source_filename;
  if lag (in_filename) ne in_filename then do; /* start of reading from next file */
source_filename = in_filename;     input_now_flag = 0;
  end;   input @; /* held input for checking on landmarks */ if _infile_ =: '<>' then do; /* landmark for subsequent rows are data */ input_now_flag = 1; delete; end;
if _infile_ =: '//' then do; /* landmark for end of data, ignore anything that might be after it */
input_now_flag = 0;
delete;
end;
  if not input_now_flag then delete; /* have not reached the data landmark yet */
/* presume all *.csv files have the same column structure and order */ input date :mmddyy. time :time. (point1-point8) (:nodata.);
format date yymmdd10. time time8. ;
keep source_filename date time point1-point8; run;

 

 

rmacarthur
Pyrite | Level 9

Hi Richard, 

Thank you very much for the reply.  Have learned allot from your sample code. 

In running this code, I received an error message,

ERROR: A Physical file reference (i.e. "PHYSICAL FILE REFERENCE" ) or an aggregate file storage
reference (i.e. AGGREGATE(MEMBER) ) reference cannot be used with the FILEVAR= option."  

But changing "FILEVAR" to "FILENAME", based on some SAS community comments, allowed the code to run. 

 

Once running, the code recognizes each file in the directory, there is a message like this, once for each file:

NOTE: The infile "C:\Users\SAS\Box\DATA_RENO\*.csv" is:
Filename=C:\Users\SAS\Box\DATA_RENO\Temp_12-31-19_06-55.Hum.Press-DA.csv,
File List=C:\Users\SAS\Box\DATA_RENO\*.csv,
RECFM=V,LRECL=32767

 

Then, for each file, there is another line like this:

NOTE: 110 records were read from the infile "C:\Users\SAS\Box\DATA_RENO\*.csv".
The minimum record length was 2.
The maximum record length was 115.

 

NOTE: The data set WORK.WANT has 0 observations and 11 variables.
NOTE: DATA statement used (Total process time):
real time 1.12 seconds
cpu time 0.68 seconds

 

But it doesn't seem to aggregate the data into one file. 

 

Should I be adding a Landmark, at this point in the code ?

 

if _infile_ =: '<>' then do; /* landmark for subsequent rows are data */

 

Have pasted the code below for ease of reference, 

data want_R (label="Input from many csv");
  length source_filename $200;
  infile "C:\Users\SAS\Box\DATA_RENO\*.csv" filevar=in_filename;
  retain input_now_flag 0 source_filename;

  if lag (in_filename) ne in_filename then do;    /* start of reading from next file */
    source_filename = in_filename;
    input_now_flag = 0;
  end;

  input @;                        /* held input for checking on landmarks */

  if _infile_ =: '<>' then do;    /* landmark for subsequent rows are data */
    input_now_flag = 1;
    delete;
  end;

  if _infile_ =: '//' then do;  /* landmark for end of data, ignore anything that might be after it */
    input_now_flag = 0;
    delete;
  end;

  if not input_now_flag then delete;   /* have not reached the data landmark yet */

  /* presume all *.csv files have the same column structure and order */
  input date :mmddyy. time :time. (point1-point8) (:nodata.);

  format date yymmdd10. time time8. ;
  keep source_filename date time point1-point8;
run;

Thanks again for all of this, am learning a great deal about importing via a datastep !

Robert

RichardDeVen
Barite | Level 11

Thanks for catching the FILENAME problem in untested code. End of post has tested example with generated data.

 

You probably want to check the value returned by filename=, it's probably $8, and should be lengthened 

  length source_filename in_filename $200 ;

Why is filename= automatic variable default length of $8? Probably because standard file references (i.e. filename <fileref> "path") are identifiers (a reference name) with an 8 character limit.

 

Yes, lots of files makes for lots of NOTES. You can get one NOTE if you use the operating system to your advantage, however you would lose the information about which file a record came from (if that loss is OK, hey no problem)

 

Use OS to execute a command that concatenates all the files together, and read from that commands stdout

filename ALLCSV pipe "powershell -c ""cat &PATH.\*.csv""";
data want;
infile ALLCSV dsd missover;
...

 

Back to business....

The first 'landmark' identifies the header line of the data. I chose the test to be "a line starting with <> is the header line" based on the sample data in the question. 

<>Date Time Point_1 Point_2 Point_3 Point_4 Point_5 Point_6 Point_7 Point_8

The next line after the header line should be data.

If your test for the header line landmark is incorrect the program will never reach the second INPUT statement, and you will get 0 observations.

 

I chose// to be the landmark for end of data, again based on the sample data. Regardless, the input_now_flag is reset whenever a new filename is encountered.

12/31/2019 7:25:11 72.25 0.05 26.81 73.62 -0.09 -0.06 No Data No Data
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

 

 

Tested example (900 data rows written and 900 read in) :

data _null_;
  do i = 1 to 9;
    out_filename = cats("C:\temp\",i,".csv");
    file multiple filevar=out_filename;
    do j = 1 to 15;
      put 'header detail' j;
    end;
    put '<>a,b,c';
    do z = 1 to 100;
      a = i * 100 + z-1;
      b = a**2;
      c = sqrt(a);
      put a ',' b ',' c;
    end;
    put '/////////';
    put '999,999,999, i dare you to ignore me';
  end;
run;

proc format ;
  invalue  nodata
    'TIME (-3)'=.t
    'No Data'=.d 
     other=_same_
  ; 
run;

filename ALLCSV pipe "powershell -c ""cat c:\temp\?.csv""";

data want;
  length source_filename $200 in_filename $200 ;

  infile 'c:\temp\?.csv' filename=in_filename dsd missover ;

/*  infile ALLCSV dsd missover;*/

  retain input_now_flag 0 source_filename;

  if lag (in_filename) ne in_filename then do;    /* start of reading from next file */
    source_filename = in_filename;
    input_now_flag = 0;
  end;

  input @;

  if _infile_ =: '<>' then do;    /* landmark for subsequent rows are data */
    input_now_flag = 1;
    delete;
  end;

  if _infile_ =: '//' then do;  /* landmark for end of data, ignore anything that might be after it */
    input_now_flag = 0;
    delete;
  end;

  if not input_now_flag then delete;   /* have not reached the data landmark yet */

  /* presume all *.csv files have the same column structure and order */
  input @1 a b c; *date :mmddyy. time :time. (point1-point8) (:nodata.);

  drop input_now_flag;
run;

 

rmacarthur
Pyrite | Level 9

Hi Richard, 

Thanks for the reply and clarifications, I understand the code much better now.  

With help from you and Tom, the data have been imported and connected and I was able to generate our first summary report today.  

Which is great to have gotten done.

Many thanks for the detailed recommendations and collective patience, 

Robert

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 16 replies
  • 3330 views
  • 8 likes
  • 5 in conversation