Hi SAS Friends,
Need to import and analyze hundreds of .csv files that contain environmental data.
a sample has been uploaded here
Because there are so many, individually saving them as a .xlsx file for import seems impractical, which is how many SAS programmers handle this.
The first 13 lines of data are not needed.
line 14 has the column header information, and then each line thereafter has data in rows, and the data structure is straight forward.
I've attached a sample file.
Question is , how can this be imported as a CSV file, after which the first 13 lines are deleted, and keeping headers and data structure that follow.
Here's a sample of what it looks like:
Key Name:Suffix Trend Definitions Used
Point_1: HOSP_1FL_PHARM_FCU01:ROOM TEMP 15 minutes
Point_2: HOS.1FL.ANTE.RM.DIFF.PRESS 15 minutes
Point_3: HOS.1FL.ANTE.RM.HUMID 15 minutes
Point_4: HOS.1FL.ANTE.RM.TEMP 15 minutes
Point_5: HOS.1FL.CLEAN.RM.DIFF.PRESS 15 minutes
Point_6: HOS.1FL.STORAGE.DIFF.PRESS 15 minutes
Point_7: pCOWeb7700_A003 Error: Point not found in database.
Point_8: pCOWeb7700_A004 Error: Point not found in database.
Time Interval: 15 Minutes
Date Range: 12/31/2019 06:55:11 - 1/1/2020 06:55:11
Report Timings: All Hours
<>Date Time Point_1 Point_2 Point_3 Point_4 Point_5 Point_6 Point_7 Point_8
12/31/2019 6:55:11 72 0.04 26.81 73.53 -0.09 -0.06 No Data No Data
12/31/2019 7:10:11 72.25 0.04 26.81 73.57 -0.09 -0.06 No Data No Data
12/31/2019 7:25:11 72.25 0.05 26.81 73.62 -0.09 -0.06 No Data No Data
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
Any advice is greatly appreciated,
Thanks !
That files does not look very complicated.
%let path=C:\downloads;
%let fname=Temp_ex.csv ;
proc format ;
invalue nodata
'TIME (-3)'=.t
'No Data'=.d
other=_same_
;
run;
data want;
infile "&path/&fname" dsd truncover firstobs=15;
input @;
if index(_infile_,'End of Report') then delete;
input date :mmddyy. time :time. (point1-point8) (:nodata.);
format date yymmdd10. time time8. ;
run;
proc print;
run;
Results:
Obs date time point1 point2 point3 point4 point5 point6 point7 point8 1 2019-12-31 6:55:11 72.00 0.04 26.81 73.53 -0.09 -0.06 D D 2 2019-12-31 7:10:11 72.25 0.04 26.81 73.57 -0.09 -0.06 D D 3 2019-12-31 7:25:11 72.25 0.05 26.81 73.62 -0.09 -0.06 D D 4 2019-12-31 7:40:11 72.25 0.04 26.35 73.67 -0.09 -0.06 D D 5 2019-12-31 7:55:11 72.25 0.05 26.20 73.75 -0.09 -0.06 D D 6 2019-12-31 8:10:11 72.25 0.04 26.10 73.80 -0.09 -0.06 D D 7 2019-12-31 8:25:11 72.25 0.04 26.15 73.84 -0.09 -0.06 D D 8 2019-12-31 8:40:11 72.25 0.04 26.15 73.93 -0.09 -0.06 D D 9 2019-12-31 8:55:11 72.25 0.05 26.05 73.98 -0.09 -0.06 D D 10 2019-12-31 9:10:11 72.25 0.04 25.90 74.02 -0.09 -0.06 D D 11 2019-12-31 9:25:11 72.25 0.04 26.30 74.02 -0.09 -0.06 D D 12 2019-12-31 9:40:11 72.25 0.05 26.15 74.07 -0.09 -0.06 D D 13 2019-12-31 9:55:11 72.25 0.06 26.35 74.03 -0.09 -0.06 D D 14 2019-12-31 10:10:11 72.25 0.06 26.30 74.07 -0.09 -0.06 D D 15 2019-12-31 10:25:11 72.50 0.06 26.25 74.07 -0.09 -0.06 D D 16 2019-12-31 10:40:11 72.50 0.05 26.35 74.07 -0.09 -0.06 D D 17 2019-12-31 10:55:11 72.50 0.05 26.15 74.12 -0.09 -0.06 D D 18 2019-12-31 11:10:11 72.50 0.05 26.10 74.21 -0.10 -0.06 D D 19 2019-12-31 11:25:11 72.50 0.06 26.00 74.25 -0.10 -0.06 D D 20 2019-12-31 11:40:11 72.50 0.06 26.00 74.30 -0.10 -0.06 D D 21 2019-12-31 11:55:11 72.50 0.04 26.05 74.43 -0.09 -0.06 D D 22 2019-12-31 12:10:11 72.50 0.06 25.84 74.47 -0.10 -0.06 D D 23 2019-12-31 12:25:11 72.50 0.05 25.95 74.57 -0.10 -0.06 D D 24 2019-12-31 12:40:11 72.50 0.06 26.00 74.59 -0.10 -0.06 D D 25 2019-12-31 12:55:11 72.75 0.06 25.90 74.61 -0.10 -0.06 D D 26 2019-12-31 13:10:11 72.50 0.05 26.20 74.66 -0.09 -0.06 D D 27 2019-12-31 13:25:11 72.50 0.06 26.20 74.66 -0.10 -0.06 D D 28 2019-12-31 13:40:11 72.50 0.06 26.30 74.66 -0.10 -0.06 D D 29 2019-12-31 13:55:11 72.50 0.04 26.40 74.66 -0.09 -0.06 D D 30 2019-12-31 14:10:11 72.50 0.05 26.41 74.70 -0.09 -0.06 D D 31 2019-12-31 14:25:11 72.50 0.06 26.05 74.66 -0.10 -0.06 D D 32 2019-12-31 14:40:11 72.50 0.05 26.00 74.66 -0.09 -0.06 D D 33 2019-12-31 14:55:11 72.50 0.06 25.85 74.66 -0.10 -0.06 D D 34 2019-12-31 15:10:11 72.50 0.06 25.19 74.70 -0.10 -0.06 D D 35 2019-12-31 15:25:11 72.50 0.05 26.45 74.66 -0.09 -0.06 D D 36 2019-12-31 15:40:11 72.50 0.06 27.37 74.66 -0.10 -0.06 D D 37 2019-12-31 15:55:11 72.50 0.06 27.21 74.70 -0.09 -0.06 D D 38 2019-12-31 16:10:11 72.50 0.04 26.20 74.70 -0.09 -0.06 D D 39 2019-12-31 16:25:11 72.50 0.04 25.76 74.71 -0.09 -0.06 D D 40 2019-12-31 16:40:11 72.50 0.05 25.44 74.70 -0.09 -0.06 D D 41 2019-12-31 16:55:11 72.50 0.06 25.40 74.71 -0.10 -0.06 D D 42 2019-12-31 17:10:11 72.50 0.06 25.20 74.66 -0.10 -0.06 D D 43 2019-12-31 17:25:11 72.50 0.04 25.04 74.66 -0.09 -0.06 D D 44 2019-12-31 17:40:11 72.50 0.05 25.30 74.66 -0.09 -0.06 D D 45 2019-12-31 17:55:11 72.50 0.06 25.29 74.70 -0.10 -0.06 D D 46 2019-12-31 18:10:11 72.50 0.05 25.39 74.75 -0.10 -0.06 D D 47 2019-12-31 18:25:11 72.50 0.06 25.59 74.70 -0.10 -0.06 D D 48 2019-12-31 18:40:11 72.50 0.06 25.59 74.75 -0.10 -0.06 D D 49 2019-12-31 18:55:11 72.50 0.05 26.00 74.75 -0.09 -0.06 D D 50 2019-12-31 19:10:11 72.25 0.06 25.65 74.61 -0.10 -0.06 D D 51 2019-12-31 19:25:11 72.00 0.05 27.06 74.39 -0.10 -0.06 D D 52 2019-12-31 19:40:11 71.75 0.06 27.97 74.21 -0.10 -0.06 D D 53 2019-12-31 19:55:11 71.50 0.05 26.05 74.07 -0.10 -0.06 D D 54 2019-12-31 20:10:11 71.50 0.05 26.21 73.93 -0.09 -0.06 D D 55 2019-12-31 20:25:11 71.25 0.06 26.41 73.76 -0.10 -0.06 D D 56 2019-12-31 20:40:11 71.25 0.05 26.41 73.71 -0.09 -0.06 D D 57 2019-12-31 20:55:11 71.75 0.06 26.00 73.71 -0.10 -0.06 D D 58 2019-12-31 21:10:11 72.00 0.06 27.06 73.80 -0.09 -0.06 D D 59 2019-12-31 21:25:11 72.25 0.05 27.62 72.99 -0.09 -0.06 D D 60 2019-12-31 21:40:11 72.50 0.06 27.72 72.12 -0.10 -0.06 D D 61 2019-12-31 21:55:11 72.50 0.06 25.90 72.35 -0.10 -0.06 D D 62 2019-12-31 22:10:11 72.50 0.05 25.24 72.84 -0.09 -0.06 D D 63 2019-12-31 22:25:11 72.75 0.05 24.79 73.25 -0.09 -0.06 D D 64 2019-12-31 22:40:11 72.50 0.04 24.43 73.06 -0.08 -0.06 D D 65 2019-12-31 22:55:11 72.00 0.05 23.43 72.52 -0.09 -0.06 D D 66 2019-12-31 23:10:11 71.50 0.05 22.82 72.75 -0.09 -0.06 D D 67 2019-12-31 23:25:11 71.50 0.04 22.32 73.02 -0.09 -0.06 D D 68 2019-12-31 23:40:11 71.25 0.04 21.66 73.20 -0.09 -0.06 D D 69 2019-12-31 23:55:11 71.50 0.05 21.30 73.38 -0.09 -0.06 D D 70 2020-01-01 0:10:11 71.75 0.04 20.75 73.57 -0.09 -0.06 D D 71 2020-01-01 0:25:11 72.25 0.04 20.15 73.75 -0.09 -0.06 D D 72 2020-01-01 0:40:11 72.25 0.04 19.60 73.89 -0.09 -0.06 D D 73 2020-01-01 0:55:11 72.50 0.04 19.44 73.98 -0.09 -0.06 D D 74 2020-01-01 1:10:11 72.50 0.04 19.19 74.07 -0.09 -0.06 D D 75 2020-01-01 1:25:11 72.50 0.05 19.35 74.12 -0.09 -0.06 D D 76 2020-01-01 1:40:11 72.50 0.04 19.79 74.16 -0.09 -0.06 D D 77 2020-01-01 1:55:11 72.50 0.04 19.90 74.16 -0.08 -0.06 D D 78 2020-01-01 2:10:11 72.50 0.05 20.45 73.29 -0.09 -0.06 D D 79 2020-01-01 2:25:11 72.50 0.04 20.36 73.35 -0.09 -0.06 D D 80 2020-01-01 2:40:11 72.50 0.04 19.90 73.57 -0.09 -0.06 D D 81 2020-01-01 2:55:11 72.75 0.04 19.49 73.80 -0.09 -0.06 D D 82 2020-01-01 3:10:11 T T T T T T D D 83 2020-01-01 3:25:11 72.50 0.04 19.34 74.16 -0.09 -0.06 D D 84 2020-01-01 3:40:11 72.50 0.05 18.74 74.29 -0.09 -0.06 D D 85 2020-01-01 3:55:11 72.50 0.04 18.44 74.31 -0.09 -0.06 D D 86 2020-01-01 4:10:11 72.50 0.04 18.39 74.35 -0.09 -0.06 D D 87 2020-01-01 4:25:11 72.50 0.05 18.43 74.34 -0.09 -0.06 D D 88 2020-01-01 4:40:11 72.50 0.05 18.34 74.39 -0.09 -0.06 D D 89 2020-01-01 4:55:11 72.50 0.04 18.33 74.34 -0.09 -0.06 D D 90 2020-01-01 5:10:11 72.50 0.04 18.13 74.35 -0.09 -0.06 D D 91 2020-01-01 5:25:11 72.50 0.05 17.83 74.39 -0.09 -0.06 D D 92 2020-01-01 5:40:11 72.50 0.04 17.53 74.30 -0.09 -0.06 D D 93 2020-01-01 5:55:11 72.50 0.05 17.43 74.30 -0.09 -0.06 D D 94 2020-01-01 6:10:11 72.50 0.04 17.63 74.25 -0.09 -0.06 D D 95 2020-01-01 6:25:11 72.25 0.04 18.38 74.25 -0.09 -0.06 D D
Are the columns always DATE, TIME, POINT_1 - POINT_8 ?
If so, the data from all the hundreds of data files can be placed in one data set.
If not, what is your strategy for naming the data set for a given data file ?
Are you sure you want to discard the auxiliary information about each POINT_<n> column ?
Hi Richard,
Thanks for the comments !
Are the columns always DATE, TIME, POINT_1 - POINT_8 ?
yes
If not, what is your strategy for naming the data set for a given data file ?
was planning to keep each the original filename, and just remove the troublesome characters (see below). However that may not be the best approach
Are you sure you want to discard the auxiliary information about each POINT_<n> column ?
Yes, those lines are all the same file to file and just define each variable.
If so, the data from all the hundreds of data files can be placed in one data set.
Yes, that's the goal. A macro or do loop is needed for that, right? That can read the consecutive CSV file names from the directory, and import them. I don't know how to do that, too.
Because the file names have "._-" in them, I've modified the code from Tom to get rid of those, which I've pasted below. So how can I take that further to import each uniquely named data set, one at a time, to create one large file?
/*************************************/
%let fname=Temp_01-01-20_06-55.Hum.Press-DA.csv ; /*THis is what the actual file names look like*/
data e ;
fn2 = symget("fname") ;
fn1 = compress(fn2, "-._ ") ;
call symput('fn', fn1) ;
run ;
proc format ;
invalue nodata
'TIME (-3)'=.t
'No Data'=.d
other=_same_
;
run;
data &fn;
infile "&RenoD/&fname" dsd truncover firstobs=15;
input @;
if index(_infile_,'End of Report') then delete;
input date :mmddyy. time :time. (point1-point8) (:nodata.);
format date yymmdd10. time time8. ;
run;
proc print;
run;
/*************************************/
If they are all in one directory should be able to just use a wildcard in INFILE statement.
%let renod=your path here;
data want;
length filename $256 fname $50 ;
infile "&RenoD/*.csv" dsd truncover filevar=filename;
input @;
fname = scan(filename,-1,'/\');
if lag(filename) ne filename then row=0;
row+1;
if row <15 or index(_infile_,'End of Report') then delete;
input date :mmddyy. time :time. (point1-point8) (:nodata.);
format date yymmdd10. time time8. ;
run;
If you they are scattered or mixed then make a dataset with the list of files instead. Then use that list to drive reading the files. So if you have dataset name FILELIST with a variable named FILENAME you could use this data step to read each file starting at line 15 and skipping the last row.
data want;
set filelist ;
filename2 = filename;
infile csv dsd truncover filename=filename2 end=eof firstobs=15;
do while (not eof);
input @;
if not eof then do;
input date :mmddyy. time :time. (point1-point8) (:nodata.);
output;
end;
end;
format date yymmdd10. time time8. ;
run;
Hi Tom,
Thank you very much. The files are all in one place, so I ran the first code and got the following ,
" A Physical file reference (i.e. "PHYSICAL FILE REFERENCE" ) or an aggregate file storage reference (i.e. AGGREGATE(MEMBER) ) reference cannot be used with the FILEVAR= option."
However changing "FILEVAR" to "FILENAME" allowed the code to work like a charm. That modification was recommended in some other SAS community postings.
The output data look great, much appreciated.
Robert
Are the files complex because of the header rows?
Do all files have the same variables and layout beyond that?
Is it always 13 lines in the header?
That files does not look very complicated.
%let path=C:\downloads;
%let fname=Temp_ex.csv ;
proc format ;
invalue nodata
'TIME (-3)'=.t
'No Data'=.d
other=_same_
;
run;
data want;
infile "&path/&fname" dsd truncover firstobs=15;
input @;
if index(_infile_,'End of Report') then delete;
input date :mmddyy. time :time. (point1-point8) (:nodata.);
format date yymmdd10. time time8. ;
run;
proc print;
run;
Results:
Obs date time point1 point2 point3 point4 point5 point6 point7 point8 1 2019-12-31 6:55:11 72.00 0.04 26.81 73.53 -0.09 -0.06 D D 2 2019-12-31 7:10:11 72.25 0.04 26.81 73.57 -0.09 -0.06 D D 3 2019-12-31 7:25:11 72.25 0.05 26.81 73.62 -0.09 -0.06 D D 4 2019-12-31 7:40:11 72.25 0.04 26.35 73.67 -0.09 -0.06 D D 5 2019-12-31 7:55:11 72.25 0.05 26.20 73.75 -0.09 -0.06 D D 6 2019-12-31 8:10:11 72.25 0.04 26.10 73.80 -0.09 -0.06 D D 7 2019-12-31 8:25:11 72.25 0.04 26.15 73.84 -0.09 -0.06 D D 8 2019-12-31 8:40:11 72.25 0.04 26.15 73.93 -0.09 -0.06 D D 9 2019-12-31 8:55:11 72.25 0.05 26.05 73.98 -0.09 -0.06 D D 10 2019-12-31 9:10:11 72.25 0.04 25.90 74.02 -0.09 -0.06 D D 11 2019-12-31 9:25:11 72.25 0.04 26.30 74.02 -0.09 -0.06 D D 12 2019-12-31 9:40:11 72.25 0.05 26.15 74.07 -0.09 -0.06 D D 13 2019-12-31 9:55:11 72.25 0.06 26.35 74.03 -0.09 -0.06 D D 14 2019-12-31 10:10:11 72.25 0.06 26.30 74.07 -0.09 -0.06 D D 15 2019-12-31 10:25:11 72.50 0.06 26.25 74.07 -0.09 -0.06 D D 16 2019-12-31 10:40:11 72.50 0.05 26.35 74.07 -0.09 -0.06 D D 17 2019-12-31 10:55:11 72.50 0.05 26.15 74.12 -0.09 -0.06 D D 18 2019-12-31 11:10:11 72.50 0.05 26.10 74.21 -0.10 -0.06 D D 19 2019-12-31 11:25:11 72.50 0.06 26.00 74.25 -0.10 -0.06 D D 20 2019-12-31 11:40:11 72.50 0.06 26.00 74.30 -0.10 -0.06 D D 21 2019-12-31 11:55:11 72.50 0.04 26.05 74.43 -0.09 -0.06 D D 22 2019-12-31 12:10:11 72.50 0.06 25.84 74.47 -0.10 -0.06 D D 23 2019-12-31 12:25:11 72.50 0.05 25.95 74.57 -0.10 -0.06 D D 24 2019-12-31 12:40:11 72.50 0.06 26.00 74.59 -0.10 -0.06 D D 25 2019-12-31 12:55:11 72.75 0.06 25.90 74.61 -0.10 -0.06 D D 26 2019-12-31 13:10:11 72.50 0.05 26.20 74.66 -0.09 -0.06 D D 27 2019-12-31 13:25:11 72.50 0.06 26.20 74.66 -0.10 -0.06 D D 28 2019-12-31 13:40:11 72.50 0.06 26.30 74.66 -0.10 -0.06 D D 29 2019-12-31 13:55:11 72.50 0.04 26.40 74.66 -0.09 -0.06 D D 30 2019-12-31 14:10:11 72.50 0.05 26.41 74.70 -0.09 -0.06 D D 31 2019-12-31 14:25:11 72.50 0.06 26.05 74.66 -0.10 -0.06 D D 32 2019-12-31 14:40:11 72.50 0.05 26.00 74.66 -0.09 -0.06 D D 33 2019-12-31 14:55:11 72.50 0.06 25.85 74.66 -0.10 -0.06 D D 34 2019-12-31 15:10:11 72.50 0.06 25.19 74.70 -0.10 -0.06 D D 35 2019-12-31 15:25:11 72.50 0.05 26.45 74.66 -0.09 -0.06 D D 36 2019-12-31 15:40:11 72.50 0.06 27.37 74.66 -0.10 -0.06 D D 37 2019-12-31 15:55:11 72.50 0.06 27.21 74.70 -0.09 -0.06 D D 38 2019-12-31 16:10:11 72.50 0.04 26.20 74.70 -0.09 -0.06 D D 39 2019-12-31 16:25:11 72.50 0.04 25.76 74.71 -0.09 -0.06 D D 40 2019-12-31 16:40:11 72.50 0.05 25.44 74.70 -0.09 -0.06 D D 41 2019-12-31 16:55:11 72.50 0.06 25.40 74.71 -0.10 -0.06 D D 42 2019-12-31 17:10:11 72.50 0.06 25.20 74.66 -0.10 -0.06 D D 43 2019-12-31 17:25:11 72.50 0.04 25.04 74.66 -0.09 -0.06 D D 44 2019-12-31 17:40:11 72.50 0.05 25.30 74.66 -0.09 -0.06 D D 45 2019-12-31 17:55:11 72.50 0.06 25.29 74.70 -0.10 -0.06 D D 46 2019-12-31 18:10:11 72.50 0.05 25.39 74.75 -0.10 -0.06 D D 47 2019-12-31 18:25:11 72.50 0.06 25.59 74.70 -0.10 -0.06 D D 48 2019-12-31 18:40:11 72.50 0.06 25.59 74.75 -0.10 -0.06 D D 49 2019-12-31 18:55:11 72.50 0.05 26.00 74.75 -0.09 -0.06 D D 50 2019-12-31 19:10:11 72.25 0.06 25.65 74.61 -0.10 -0.06 D D 51 2019-12-31 19:25:11 72.00 0.05 27.06 74.39 -0.10 -0.06 D D 52 2019-12-31 19:40:11 71.75 0.06 27.97 74.21 -0.10 -0.06 D D 53 2019-12-31 19:55:11 71.50 0.05 26.05 74.07 -0.10 -0.06 D D 54 2019-12-31 20:10:11 71.50 0.05 26.21 73.93 -0.09 -0.06 D D 55 2019-12-31 20:25:11 71.25 0.06 26.41 73.76 -0.10 -0.06 D D 56 2019-12-31 20:40:11 71.25 0.05 26.41 73.71 -0.09 -0.06 D D 57 2019-12-31 20:55:11 71.75 0.06 26.00 73.71 -0.10 -0.06 D D 58 2019-12-31 21:10:11 72.00 0.06 27.06 73.80 -0.09 -0.06 D D 59 2019-12-31 21:25:11 72.25 0.05 27.62 72.99 -0.09 -0.06 D D 60 2019-12-31 21:40:11 72.50 0.06 27.72 72.12 -0.10 -0.06 D D 61 2019-12-31 21:55:11 72.50 0.06 25.90 72.35 -0.10 -0.06 D D 62 2019-12-31 22:10:11 72.50 0.05 25.24 72.84 -0.09 -0.06 D D 63 2019-12-31 22:25:11 72.75 0.05 24.79 73.25 -0.09 -0.06 D D 64 2019-12-31 22:40:11 72.50 0.04 24.43 73.06 -0.08 -0.06 D D 65 2019-12-31 22:55:11 72.00 0.05 23.43 72.52 -0.09 -0.06 D D 66 2019-12-31 23:10:11 71.50 0.05 22.82 72.75 -0.09 -0.06 D D 67 2019-12-31 23:25:11 71.50 0.04 22.32 73.02 -0.09 -0.06 D D 68 2019-12-31 23:40:11 71.25 0.04 21.66 73.20 -0.09 -0.06 D D 69 2019-12-31 23:55:11 71.50 0.05 21.30 73.38 -0.09 -0.06 D D 70 2020-01-01 0:10:11 71.75 0.04 20.75 73.57 -0.09 -0.06 D D 71 2020-01-01 0:25:11 72.25 0.04 20.15 73.75 -0.09 -0.06 D D 72 2020-01-01 0:40:11 72.25 0.04 19.60 73.89 -0.09 -0.06 D D 73 2020-01-01 0:55:11 72.50 0.04 19.44 73.98 -0.09 -0.06 D D 74 2020-01-01 1:10:11 72.50 0.04 19.19 74.07 -0.09 -0.06 D D 75 2020-01-01 1:25:11 72.50 0.05 19.35 74.12 -0.09 -0.06 D D 76 2020-01-01 1:40:11 72.50 0.04 19.79 74.16 -0.09 -0.06 D D 77 2020-01-01 1:55:11 72.50 0.04 19.90 74.16 -0.08 -0.06 D D 78 2020-01-01 2:10:11 72.50 0.05 20.45 73.29 -0.09 -0.06 D D 79 2020-01-01 2:25:11 72.50 0.04 20.36 73.35 -0.09 -0.06 D D 80 2020-01-01 2:40:11 72.50 0.04 19.90 73.57 -0.09 -0.06 D D 81 2020-01-01 2:55:11 72.75 0.04 19.49 73.80 -0.09 -0.06 D D 82 2020-01-01 3:10:11 T T T T T T D D 83 2020-01-01 3:25:11 72.50 0.04 19.34 74.16 -0.09 -0.06 D D 84 2020-01-01 3:40:11 72.50 0.05 18.74 74.29 -0.09 -0.06 D D 85 2020-01-01 3:55:11 72.50 0.04 18.44 74.31 -0.09 -0.06 D D 86 2020-01-01 4:10:11 72.50 0.04 18.39 74.35 -0.09 -0.06 D D 87 2020-01-01 4:25:11 72.50 0.05 18.43 74.34 -0.09 -0.06 D D 88 2020-01-01 4:40:11 72.50 0.05 18.34 74.39 -0.09 -0.06 D D 89 2020-01-01 4:55:11 72.50 0.04 18.33 74.34 -0.09 -0.06 D D 90 2020-01-01 5:10:11 72.50 0.04 18.13 74.35 -0.09 -0.06 D D 91 2020-01-01 5:25:11 72.50 0.05 17.83 74.39 -0.09 -0.06 D D 92 2020-01-01 5:40:11 72.50 0.04 17.53 74.30 -0.09 -0.06 D D 93 2020-01-01 5:55:11 72.50 0.05 17.43 74.30 -0.09 -0.06 D D 94 2020-01-01 6:10:11 72.50 0.04 17.63 74.25 -0.09 -0.06 D D 95 2020-01-01 6:25:11 72.25 0.04 18.38 74.25 -0.09 -0.06 D D
@rmacarthur wrote:
Hi SAS Friends,
Because there are so many, individually saving them as a .xlsx file for import seems impractical, which is how many SAS programmers handle this.
Because Import does not deal well with data with more than a single header row to get variable names from this would be a bad idea quickly. Second, each time you invoke Proc Import the procedure makes a potentially quite different set of guesses as to variable type and length it is often difficult to combine data imported from xlsx because some variables end up with numeric and others with character for the same column. Which SAS won't allow and you can spend more time "fixing" things than you might have saved with import.
Moral of the story: If the files have the same structure read them with a data step.
@Tom showed you how to read one file. The concept can be extended because a single data step can read multiple files that match a wildcarded operating system filename. Because all the data is read as if from one sequential stream you can not rely on the FIRSTOBS= infile option. Instead you will want to examine a held input of the raw line and examine it for landmarks pertinent to subsequent statements in the data step. Also, use FILEVAR= to capture the source filename into a automatic variable that can be assigned to an output data set variable. NOTE: Automatic variables are part of the program data vector (PDV) but are always automatically dropped and thus can not be directly part of an output data set.
Example (untested):
data want(label="Input from many csv");
length source_filename $200; infile "&path.\*.csv" filevar=in_filename; retain input_now_flag 0 source_filename;
if lag (in_filename) ne in_filename then do; /* start of reading from next file */
source_filename = in_filename; input_now_flag = 0;
end; input @; /* held input for checking on landmarks */ if _infile_ =: '<>' then do; /* landmark for subsequent rows are data */ input_now_flag = 1; delete; end;
if _infile_ =: '//' then do; /* landmark for end of data, ignore anything that might be after it */
input_now_flag = 0;
delete;
end;
if not input_now_flag then delete; /* have not reached the data landmark yet */
/* presume all *.csv files have the same column structure and order */ input date :mmddyy. time :time. (point1-point8) (:nodata.);
format date yymmdd10. time time8. ;
keep source_filename date time point1-point8; run;
Hi Richard,
Thank you very much for the reply. Have learned allot from your sample code.
In running this code, I received an error message,
ERROR: A Physical file reference (i.e. "PHYSICAL FILE REFERENCE" ) or an aggregate file storage
reference (i.e. AGGREGATE(MEMBER) ) reference cannot be used with the FILEVAR= option."
But changing "FILEVAR" to "FILENAME", based on some SAS community comments, allowed the code to run.
Once running, the code recognizes each file in the directory, there is a message like this, once for each file:
NOTE: The infile "C:\Users\SAS\Box\DATA_RENO\*.csv" is:
Filename=C:\Users\SAS\Box\DATA_RENO\Temp_12-31-19_06-55.Hum.Press-DA.csv,
File List=C:\Users\SAS\Box\DATA_RENO\*.csv,
RECFM=V,LRECL=32767
Then, for each file, there is another line like this:
NOTE: 110 records were read from the infile "C:\Users\SAS\Box\DATA_RENO\*.csv".
The minimum record length was 2.
The maximum record length was 115.
NOTE: The data set WORK.WANT has 0 observations and 11 variables.
NOTE: DATA statement used (Total process time):
real time 1.12 seconds
cpu time 0.68 seconds
But it doesn't seem to aggregate the data into one file.
Should I be adding a Landmark, at this point in the code ?
if _infile_ =: '<>' then do; /* landmark for subsequent rows are data */
Have pasted the code below for ease of reference,
data want_R (label="Input from many csv");
length source_filename $200;
infile "C:\Users\SAS\Box\DATA_RENO\*.csv" filevar=in_filename;
retain input_now_flag 0 source_filename;
if lag (in_filename) ne in_filename then do; /* start of reading from next file */
source_filename = in_filename;
input_now_flag = 0;
end;
input @; /* held input for checking on landmarks */
if _infile_ =: '<>' then do; /* landmark for subsequent rows are data */
input_now_flag = 1;
delete;
end;
if _infile_ =: '//' then do; /* landmark for end of data, ignore anything that might be after it */
input_now_flag = 0;
delete;
end;
if not input_now_flag then delete; /* have not reached the data landmark yet */
/* presume all *.csv files have the same column structure and order */
input date :mmddyy. time :time. (point1-point8) (:nodata.);
format date yymmdd10. time time8. ;
keep source_filename date time point1-point8;
run;
Thanks again for all of this, am learning a great deal about importing via a datastep !
Robert
Thanks for catching the FILENAME problem in untested code. End of post has tested example with generated data.
You probably want to check the value returned by filename=, it's probably $8, and should be lengthened
length source_filename in_filename $200 ;
Why is filename= automatic variable default length of $8? Probably because standard file references (i.e. filename <fileref> "path") are identifiers (a reference name) with an 8 character limit.
Yes, lots of files makes for lots of NOTES. You can get one NOTE if you use the operating system to your advantage, however you would lose the information about which file a record came from (if that loss is OK, hey no problem)
Use OS to execute a command that concatenates all the files together, and read from that commands stdout
filename ALLCSV pipe "powershell -c ""cat &PATH.\*.csv""";
data want;
infile ALLCSV dsd missover;
...
Back to business....
The first 'landmark' identifies the header line of the data. I chose the test to be "a line starting with <>
is the header line" based on the sample data in the question.
<>Date Time Point_1 Point_2 Point_3 Point_4 Point_5 Point_6 Point_7 Point_8
The next line after the header line should be data.
If your test for the header line landmark is incorrect the program will never reach the second INPUT statement, and you will get 0 observations.
I chose//
to be the landmark for end of data, again based on the sample data. Regardless, the input_now_flag is reset whenever a new filename is encountered.
12/31/2019 7:25:11 72.25 0.05 26.81 73.62 -0.09 -0.06 No Data No Data
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
Tested example (900 data rows written and 900 read in) :
data _null_;
do i = 1 to 9;
out_filename = cats("C:\temp\",i,".csv");
file multiple filevar=out_filename;
do j = 1 to 15;
put 'header detail' j;
end;
put '<>a,b,c';
do z = 1 to 100;
a = i * 100 + z-1;
b = a**2;
c = sqrt(a);
put a ',' b ',' c;
end;
put '/////////';
put '999,999,999, i dare you to ignore me';
end;
run;
proc format ;
invalue nodata
'TIME (-3)'=.t
'No Data'=.d
other=_same_
;
run;
filename ALLCSV pipe "powershell -c ""cat c:\temp\?.csv""";
data want;
length source_filename $200 in_filename $200 ;
infile 'c:\temp\?.csv' filename=in_filename dsd missover ;
/* infile ALLCSV dsd missover;*/
retain input_now_flag 0 source_filename;
if lag (in_filename) ne in_filename then do; /* start of reading from next file */
source_filename = in_filename;
input_now_flag = 0;
end;
input @;
if _infile_ =: '<>' then do; /* landmark for subsequent rows are data */
input_now_flag = 1;
delete;
end;
if _infile_ =: '//' then do; /* landmark for end of data, ignore anything that might be after it */
input_now_flag = 0;
delete;
end;
if not input_now_flag then delete; /* have not reached the data landmark yet */
/* presume all *.csv files have the same column structure and order */
input @1 a b c; *date :mmddyy. time :time. (point1-point8) (:nodata.);
drop input_now_flag;
run;
Hi Richard,
Thanks for the reply and clarifications, I understand the code much better now.
With help from you and Tom, the data have been imported and connected and I was able to generate our first summary report today.
Which is great to have gotten done.
Many thanks for the detailed recommendations and collective patience,
Robert
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.