BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
linlin87
Quartz | Level 8

Hi SAS Community!

I have managed to use a for loop to import csv files and combine them, but for each csv file there is information in the header that I want to assign as a column for that csv file. For example, I have the following csv file:

 

Obs	Date	Time	Growth
System = 2021323 Batch=294 Temp=52 User=012			
%£ KIISGL-20JF FGF2202XA 20J			
1	06/06/2021	12:31:00	34.27327459
2	07/06/2021	12:26:00	35.17687849
3	08/06/2021	12:41:09	62.45345072
4	09/06/2021	12:56:08	29.32030264
5	10/06/2021	13:11:04	26.51833713
6	12/06/2021	13:26:08	34.83632567
7	13/06/2021	13:21:10	13.85211024
8	14/06/2021	13:37:30	42.6087162
8	15/06/2021	13:30:59	66.03200409
9	16/06/2021	13:10:49	95.95908292
10	17/06/2021	13:38:29	91.10805639
11	18/06/2021	13:42:10	55.87138083

There are a number of pieces of information in the header I have been ignoring, but I want to grab. I want the info in the second and third row, particularly the information highlighted in row 3: %£ KIISGL-20JF FGF2202XA 20J.

So far I have been using infile to import the csv's with firstobs=4, which gets me the data but obviously misses this extra needed information. How do I get the header information (possibly as a separate data step) so that I can add these information to my data files?

I was hoping to assign these bits of the headers as macro variables so I can use them later in my code.

 

Any help on this would be amazing!

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Why would you use PROC IMPORT to read a text file with only four variables?

What you posted is not a CSV file.  It appears to have TAB characters separating the values, not COMMA that is used a Comma Separated Values file.  Does the original file use commas or tabs? Or some other delimiter character?  Is it possible you have accidentally opened the CSV file with EXCEL and are showing the result of EXCEL reading the CSV file?

 

Anyway assuming have a CSV file you could read that into a dataset with code like:

data want;
  infile cards dsd truncover firstobs=3 ;
  if _n_=1 then do ; 
     input row1 :$50.;
     code = scan(row1,3,' ');
     retain row1 code;
  end;
  input obs date :ddmmyy. time :time. growth;
  format date yymmdd10. time tod8. ;
cards4;
Obs,Date,Time,Growth
System = 2021323 Batch=294 Temp=52 User=012,,,
%£ KIISGL-20JF FGF2202XA 20J,,,
1,06/06/2021,12:31:00,34.27327459
2,07/06/2021,12:26:00,35.17687849
3,08/06/2021,12:41:09,62.45345072
4,09/06/2021,12:56:08,29.32030264
5,10/06/2021,13:11:04,26.51833713
6,12/06/2021,13:26:08,34.83632567
7,13/06/2021,13:21:10,13.85211024
8,14/06/2021,13:37:30,42.6087162
8,15/06/2021,13:30:59,66.03200409
9,16/06/2021,13:10:49,95.95908292
10,17/06/2021,13:38:29,91.10805639
11,18/06/2021,13:42:10,55.87138083
;;;;

Tom_0-1656680049136.png

And if you have multiple CSV files in the same structure you could read them all into a single SAS dataset using a single data step also.

 

View solution in original post

7 REPLIES 7
Kurt_Bremser
Super User

You can do it in one step:

data want;
infile  "have" firstobs=2,
if _n_ = 1 /* second row of infile */
then do;
  call symputx("mvar1",_infile_);
end;
else if _n_ = 2 /* third row */
then do;
  call symputx("mvar2",_infile_);
end;
else do;
  input obs date :ddmmyy10. time :time8. growth;
  output;
end;
format date yymmdd10. time time8.;
run;
linlin87
Quartz | Level 8
Thanks Kurt. Your code makes sense, but it doesn't work.
ballardw
Super User

With out seeing exactly how you want to use that stuff I would suggest not worrying about macro variables.

 

Here is an example of reading that information into variables in the data set. Obviously your INFILE would point to your data with appropriate options. You did not describe exactly which pieces you might want.

This parses you header information and when it finds the line starting with System, the =: if you haven't seen that before is "begins with" comparison, reads the first line into variables , then reads the second line into one variable. You could use a different input statement to read that second bit differently. Other than the first 2 rows the data are read as typical variables. The Output means that records are only written to the output data set after the header rows are processed.

I retain the header information so that all of the records have that information.

The @ on the input statement keeps the input pointer on the row. _INFILE_ is the contents of that current line (up to 32K characters) so you can test it for parsing or other actions as desired. You use an Input; to advance to the next line of the file.

Note that your line with System allows use of one of the uncommon forms of input with the =. There are some limits on this if your other data has values with spaces imbedded.

You would use Firstobs=2 to start reading at the second line. OR as an exercise left to the interested read add a test for the start of the OBS (assuming that is part of the header) and just don't read that. With that change you could use an Infile similar to : Infile "c:\path\file*.csv" (or what ever common name element and read multiple files in one pass.

 

Assumptions: the System always comes before the other header which always follows on the following line.

 

data example;
   infile datalines line=inline;
   informat system $10. batch 5. temp 2. User $4.   header2 $40.
           Obs f4.	Date ddmmyy10.	Time time8.	Growth 12.
           ;
   format date ddmmyy10. time time.;
   retain system batch temp user header2;
   input @;
   if _infile_ =: 'System' then do;
     input system= Batch= Temp= user=;
     input @;
     header2=_infile_;
     input;
   end;
   else do;
      input obs date time growth;
      output;
   end;

datalines;
System = 2021323 Batch=294 Temp=52 User=012			
%£ KIISGL-20JF FGF2202XA 20J			
1	06/06/2021	12:31:00	34.27327459
2	07/06/2021	12:26:00	35.17687849
3	08/06/2021	12:41:09	62.45345072
4	09/06/2021	12:56:08	29.32030264
5	10/06/2021	13:11:04	26.51833713
6	12/06/2021	13:26:08	34.83632567
7	13/06/2021	13:21:10	13.85211024
;

@linlin87 wrote:

Hi SAS Community!

I have managed to use a for loop to import csv files and combine them, but for each csv file there is information in the header that I want to assign as a column for that csv file. For example, I have the following csv file:

 

Obs	Date	Time	Growth
System = 2021323 Batch=294 Temp=52 User=012			
%£ KIISGL-20JF FGF2202XA 20J			
1	06/06/2021	12:31:00	34.27327459
2	07/06/2021	12:26:00	35.17687849
3	08/06/2021	12:41:09	62.45345072
4	09/06/2021	12:56:08	29.32030264
5	10/06/2021	13:11:04	26.51833713
6	12/06/2021	13:26:08	34.83632567
7	13/06/2021	13:21:10	13.85211024
8	14/06/2021	13:37:30	42.6087162
8	15/06/2021	13:30:59	66.03200409
9	16/06/2021	13:10:49	95.95908292
10	17/06/2021	13:38:29	91.10805639
11	18/06/2021	13:42:10	55.87138083

There are a number of pieces of information in the header I have been ignoring, but I want to grab. I want the info in the second and third row, particularly the information highlighted in row 3: %£ KIISGL-20JF FGF2202XA 20J.

So far I have been using infile to import the csv's with firstobs=4, which gets me the data but obviously misses this extra needed information. How do I get the header information (possibly as a separate data step) so that I can add these information to my data files?

I was hoping to assign these bits of the headers as macro variables so I can use them later in my code.

 

Any help on this would be amazing!


 

Tom
Super User Tom
Super User

Why would you use PROC IMPORT to read a text file with only four variables?

What you posted is not a CSV file.  It appears to have TAB characters separating the values, not COMMA that is used a Comma Separated Values file.  Does the original file use commas or tabs? Or some other delimiter character?  Is it possible you have accidentally opened the CSV file with EXCEL and are showing the result of EXCEL reading the CSV file?

 

Anyway assuming have a CSV file you could read that into a dataset with code like:

data want;
  infile cards dsd truncover firstobs=3 ;
  if _n_=1 then do ; 
     input row1 :$50.;
     code = scan(row1,3,' ');
     retain row1 code;
  end;
  input obs date :ddmmyy. time :time. growth;
  format date yymmdd10. time tod8. ;
cards4;
Obs,Date,Time,Growth
System = 2021323 Batch=294 Temp=52 User=012,,,
%£ KIISGL-20JF FGF2202XA 20J,,,
1,06/06/2021,12:31:00,34.27327459
2,07/06/2021,12:26:00,35.17687849
3,08/06/2021,12:41:09,62.45345072
4,09/06/2021,12:56:08,29.32030264
5,10/06/2021,13:11:04,26.51833713
6,12/06/2021,13:26:08,34.83632567
7,13/06/2021,13:21:10,13.85211024
8,14/06/2021,13:37:30,42.6087162
8,15/06/2021,13:30:59,66.03200409
9,16/06/2021,13:10:49,95.95908292
10,17/06/2021,13:38:29,91.10805639
11,18/06/2021,13:42:10,55.87138083
;;;;

Tom_0-1656680049136.png

And if you have multiple CSV files in the same structure you could read them all into a single SAS dataset using a single data step also.

 

linlin87
Quartz | Level 8

Hi Tom,

Yes I opened it in excel and that is why it is showing as tab delimited.

I now have the following code for getting the directories of each .csv file:

filename tmp pipe 'dir "Z:\JLB_Study\Data\*hamrit.csv" /b /s';
data direct;
infile tmp dlm="¬";
length cmd_line $200;
input cmd_line;
run;

My question is what is this "¬" as a delimiter? I inherited this code, and I can't find any documentation on this. I know it is the logical symbol for NOT but I don't know why this would be used for a delimiter for csv files. I assumed it would instead by ",".

Any help on what this is would be great!

Tom
Super User Tom
Super User

Probably a trick to try to read the full line.  So if that strange character never appears on the line then the INPUT statement will read the whole line (or at least the first 200 bytes) in the variable CMD_LINE.

 

Probably would be easier to use the TRUNCOVER option instead.

filename tmp pipe 'dir "Z:\JLB_Study\Data\*hamrit.csv" /b /s';
data direct;
  infile tmp truncover;
  input cmd_line $200.;
run;

You can also eliminate the FILENAME statement.

data direct;
  infile 'dir "Z:\JLB_Study\Data\*hamrit.csv" /b /s' pipe truncover;
  input cmd_line $200.;
run;

And if the goal is to just read all of the CSV files you can skip making the list.

data want;
  infile "Z:\JLB_Study\Data\*hamrit.csv" dsd filename=fname ;
  input @;
  file = scan(fname,-1,'/\');
  if file ne lag(file) then do;
     input // row1 :$50.;
     code = scan(row1,3,' ');
     retain row1 code;
  end;
  input obs date :ddmmyy. time :time. growth;
  format date yymmdd10. time tod8. ;
run;
linlin87
Quartz | Level 8
Thanks for that Tom. It is interesting and I'll consider that for other applications, but I actually need to make data direct because in each folder, I also need to grab some other .csv files with slightly different suffixes (not *hamrit.csv). If you had a better way of doing this I'd be keen to hear!

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 1233 views
  • 1 like
  • 4 in conversation