BookmarkSubscribeRSS Feed
wwtt
Calcite | Level 5

I have multiple CSV input files that I have been importing into a single dataset. Each report has nearly the exact same format: there's a separate header and body section, the header always has seven "variables", the body always has 15 variables, and there's always 2 blank lines separating the two sections - the only difference is that there can be a variable number of rows in the body section. See green-shaded table in the attachment.

 

When importing these files into a single dataset, I've been lopping off the headers and just keeping the bodies of each report - see blue-shaded table.

 

But I'd like to keep some of the data in each report header by inserting them as new columns in the output dataset. See pink-shaded table, where I've transposed "HeaderVar5" from a row in each input report's header into a single column in the output.

 

I'm using SAS 9.4. Here's the code I have so far:

 

PROC IMPORT OUT=TEST 
            DATAFILE= "C:\filepath\xyz.report.*" 
            DBMS=DLM REPLACE;
     DELIMITER='2C'x; 
     GETNAMES=NO;
     DATAROW=7; 
RUN;

data test; set test;
if BodyVar3 in ("",'BodyVar3') then delete;
RUN;

 

8 REPLIES 8
Tom
Super User Tom
Super User

What does the XLSX attachment have to do with the question? Where are the CSV files you mentioned?  

I don't think PROC IMPORT can handle multiple input files at once.  Did that code you posted even run?  If not then show the log.

 

If you want to read multiple CSV files at once then write you own data step.

Reeza
Super User

data import_all_data;
 
*make sure variables to store file name are long enough;
length filename txt_file_name $256;
 
*keep file name from record to record;
retain txt_file_name;
 
*Use wildcard in input;
infile "Path\*.txt" eov=eov filename=filename truncover;
 
*Input first record and hold line;
input@;
 
*Check if this is the first record or the first record in a new file;
*If it is, replace the filename with the new file name and move to next line;
if _n_ eq 1 or eov then do;
txt_file_name = scan(filename, -1, "\");
eov=0;
record_type = 'H';
end;
 
*Otherwise  go to the import step and read the files;
else do;
record_type = 'D';

 input
 
*Place input code here;
 
;
run;

Try something like the above, filling in your INPUT statement and modifying teh INFILE statement as needed. 

 

Untested because of no data. 

 


@wwtt wrote:

I have multiple CSV input files that I have been importing into a single dataset. Each report has nearly the exact same format: there's a separate header and body section, the header always has seven "variables", the body always has 15 variables, and there's always 2 blank lines separating the two sections - the only difference is that there can be a variable number of rows in the body section. See green-shaded table in the attachment.

 

When importing these files into a single dataset, I've been lopping off the headers and just keeping the bodies of each report - see blue-shaded table.

 

But I'd like to keep some of the data in each report header by inserting them as new columns in the output dataset. See pink-shaded table, where I've transposed "HeaderVar5" from a row in each input report's header into a single column in the output.

 

I'm using SAS 9.4. Here's the code I have so far:

 

PROC IMPORT OUT=TEST 
            DATAFILE= "C:\filepath\xyz.report.*" 
            DBMS=DLM REPLACE;
     DELIMITER='2C'x; 
     GETNAMES=NO;
     DATAROW=7; 
RUN;

data test; set test;
if BodyVar3 in ("",'BodyVar3') then delete;
RUN;

 


 

wwtt
Calcite | Level 5

Hi, thanks for responding, I did see that code, but I'm not sure it answers my question. I already have code (however primitive) to consolidate the CSV files, I just need to know how to get one particular field in the header section of the input file into the body of the output dataset. If you open the previous Excel attachment, the input file doesn't have a single header row, it's an entire section where the data is arranged in such a way that it has to be transposed in order to be inserted as a new column in the output dataset. Basically I need the data in cell B6 in the Excel file to be its own column variable in the output dataset (both are in red font). I've exported the green table in the previous Excel file as a CSV and attached it, if that makes it easier.

Tom
Super User Tom
Super User

Here is your example in a Insert Code box so that this forum will display it in a readable way.

HeaderVar1,ABCD,,,,,,,,,,,,,
HeaderVar2,43727,,,,,,,,,,,,,
HeaderVar3,EDFG,,,,,,,,,,,,,
HeaderVar4,2019,,,,,,,,,,,,,
HeaderVar5,MI,,,,,,,,,,,,,
HeaderVar6,1,,,,,,,,,,,,,
HeaderVar7,2,,,,,,,,,,,,,
,,,,,,,,,,,,,,
,,,,,,,,,,,,,,
BodyVar1,BodyVar2,BodyVar3,BodyVar4,BodyVar5,BodyVar6,BodyVar7,BodyVar8,BodyVar9,BodyVar10,BodyVar11,BodyVar12,BodyVar13,BodyVar14,BodyVar15
123456,12345,TypeX,1,A,CycleA,123,4566,789,0.15,1000,10000,0,09,2019
789012,12345,TypeY,2,B,CycleB,123,4566,789,0.85,2000,1000,4,08,2019

In general just read the first column's value and use that to figure out where to store the second column's value. So it looks like you want to read values from the header rows and RETAIN the values across the rest of the observations?

input name :$32 value :$100. ;
if name='HeaderVar1' then headervar1=value;
else if name='HeaderVar7' then headervar7=input(value,32.);
...
retain headervar1-headervar7;

Or do you need to use the values to conditionally drive some logic?  

Do the name of the variables in the header rows change from file to file?

Do the number of header rows change or is it always 7 header rows followed by 2 blank lines?  If not how can you detect when the header rows are finished and the tabular data is starting?

wwtt
Calcite | Level 5
An example of the CSV file is in the green-shaded table in the attached Excel file, as mentioned in my post.

The code runs fine, it outputs the blue shaded table in the attachment, as I previously mentioned. I left out the part where I renamed the variables and the proc export into .csv, but I didn't think those were relevant to my question.

Sorry I've angered you - obviously I am new here, and I followed the instructions for posting a new user's first message: include the SAS version, the code run so far, etc. Thanks for your time anyway.
Reeza
Super User
You attached an XLSX file, which isn't a CSV and Excel will change the file so you can't rely on that as an input format and neither can we. Also, we can't run your code against the file provided we'd have to first save it to a CSV ourselves. I can't download attachments(many other can't as well) so can't see what you've included anyways. You can attach images or texts directly into your posts though.

We're asking questions to try and understand your issue, no one is angry here, we're just trying to get the information to help you.

Your SAS version is not specific enough, especially for imports which change over time, you need to provide your full version, as 9.4 TS1M5. For this particular question, your SAS version is likely irrelevant.

Re-reading your question, I suspect you may just need a transpose after your import.
ballardw
Super User

Proc import just isn't going to do that very well if at all. You may have problems with variable types for the values of your header vars.

 

A data step will read headers directly with care. Example:

data example;
   infile datalines dlm=',' missover;
   informat HeaderVar1 HeaderVar2 $15. var1 $5. var2 f5.;
   informat BodyVar1 f8. BodyVar2 f8. BodyVar3 $10. BodyVar4 f1. BodyVar5 $1.;
   retain  HeaderVar1 var1 HeaderVar2 var2 ;
   retain indata 0;
   if not(indata) then do;
      input  HeaderVar1 var1
        / HeaderVar2 var2
        /
        / 
        / @;
        indata=1;
   end;
   input  BodyVar1 BodyVar2 BodyVar3 BodyVar4 BodyVar5;
drop indata; output; datalines; HeaderVar1,ABCD HeaderVar2,43727 BodyVar1,BodyVar2,BodyVar3,BodyVar4,BodyVar5 123456,12345,TypeX,1,A 789012,12345,TypeY,2,B ;

This uses a comma delimited section  of data lines to emulate the CSV file.

The principal is the same with more or fewer "header" lines.

The input statement with the / indicates continue reading on the next line for the current record. My example only includes two header rows, the blank row shown in your example and then the body variables header row. The two / without variables read the blank row and the header row (into nothing), the last / @ advances the input pointer to the start of the data. Note the check to see if we have actually read the file headers yet. The last bit after the / @ sets that check variable to True (1). The retain keeps the header variables and the indata check variable.

 

It wasn't clear exactly which "header" you actually wanted, so I kept all of them. You  could drop the unwanted ones along with the indata variable.

After you have read each of the files then append them if needed.

Note that If these files are actually extremely similar except for the header block values there are ways to read multiple files and execute the header read when the external file being read changes.

 

Have you had any issues with the values of variables not aligning with your "cut and paste" with proc import? Each file that you run proc import on makes the procedure execute a separate set of "guesses" as to the variable types and you can run into problems letting the proc guess your variable types or lengths of character variables.

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1543 views
  • 0 likes
  • 4 in conversation