SAS Programming

wwtt · Posted 09-23-2019 10:24 AM

I have multiple CSV input files that I have been importing into a single dataset. Each report has nearly the exact same format: there's a separate header and body section, the header always has seven "variables", the body always has 15 variables, and there's always 2 blank lines separating the two sections - the only difference is that there can be a variable number of rows in the body section. See green-shaded table in the attachment.

When importing these files into a single dataset, I've been lopping off the headers and just keeping the bodies of each report - see blue-shaded table.

But I'd like to keep some of the data in each report header by inserting them as new columns in the output dataset. See pink-shaded table, where I've transposed "HeaderVar5" from a row in each input report's header into a single column in the output.

I'm using SAS 9.4. Here's the code I have so far:

PROC IMPORT OUT=TEST 
            DATAFILE= "C:\filepath\xyz.report.*" 
            DBMS=DLM REPLACE;
     DELIMITER='2C'x; 
     GETNAMES=NO;
     DATAROW=7; 
RUN;

data test; set test;
if BodyVar3 in ("",'BodyVar3') then delete;RUN;

Tom · Posted 09-23-2019 10:31 AM

What does the XLSX attachment have to do with the question? Where are the CSV files you mentioned?

I don't think PROC IMPORT can handle multiple input files at once. Did that code you posted even run? If not then show the log.

If you want to read multiple CSV files at once then write you own data step.

Reeza · Posted 09-23-2019 10:56 AM


data import_all_data;
 
*make sure variables to store file name are long enough;
length filename txt_file_name $256;
 
*keep file name from record to record;
retain txt_file_name;
 
*Use wildcard in input;
infile "Path\*.txt" eov=eov filename=filename truncover;
 
*Input first record and hold line;
input@;
 
*Check if this is the first record or the first record in a new file;
*If it is, replace the filename with the new file name and move to next line;
if _n_ eq 1 or eov then do;
txt_file_name = scan(filename, -1, "\");
eov=0;
record_type = 'H';
end;
 
*Otherwise  go to the import step and read the files;
else do;
record_type = 'D';

 input
 
*Place input code here;
 
;
run;

Try something like the above, filling in your INPUT statement and modifying teh INFILE statement as needed.

Untested because of no data.

@wwtt wrote:

I have multiple CSV input files that I have been importing into a single dataset. Each report has nearly the exact same format: there's a separate header and body section, the header always has seven "variables", the body always has 15 variables, and there's always 2 blank lines separating the two sections - the only difference is that there can be a variable number of rows in the body section. See green-shaded table in the attachment.

When importing these files into a single dataset, I've been lopping off the headers and just keeping the bodies of each report - see blue-shaded table.

But I'd like to keep some of the data in each report header by inserting them as new columns in the output dataset. See pink-shaded table, where I've transposed "HeaderVar5" from a row in each input report's header into a single column in the output.

I'm using SAS 9.4. Here's the code I have so far:
PROC IMPORT OUT=TEST 
            DATAFILE= "C:\filepath\xyz.report.*" 
            DBMS=DLM REPLACE;
     DELIMITER='2C'x; 
     GETNAMES=NO;
     DATAROW=7; 
RUN;

data test; set test;
if BodyVar3 in ("",'BodyVar3') then delete;RUN;

wwtt · Posted 09-23-2019 11:48 AM

Hi, thanks for responding, I did see that code, but I'm not sure it answers my question. I already have code (however primitive) to consolidate the CSV files, I just need to know how to get one particular field in the header section of the input file into the body of the output dataset. If you open the previous Excel attachment, the input file doesn't have a single header row, it's an entire section where the data is arranged in such a way that it has to be transposed in order to be inserted as a new column in the output dataset. Basically I need the data in cell B6 in the Excel file to be its own column variable in the output dataset (both are in red font). I've exported the green table in the previous Excel file as a CSV and attached it, if that makes it easier.

Reeza · Posted 09-23-2019 11:56 AM

Transposing data tutorials:
Long to Wide:
https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-long-to-wide-using-proc-transpose/

https://stats.idre.ucla.edu/sas/modules/reshaping-data-long-to-wide-using-the-data-step/

Wide to Long:
https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-wide-to-long-using-proc-transpose/

https://stats.idre.ucla.edu/sas/modules/reshaping-data-wide-to-long-using-a-data-step/

And sometimes a double transpose is needed for extra wide data sets:
https://gist.github.com/statgeek/2321b6f62ab78d5bf2b0a5a8626bd7cd

Tom · Posted 09-23-2019 12:42 PM

Here is your example in a Insert Code box so that this forum will display it in a readable way.

HeaderVar1,ABCD,,,,,,,,,,,,,
HeaderVar2,43727,,,,,,,,,,,,,
HeaderVar3,EDFG,,,,,,,,,,,,,
HeaderVar4,2019,,,,,,,,,,,,,
HeaderVar5,MI,,,,,,,,,,,,,
HeaderVar6,1,,,,,,,,,,,,,
HeaderVar7,2,,,,,,,,,,,,,
,,,,,,,,,,,,,,
,,,,,,,,,,,,,,
BodyVar1,BodyVar2,BodyVar3,BodyVar4,BodyVar5,BodyVar6,BodyVar7,BodyVar8,BodyVar9,BodyVar10,BodyVar11,BodyVar12,BodyVar13,BodyVar14,BodyVar15
123456,12345,TypeX,1,A,CycleA,123,4566,789,0.15,1000,10000,0,09,2019
789012,12345,TypeY,2,B,CycleB,123,4566,789,0.85,2000,1000,4,08,2019

In general just read the first column's value and use that to figure out where to store the second column's value. So it looks like you want to read values from the header rows and RETAIN the values across the rest of the observations?

input name :$32 value :$100. ;
if name='HeaderVar1' then headervar1=value;
else if name='HeaderVar7' then headervar7=input(value,32.);
...
retain headervar1-headervar7;

Or do you need to use the values to conditionally drive some logic?

Do the name of the variables in the header rows change from file to file?

Do the number of header rows change or is it always 7 header rows followed by 2 blank lines? If not how can you detect when the header rows are finished and the tabular data is starting?

wwtt · Posted 09-23-2019 11:33 AM

An example of the CSV file is in the green-shaded table in the attached Excel file, as mentioned in my post.

The code runs fine, it outputs the blue shaded table in the attachment, as I previously mentioned. I left out the part where I renamed the variables and the proc export into .csv, but I didn't think those were relevant to my question.

Sorry I've angered you - obviously I am new here, and I followed the instructions for posting a new user's first message: include the SAS version, the code run so far, etc. Thanks for your time anyway.

Reeza · Posted 09-23-2019 11:54 AM

You attached an XLSX file, which isn't a CSV and Excel will change the file so you can't rely on that as an input format and neither can we. Also, we can't run your code against the file provided we'd have to first save it to a CSV ourselves. I can't download attachments(many other can't as well) so can't see what you've included anyways. You can attach images or texts directly into your posts though.

We're asking questions to try and understand your issue, no one is angry here, we're just trying to get the information to help you.

Your SAS version is not specific enough, especially for imports which change over time, you need to provide your full version, as 9.4 TS1M5. For this particular question, your SAS version is likely irrelevant.

Re-reading your question, I suspect you may just need a transpose after your import.

ballardw · Posted 09-23-2019 03:50 PM

Proc import just isn't going to do that very well if at all. You may have problems with variable types for the values of your header vars.

A data step will read headers directly with care. Example:

data example;
   infile datalines dlm=',' missover;
   informat HeaderVar1 HeaderVar2 $15. var1 $5. var2 f5.;
   informat BodyVar1 f8. BodyVar2 f8. BodyVar3 $10. BodyVar4 f1. BodyVar5 $1.;
   retain  HeaderVar1 var1 HeaderVar2 var2 ;
   retain indata 0;
   if not(indata) then do;
      input  HeaderVar1 var1
        / HeaderVar2 var2
        /
        / 
        / @;
        indata=1;
   end;
   input  BodyVar1 BodyVar2 BodyVar3 BodyVar4 BodyVar5;
   drop indata;
   output;

datalines;
HeaderVar1,ABCD			
HeaderVar2,43727			
				
BodyVar1,BodyVar2,BodyVar3,BodyVar4,BodyVar5
123456,12345,TypeX,1,A
789012,12345,TypeY,2,B
;

This uses a comma delimited section of data lines to emulate the CSV file.

The principal is the same with more or fewer "header" lines.

The input statement with the / indicates continue reading on the next line for the current record. My example only includes two header rows, the blank row shown in your example and then the body variables header row. The two / without variables read the blank row and the header row (into nothing), the last / @ advances the input pointer to the start of the data. Note the check to see if we have actually read the file headers yet. The last bit after the / @ sets that check variable to True (1). The retain keeps the header variables and the indata check variable.

It wasn't clear exactly which "header" you actually wanted, so I kept all of them. You could drop the unwanted ones along with the indata variable.

After you have read each of the files then append them if needed.

Note that If these files are actually extremely similar except for the header block values there are ways to read multiple files and execute the header read when the external file being read changes.

Have you had any issues with the values of variables not aligning with your "cut and paste" with proc import? Each file that you run proc import on makes the procedure execute a separate set of "guesses" as to the variable types and you can run into problems letting the proc guess your variable types or lengths of character variables.

SAS Programming

Import multiple text files BUT keep certain fields in report header

Re: Import multiple text files BUT keep certain fields in report header

Re: Import multiple text files BUT keep certain fields in report header

Re: Import multiple text files BUT keep certain fields in report header

Re: Import multiple text files BUT keep certain fields in report header

Re: Import multiple text files BUT keep certain fields in report header

Re: Import multiple text files BUT keep certain fields in report header

Re: Import multiple text files BUT keep certain fields in report header

Re: Import multiple text files BUT keep certain fields in report header

Tricks for SAS Visual Analytics Report Builders: Step 3 - Focus on Wha...

Importing multiple CSV files

Importing multiple Excel files

How to import multiple file which has multiple format

How do I import a text file in which its field values are wrapped in p...

Follow Us

What is...

SAS Programming

Special offer for SAS Communities members

SAS Training: Just a Click Away

Follow Us

What is...