- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have multiple CSV input files that I have been importing into a single dataset. Each report has nearly the exact same format: there's a separate header and body section, the header always has seven "variables", the body always has 15 variables, and there's always 2 blank lines separating the two sections - the only difference is that there can be a variable number of rows in the body section. See green-shaded table in the attachment.
When importing these files into a single dataset, I've been lopping off the headers and just keeping the bodies of each report - see blue-shaded table.
But I'd like to keep some of the data in each report header by inserting them as new columns in the output dataset. See pink-shaded table, where I've transposed "HeaderVar5" from a row in each input report's header into a single column in the output.
I'm using SAS 9.4. Here's the code I have so far:
PROC IMPORT OUT=TEST
DATAFILE= "C:\filepath\xyz.report.*"
DBMS=DLM REPLACE;
DELIMITER='2C'x;
GETNAMES=NO;
DATAROW=7;
RUN;
data test; set test;
if BodyVar3 in ("",'BodyVar3') then delete;RUN;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
What does the XLSX attachment have to do with the question? Where are the CSV files you mentioned?
I don't think PROC IMPORT can handle multiple input files at once. Did that code you posted even run? If not then show the log.
If you want to read multiple CSV files at once then write you own data step.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data import_all_data;
*make sure variables to store file name are long enough;
length filename txt_file_name $256;
*keep file name from record to record;
retain txt_file_name;
*Use wildcard in input;
infile "Path\*.txt" eov=eov filename=filename truncover;
*Input first record and hold line;
input@;
*Check if this is the first record or the first record in a new file;
*If it is, replace the filename with the new file name and move to next line;
if _n_ eq 1 or eov then do;
txt_file_name = scan(filename, -1, "\");
eov=0;
record_type = 'H';
end;
*Otherwise go to the import step and read the files;
else do;
record_type = 'D';
input
*Place input code here;
;
run;
Try something like the above, filling in your INPUT statement and modifying teh INFILE statement as needed.
Untested because of no data.
@wwtt wrote:
I have multiple CSV input files that I have been importing into a single dataset. Each report has nearly the exact same format: there's a separate header and body section, the header always has seven "variables", the body always has 15 variables, and there's always 2 blank lines separating the two sections - the only difference is that there can be a variable number of rows in the body section. See green-shaded table in the attachment.
When importing these files into a single dataset, I've been lopping off the headers and just keeping the bodies of each report - see blue-shaded table.
But I'd like to keep some of the data in each report header by inserting them as new columns in the output dataset. See pink-shaded table, where I've transposed "HeaderVar5" from a row in each input report's header into a single column in the output.
I'm using SAS 9.4. Here's the code I have so far:
PROC IMPORT OUT=TEST DATAFILE= "C:\filepath\xyz.report.*" DBMS=DLM REPLACE; DELIMITER='2C'x; GETNAMES=NO; DATAROW=7; RUN; data test; set test; if BodyVar3 in ("",'BodyVar3') then delete;RUN;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi, thanks for responding, I did see that code, but I'm not sure it answers my question. I already have code (however primitive) to consolidate the CSV files, I just need to know how to get one particular field in the header section of the input file into the body of the output dataset. If you open the previous Excel attachment, the input file doesn't have a single header row, it's an entire section where the data is arranged in such a way that it has to be transposed in order to be inserted as a new column in the output dataset. Basically I need the data in cell B6 in the Excel file to be its own column variable in the output dataset (both are in red font). I've exported the green table in the previous Excel file as a CSV and attached it, if that makes it easier.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Transposing data tutorials:
Long to Wide:
https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-long-to-wide-using-proc-transpose/
https://stats.idre.ucla.edu/sas/modules/reshaping-data-long-to-wide-using-the-data-step/
Wide to Long:
https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-wide-to-long-using-proc-transpose/
https://stats.idre.ucla.edu/sas/modules/reshaping-data-wide-to-long-using-a-data-step/
And sometimes a double transpose is needed for extra wide data sets:
https://gist.github.com/statgeek/2321b6f62ab78d5bf2b0a5a8626bd7cd
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Here is your example in a Insert Code box so that this forum will display it in a readable way.
HeaderVar1,ABCD,,,,,,,,,,,,, HeaderVar2,43727,,,,,,,,,,,,, HeaderVar3,EDFG,,,,,,,,,,,,, HeaderVar4,2019,,,,,,,,,,,,, HeaderVar5,MI,,,,,,,,,,,,, HeaderVar6,1,,,,,,,,,,,,, HeaderVar7,2,,,,,,,,,,,,, ,,,,,,,,,,,,,, ,,,,,,,,,,,,,, BodyVar1,BodyVar2,BodyVar3,BodyVar4,BodyVar5,BodyVar6,BodyVar7,BodyVar8,BodyVar9,BodyVar10,BodyVar11,BodyVar12,BodyVar13,BodyVar14,BodyVar15 123456,12345,TypeX,1,A,CycleA,123,4566,789,0.15,1000,10000,0,09,2019 789012,12345,TypeY,2,B,CycleB,123,4566,789,0.85,2000,1000,4,08,2019
In general just read the first column's value and use that to figure out where to store the second column's value. So it looks like you want to read values from the header rows and RETAIN the values across the rest of the observations?
input name :$32 value :$100. ;
if name='HeaderVar1' then headervar1=value;
else if name='HeaderVar7' then headervar7=input(value,32.);
...
retain headervar1-headervar7;
Or do you need to use the values to conditionally drive some logic?
Do the name of the variables in the header rows change from file to file?
Do the number of header rows change or is it always 7 header rows followed by 2 blank lines? If not how can you detect when the header rows are finished and the tabular data is starting?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The code runs fine, it outputs the blue shaded table in the attachment, as I previously mentioned. I left out the part where I renamed the variables and the proc export into .csv, but I didn't think those were relevant to my question.
Sorry I've angered you - obviously I am new here, and I followed the instructions for posting a new user's first message: include the SAS version, the code run so far, etc. Thanks for your time anyway.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
We're asking questions to try and understand your issue, no one is angry here, we're just trying to get the information to help you.
Your SAS version is not specific enough, especially for imports which change over time, you need to provide your full version, as 9.4 TS1M5. For this particular question, your SAS version is likely irrelevant.
Re-reading your question, I suspect you may just need a transpose after your import.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Proc import just isn't going to do that very well if at all. You may have problems with variable types for the values of your header vars.
A data step will read headers directly with care. Example:
data example; infile datalines dlm=',' missover; informat HeaderVar1 HeaderVar2 $15. var1 $5. var2 f5.; informat BodyVar1 f8. BodyVar2 f8. BodyVar3 $10. BodyVar4 f1. BodyVar5 $1.; retain HeaderVar1 var1 HeaderVar2 var2 ; retain indata 0; if not(indata) then do; input HeaderVar1 var1 / HeaderVar2 var2 / / / @; indata=1; end; input BodyVar1 BodyVar2 BodyVar3 BodyVar4 BodyVar5;
drop indata; output; datalines; HeaderVar1,ABCD HeaderVar2,43727 BodyVar1,BodyVar2,BodyVar3,BodyVar4,BodyVar5 123456,12345,TypeX,1,A 789012,12345,TypeY,2,B ;
This uses a comma delimited section of data lines to emulate the CSV file.
The principal is the same with more or fewer "header" lines.
The input statement with the / indicates continue reading on the next line for the current record. My example only includes two header rows, the blank row shown in your example and then the body variables header row. The two / without variables read the blank row and the header row (into nothing), the last / @ advances the input pointer to the start of the data. Note the check to see if we have actually read the file headers yet. The last bit after the / @ sets that check variable to True (1). The retain keeps the header variables and the indata check variable.
It wasn't clear exactly which "header" you actually wanted, so I kept all of them. You could drop the unwanted ones along with the indata variable.
After you have read each of the files then append them if needed.
Note that If these files are actually extremely similar except for the header block values there are ways to read multiple files and execute the header read when the external file being read changes.
Have you had any issues with the values of variables not aligning with your "cut and paste" with proc import? Each file that you run proc import on makes the procedure execute a separate set of "guesses" as to the variable types and you can run into problems letting the proc guess your variable types or lengths of character variables.