Hello All, how can I load this type text file(see image of part of the file below) in sas without the highlighed rows, they occure multiple times in the file. And there are hundreds of these type of file, I need to load them in to one dataset. Thanks much!!!
I also attached the actual text file below:
Try the following:
filename mydata "c:\art\test\*.txt";
data want (drop=dept_in);
infile mydata lrecl=131 pad truncover eov=newfile;
informat patient dept dept_in product $char10.;
informat charge_description $24.;
informat ordering_md type $12.;
informat type $2.;
informat day anydtdte8.;
format day date9.;
informat class $char16.;
informat ip_volume charge_quantity 4.;
format ip_volume charge_quantity 4.1;
informat actual_cost total_charge comma9.;
format actual_cost total_charge comma9.2;
retain patient dept;
input @;
if newfile then do;
counter=1;
newfile=0;
end;
else counter+1;
if counter eq 3 then do;
input @ "ENCOUNTER " patient;
end;
else if input(substr(_infile_,123),?? comma9.) and
substr(_infile_,10,15) ne "CHARGE QUANTITY" then do;
input
dept_in 2-11
@13 product &
@24 charge_description &
@49 ordering_md &
type 64-65
day 67-75
class 76-92
@94 ip_volume
@103 charge_quantity
@112 actual_cost
@123 total_charge
;
if not missing(strip(dept_in)) then
dept=dept_in;
output;
end;
else input;
run;
data want;
infile tfile truncover;
input @1 RecType:$3.;
if RecType = 'DCRPT' then do;
i=0;
do i = 1 to 11;
input;
end;
that gets rid of the page header. You would then need to start the next input @1.
Then use proc append to merge all of the datasets.
If RECTYPE is $3 how can it be equal to 'DCRPT'? :smileyshocked:
I think reading all the files as one input stream using wildcard fileref would be much more efficient than hundreds of PROC APPENDs.
good catch...
You might could use the OS to concatenate prior to the data step.
One way would be to exclude even more than you highlighted. If all of your data records are similar to the ones shown in your example, the actual data lines are longer than all of the header records, with the exception of the page number record.
And, what appears to distinguish the data records from the page number header record, is that the last column of your data records (which I presume is between two specific columns in your actual file) are the only records that contain just a number between those columns.
As such, I'd write some code that read in the file, but only kept those records which had a valid number in the range of columns where actual cost is recorded. Then, I'd use a datastep to read in the revised file, declaring the variable names in an input statement and declaring all of the desired formats and informats with format and informat statements.
here is the look of all the columns. the header occours mutilple times in the file. and there are hundreds files like this so it is not realistic to manual revise all those files. thanks!!
I don't think anyone (that I noticed at least) was suggesting anything manual. Can you post the notepad file itself rather than the screen shot of it?
yes. here is the file. thanks much!!
Do you want to exclude the totals at the bottom as well?
Should the DEPT value be considerd as having been entered on the blank lines below each value? For example do you want DEPT = ICU/CCU-TR for the PRODUCT value of CRRT/SLED on the second row of the detail lines?
The date column in the body of the data looks like it is incrementing a day of period 1 through 6 with a date. Should only the record that appears with those values have them, all records until the next read value have the same day number and date or exclude entirely?
yes. i want to exclude the totals at the bottom. Thanks much!
If you can, post enough example data to represent most all the relevant issues.
Instead of worrying about how to get rid of the parts you don't want I would flip that over and concentrate on how to read the parts you do want.
Other than changing the location of your files in the filename statement the following should come awfully close. I made two copies of your test data and put them in a directory labeled "c:\art\test". The following code read in both files and parsed the records for the desired data. It skips the last records of the file, which appear to be summary records:
filename mydata "c:\art\test\*.txt";
data want;
infile mydata lrecl=131 pad truncover;
informat dept product $char10.;
informat charge_description $24.;
informat ordering_md type $12.;
informat type $2.;
informat day anydtdte8.;
format day date9.;
informat class $char16.;
informat ip_volume charge_quantity 4.;
format ip_volume charge_quantity 4.1;
informat actual_cost total_charge comma9.;
format actual_cost total_charge comma9.2;
input @;
if input(substr(_infile_,123),?? comma9.) and
substr(_infile_,10,15) ne "CHARGE QUANTITY" then do;
input
dept 2-11
@13 product &
@24 charge_description &
@49 ordering_md &
type 64-65
day 67-75
class 76-92
@94 ip_volume
@103 charge_quantity
@112 actual_cost
@123 total_charge
;
output;
end;
else input;
run;
Great! it runs just perfectly!! thank you so much Arthur and everyone!
I just have one more question: the first column "Dept" is not populated for every single record. how can I fill the empty field with the value of the record above until the new value and so on.
And lastly add a column for patient ID. Patient ID stays the same within each file since one file is for one patient.
sorry I have some many questions. I really appreciate your help!! many thanks again!!!
Try the following:
filename mydata "c:\art\test\*.txt";
data want (drop=dept_in);
infile mydata lrecl=131 pad truncover eov=newfile;
informat patient dept dept_in product $char10.;
informat charge_description $24.;
informat ordering_md type $12.;
informat type $2.;
informat day anydtdte8.;
format day date9.;
informat class $char16.;
informat ip_volume charge_quantity 4.;
format ip_volume charge_quantity 4.1;
informat actual_cost total_charge comma9.;
format actual_cost total_charge comma9.2;
retain patient dept;
input @;
if newfile then do;
counter=1;
newfile=0;
end;
else counter+1;
if counter eq 3 then do;
input @ "ENCOUNTER " patient;
end;
else if input(substr(_infile_,123),?? comma9.) and
substr(_infile_,10,15) ne "CHARGE QUANTITY" then do;
input
dept_in 2-11
@13 product &
@24 charge_description &
@49 ordering_md &
type 64-65
day 67-75
class 76-92
@94 ip_volume
@103 charge_quantity
@112 actual_cost
@123 total_charge
;
if not missing(strip(dept_in)) then
dept=dept_in;
output;
end;
else input;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.