Solved: Re: how can I import this massy Text file into SAS

Sherri · Posted 02-08-2013 12:45 PM

Hello All, how can I load this type text file(see image of part of the file below) in sas without the highlighed rows, they occure multiple times in the file. And there are hundreds of these type of file, I need to load them in to one dataset. Thanks much!!!

I also attached the actual text file below:

art297 · Posted 02-08-2013 08:46 PM

Try the following:

filename mydata "c:\art\test\*.txt";

data want (drop=dept_in);

infile mydata lrecl=131 pad truncover eov=newfile;

informat patient dept dept_in product $char10.;

informat charge_description $24.;

informat ordering_md type $12.;

informat type $2.;

informat day anydtdte8.;

format day date9.;

informat class $char16.;

informat ip_volume charge_quantity 4.;

format ip_volume charge_quantity 4.1;

informat actual_cost total_charge comma9.;

format actual_cost total_charge comma9.2;

retain patient dept;

input @;

if newfile then do;

counter=1;

newfile=0;

end;

else counter+1;

if counter eq 3 then do;

input @ "ENCOUNTER " patient;

end;

else if input(substr(_infile_,123),?? comma9.) and

substr(_infile_,10,15) ne "CHARGE QUANTITY" then do;

input

dept_in 2-11

@13 product &

@24 charge_description &

@49 ordering_md &

type 64-65

day 67-75

class 76-92

@94 ip_volume

@103 charge_quantity

@112 actual_cost

@123 total_charge

;

if not missing(strip(dept_in)) then

dept=dept_in;

output;

end;

else input;

run;

View solution in original post

DBailey · Posted 02-08-2013 01:15 PM

data want;

infile tfile truncover;

input @1 RecType:$3.;

if RecType = 'DCRPT' then do;

i=0;

do i = 1 to 11;

input;

end;

that gets rid of the page header. You would then need to start the next input @1.

Then use proc append to merge all of the datasets.

data_null__ · Posted 02-08-2013 01:37 PM

If RECTYPE is $3 how can it be equal to 'DCRPT'? :smileyshocked:

I think reading all the files as one input stream using wildcard fileref would be much more efficient than hundreds of PROC APPENDs.

DBailey · Posted 02-08-2013 01:58 PM

good catch...

DBailey · Posted 02-08-2013 01:59 PM

You might could use the OS to concatenate prior to the data step.

art297 · Posted 02-08-2013 01:24 PM

One way would be to exclude even more than you highlighted. If all of your data records are similar to the ones shown in your example, the actual data lines are longer than all of the header records, with the exception of the page number record.

And, what appears to distinguish the data records from the page number header record, is that the last column of your data records (which I presume is between two specific columns in your actual file) are the only records that contain just a number between those columns.

As such, I'd write some code that read in the file, but only kept those records which had a valid number in the range of columns where actual cost is recorded. Then, I'd use a datastep to read in the revised file, declaring the variable names in an input statement and declaring all of the desired formats and informats with format and informat statements.

Sherri · Posted 02-08-2013 02:21 PM

here is the look of all the columns. the header occours mutilple times in the file. and there are hundreds files like this so it is not realistic to manual revise all those files. thanks!!

art297 · Posted 02-08-2013 02:43 PM

I don't think anyone (that I noticed at least) was suggesting anything manual. Can you post the notepad file itself rather than the screen shot of it?

Sherri · Posted 02-08-2013 02:49 PM

yes. here is the file. thanks much!!

ballardw · Posted 02-08-2013 05:04 PM

Do you want to exclude the totals at the bottom as well?

Should the DEPT value be considerd as having been entered on the blank lines below each value? For example do you want DEPT = ICU/CCU-TR for the PRODUCT value of CRRT/SLED on the second row of the detail lines?

The date column in the body of the data looks like it is incrementing a day of period 1 through 6 with a date. Should only the record that appears with those values have them, all records until the next read value have the same day number and date or exclude entirely?

Sherri · Posted 02-08-2013 05:11 PM

yes. i want to exclude the totals at the bottom. Thanks much!

data_null__ · Posted 02-08-2013 02:50 PM

If you can, post enough example data to represent most all the relevant issues.

Instead of worrying about how to get rid of the parts you don't want I would flip that over and concentrate on how to read the parts you do want.

art297 · Posted 02-08-2013 06:59 PM

Other than changing the location of your files in the filename statement the following should come awfully close. I made two copies of your test data and put them in a directory labeled "c:\art\test". The following code read in both files and parsed the records for the desired data. It skips the last records of the file, which appear to be summary records:

filename mydata "c:\art\test\*.txt";

data want;

infile mydata lrecl=131 pad truncover;

informat dept product $char10.;

informat charge_description $24.;

informat ordering_md type $12.;

informat type $2.;

informat day anydtdte8.;

format day date9.;

informat class $char16.;

informat ip_volume charge_quantity 4.;

format ip_volume charge_quantity 4.1;

informat actual_cost total_charge comma9.;

format actual_cost total_charge comma9.2;

input @;

if input(substr(_infile_,123),?? comma9.) and

substr(_infile_,10,15) ne "CHARGE QUANTITY" then do;

input

dept 2-11

@13 product &

@24 charge_description &

@49 ordering_md &

type 64-65

day 67-75

class 76-92

@94 ip_volume

@103 charge_quantity

@112 actual_cost

@123 total_charge

;

output;

end;

else input;

run;

Sherri · Posted 02-08-2013 08:13 PM

Great! it runs just perfectly!! thank you so much Arthur and everyone!

I just have one more question: the first column "Dept" is not populated for every single record. how can I fill the empty field with the value of the record above until the new value and so on.

And lastly add a column for patient ID. Patient ID stays the same within each file since one file is for one patient.

sorry I have some many questions. I really appreciate your help!! many thanks again!!!

art297 · Posted 02-08-2013 08:46 PM

Try the following:

filename mydata "c:\art\test\*.txt";

data want (drop=dept_in);

infile mydata lrecl=131 pad truncover eov=newfile;

informat patient dept dept_in product $char10.;

informat charge_description $24.;

informat ordering_md type $12.;

informat type $2.;

informat day anydtdte8.;

format day date9.;

informat class $char16.;

informat ip_volume charge_quantity 4.;

format ip_volume charge_quantity 4.1;

informat actual_cost total_charge comma9.;

format actual_cost total_charge comma9.2;

retain patient dept;

input @;

if newfile then do;

counter=1;

newfile=0;

end;

else counter+1;

if counter eq 3 then do;

input @ "ENCOUNTER " patient;

end;

else if input(substr(_infile_,123),?? comma9.) and

substr(_infile_,10,15) ne "CHARGE QUANTITY" then do;

input

dept_in 2-11

@13 product &

@24 charge_description &

@49 ordering_md &

type 64-65

day 67-75

class 76-92

@94 ip_volume

@103 charge_quantity

@112 actual_cost

@123 total_charge

;

if not missing(strip(dept_in)) then

dept=dept_in;

output;

end;

else input;

run;

Registration is open

SAS Training: Just a Click Away