BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Sherri
Calcite | Level 5

Hello All, how can I load this type text file(see image of part of the file below) in sas without the highlighed rows, they occure multiple times in the file. And there are hundreds of these type of file, I need to load them in to one dataset. Thanks much!!!

textfile.bmp

I also attached the actual text file below:

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

Try the following:

filename mydata "c:\art\test\*.txt";

data want (drop=dept_in);

  infile mydata lrecl=131 pad truncover eov=newfile;

  informat patient dept dept_in product $char10.;

  informat charge_description $24.;

  informat ordering_md type $12.;

  informat type $2.;

  informat day anydtdte8.;

  format day date9.;

  informat class $char16.;

  informat ip_volume charge_quantity 4.;

  format ip_volume charge_quantity 4.1;

  informat actual_cost total_charge comma9.;

  format actual_cost total_charge comma9.2;

  retain patient dept;

  input @;

  if newfile then do;

    counter=1;

    newfile=0;

  end;

  else counter+1;

  if counter eq 3 then do;

    input @ "ENCOUNTER " patient;

  end;

  else if input(substr(_infile_,123),?? comma9.) and

    substr(_infile_,10,15) ne "CHARGE QUANTITY" then do;

    input

     dept_in 2-11

     @13 product &

     @24 charge_description &

     @49 ordering_md &

     type 64-65

     day 67-75

     class 76-92

     @94 ip_volume

     @103 charge_quantity

     @112 actual_cost

     @123 total_charge

    ;

    if not missing(strip(dept_in)) then

      dept=dept_in;

    output;

  end;

  else input;

run;

View solution in original post

18 REPLIES 18
DBailey
Lapis Lazuli | Level 10

data want;

infile tfile truncover;

input @1 RecType:$3.;

if RecType = 'DCRPT' then do;

     i=0;

     do i = 1 to 11;

     input;

     end;

that gets rid of the page header.  You would then need to start the next input @1.

Then use proc append to merge all of the datasets.

data_null__
Jade | Level 19

If RECTYPE is $3 how can it be equal to 'DCRPT'?   :smileyshocked:

I think reading all the files as one input stream using wildcard fileref would be much more efficient than hundreds of PROC APPENDs.

DBailey
Lapis Lazuli | Level 10

good catch...

DBailey
Lapis Lazuli | Level 10

You might could use the OS to concatenate prior to the data step.

art297
Opal | Level 21

One way would be to exclude even more than you highlighted.  If all of your data records are similar to the ones shown in your example, the actual data lines are longer than all of the header records, with the exception of the page number record.

And, what appears to distinguish the data records from the page number header record, is that the last column of your data records (which I presume is between two specific columns in your actual file) are the only records that contain just a number between those columns.

As such, I'd write some code that read in the file, but only kept those records which had a valid number in the range of columns where actual cost is recorded.  Then, I'd use a datastep to read in the revised file, declaring the variable names in an input statement and declaring all of the desired formats and informats with format and informat statements.

Sherri
Calcite | Level 5

here is the look of all the columns. the header occours mutilple times in the file. and there are hundreds files like this so it is not realistic to manual revise all those files. thanks!!

textfile.bmp

art297
Opal | Level 21

I don't think anyone (that I noticed at least) was suggesting anything manual.  Can you post the notepad file itself rather than the screen shot of it?

Sherri
Calcite | Level 5

yes. here is the file. thanks much!!

ballardw
Super User

Do you want to exclude the totals at the bottom as well?

Should the DEPT value be considerd as having been entered on the blank lines below each value? For example do you want DEPT = ICU/CCU-TR for the PRODUCT value of CRRT/SLED on the second row of the detail lines?

The date column in the body of the data looks like it is incrementing a day of period 1 through 6 with a date. Should only the record that appears with those values have them, all records until the next read value have the same day number and date or exclude entirely?

Sherri
Calcite | Level 5

yes. i want to exclude the totals at the bottom. Thanks much!

data_null__
Jade | Level 19

If you can, post enough example data to represent most all the relevant issues.

Instead of worrying about how to get rid of the parts you don't want I would flip that over and concentrate on how to read the parts you do want.

art297
Opal | Level 21

Other than changing the location of your files in the filename statement the following should come awfully close.  I made two copies of your test data and put them in a directory labeled "c:\art\test".  The following code read in both files and parsed the records for the desired data.  It skips the last records of the file, which appear to be summary records:

filename mydata "c:\art\test\*.txt";

data want;

  infile mydata lrecl=131 pad truncover;

  informat dept product $char10.;

  informat charge_description $24.;

  informat ordering_md type $12.;

  informat type $2.;

  informat day anydtdte8.;

  format day date9.;

  informat class $char16.;

  informat ip_volume charge_quantity 4.;

  format ip_volume charge_quantity 4.1;

  informat actual_cost total_charge comma9.;

  format actual_cost total_charge comma9.2;

  input @;

  if input(substr(_infile_,123),?? comma9.) and

    substr(_infile_,10,15) ne "CHARGE QUANTITY" then do;

    input

     dept 2-11

     @13 product &

     @24 charge_description &

     @49 ordering_md &

     type 64-65

     day 67-75

     class 76-92

     @94 ip_volume

     @103 charge_quantity

     @112 actual_cost

     @123 total_charge

    ;

    output;

  end;

  else input;

run;

Sherri
Calcite | Level 5

Great! it runs just perfectly!! thank you so much Arthur and everyone!

I just have one more question: the first column "Dept" is not populated for every single record. how can I fill the empty field with the value of the record above until the new value and so on.

And lastly add a column for patient ID. Patient ID stays the same within each file since one file is for one patient.

sorry I have some many questions. I really appreciate your help!! many thanks again!!!

art297
Opal | Level 21

Try the following:

filename mydata "c:\art\test\*.txt";

data want (drop=dept_in);

  infile mydata lrecl=131 pad truncover eov=newfile;

  informat patient dept dept_in product $char10.;

  informat charge_description $24.;

  informat ordering_md type $12.;

  informat type $2.;

  informat day anydtdte8.;

  format day date9.;

  informat class $char16.;

  informat ip_volume charge_quantity 4.;

  format ip_volume charge_quantity 4.1;

  informat actual_cost total_charge comma9.;

  format actual_cost total_charge comma9.2;

  retain patient dept;

  input @;

  if newfile then do;

    counter=1;

    newfile=0;

  end;

  else counter+1;

  if counter eq 3 then do;

    input @ "ENCOUNTER " patient;

  end;

  else if input(substr(_infile_,123),?? comma9.) and

    substr(_infile_,10,15) ne "CHARGE QUANTITY" then do;

    input

     dept_in 2-11

     @13 product &

     @24 charge_description &

     @49 ordering_md &

     type 64-65

     day 67-75

     class 76-92

     @94 ip_volume

     @103 charge_quantity

     @112 actual_cost

     @123 total_charge

    ;

    if not missing(strip(dept_in)) then

      dept=dept_in;

    output;

  end;

  else input;

run;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 18 replies
  • 1172 views
  • 3 likes
  • 5 in conversation