Help using Base SAS procedures

Need help with reading in a text file

Reply
Occasional Contributor
Posts: 10

Need help with reading in a text file

Hi everyone!

I have a text file containing data as shown below:

ABC

04885423

04886522

048809876

4666324557677879

5754246789532123

DEF

4321234567897654

5434456434565211

GHI

0466643

089876665

0987643349

ABC

097457738

73638933

03988833

The string 'ABC', 'DEF', and 'GHI' is like a group header.

What I need to do is this:

I need to read in the data and separate them into different datasets based on the header the are under.

For example, when i read the 1st header which is 'ABC' the 5 observations under it will be written to a dataset, say data1.

Then when i hit the 2nd header which is 'DEF' the 2 observations under it will be written to another dataset, say data 2.

Same goes for 3rd header.

When i come to the 4th header, which is 'ABC', the observations to be added to dataset data1 if possible.

Thanks in advance!

PROC Star
Posts: 7,492

Need help with reading in a text file

I'm sure there is a more direct way, but the following will do what I think you are trying to accomplish:

data have;

  informat thedata $20.;

  input thedata;

  cards;

ABC

04885423

04886522

048809876

4666324557677879

5754246789532123

DEF

4321234567897654

5434456434565211

GHI

0466643

089876665

0987643349

ABC

097457738

73638933

03988833

;

proc sql noprint;

  select distinct thedata

    into : distinctfiles separated by " "

      from have

        where anyalpha(thedata) gt 0

;

  select distinct 'if filenm eq "'||strip(thedata)||'" then output '||thedata||";"

    into : ifstmt separated by " "

      from have

        where anyalpha(thedata) gt 0

;

quit;

data &distinctfiles.;

  set have;

  retain filenm;

  if anyalpha(thedata) then filenm=thedata;

  else do;

    &ifstmt.

  end;

run;

Occasional Contributor
Posts: 10

Need help with reading in a text file

Hi Art,

Thanks for the reply.

Tried the code suggested. I'm hitting some syntax errors. I think because my actual data is more complicated than what i had posted originally.

1) some of the headers consist of two words, and some have special characters.

2) some of the data(non header) has alphabets.

I hit syntax errors for observations that hit above 2 scenarios.

ABC XYZ

04885423

04886522T

0488G09876

4666324557677879

DEF

4321234567897654

5434456434565211

GHI #CVD

0466643

089876665

0987643349

ABC

097457738

73638933

03988833

Would really appreciate your advice.

Super User
Posts: 10,048

Need help with reading in a text file

It is almost like Art.T 's code.

data have;
  input thedata $20.;
  retain tname;
  if anyalpha(thedata) then do; tname=thedata; delete;end;
  cards;
ABC
04885423
04886522
048809876
4666324557677879
5754246789532123
DEF
4321234567897654
5434456434565211
GHI
0466643
089876665
0987643349
ABC
097457738
73638933
03988833
;
run;
proc sort data=have;by tname;run;
data _null_;
 declare hash ha (hashexp:10);
  ha.definekey('tname','count');
  ha.definedata('thedata');
  ha.definedone();
  do until(last.tname);
   set have;
   by tname;
   count+1;
   ha.add();
  end;
 ha.output(dataset: tname);
run;

Ksharp

Occasional Contributor
Posts: 10

Need help with reading in a text file

Hi ksharp,

Hitting some errors as well, if the headers and observations are more complicated as in my 2nd post.

Would appreciate your advice as well.

My apologies for not noticing all the patterns of the data.

PROC Star
Posts: 7,492

Need help with reading in a text file

Just add in some conditions to limit what you are looking for and ensure that valid SAS filenames are derived.  E.g.:

data have;

  informat thedata $20.;

  input thedata;

  cards;

ABC XYZ

04885423

04886522T

0488G09876

4666324557677879

DEF

4321234567897654

5434456434565211

GHI #CVD

0466643

089876665

0987643349

ABC

097457738

73638933

03988833

;

proc sql noprint;

  select distinct compress(thedata,,'kdf')

    into : distinctfiles separated by " "

      from have

        where anyalpha(substr(thedata,1,1)) gt 0

;

  select distinct 'if filenm eq "'||strip(compress(thedata,,'kdf'))||

   '" then output '||compress(thedata,,'kdf')||";"

    into : ifstmt separated by " "

      from have

        where anyalpha(substr(thedata,1,1)) gt 0

;

quit;

data &distinctfiles.;

  set have;

  retain filenm;

  if anyalpha(substr(thedata,1,1)) then

    filenm=compress(thedata,,'kdf');

  else do;

    &ifstmt.

  end;

run;

Respected Advisor
Posts: 3,799

Re: Need help with reading in a text file

This can all be done in one step.  My TEST for "is it a GROUP row" may not be strict enough for your real data.  Also,  it looks like the data rows may contain more data than a simple string of numbers.  If that is the case the ELSE DO can be change to an INPUT statement to read the fields properly.

filename FT15F001 temp;

data groupinput;

   attrib group length=$20.;

   retain group;

   attrib value length=$64.;

   infile FT15F001 truncover;

   input @;

   if anyalpha(first(_infile_)) then do;

      group = _infile_;

      delete;

      end;

   else do;

      value = _infile_;

      output;

      end;

   parmcards;

ABC

04885423

04886522

048809876

4666324557677879

5754246789532123

DEF

4321234567897654

5434456434565211

GHI

0466643

089876665

0987643349

ABC

097457738

73638933

03988833

ABC XYZ

04885423

04886522T

0488G09876

4666324557677879

DEF

4321234567897654

5434456434565211

GHI #CVD

0466643

089876665

0987643349

ABC

097457738

73638933

03988833

;;;;

   run;

proc print;

   run;

PROC Star
Posts: 7,492

Re: Need help with reading in a text file

Posted in reply to data_null__

DN, you missed something that the OP specified: "I need to read in the data and separate them into different datasets based on the header the are under."

However, I was hoping that you would look at this as I couldn't remember how to take a character string and apply it to an outfile statement.  I got around that by creating a macro variable via proc sql, but I am sure you could come up with something using inputc, putc or the like.

Respected Advisor
Posts: 3,799

Re: Need help with reading in a text file

Yes I did miss that part about creating separate data sets. 

If the data are grouped properly then the new data sets could be created in one pass using the HASH method posted by ksharp and my imput method.  Some of the group names are not proper MEMBER names.  OR if not grouped then sort the output from my program by group and use the HASH.  Or the code gen method you posted.  Both are adequate for the task we don't have enough info to know which would be better depending on the number of groups or total obs etc.

For your question regarding outfile are you referring to the FILE statement option FILEVAR? 

PROC Star
Posts: 7,492

Re: Need help with reading in a text file

Posted in reply to data_null__

No, I was referring to the following scenario:

data &distinctfiles.;

  set have;

  retain filenm;

  if anyalpha(substr(thedata,1,1)) then

    filenm=compress(thedata,,'kdf');

  else do;

    &ifstmt.

  end;

run;

in place of the &ifstmt. that I created in proc sql, how could one specify:  output filenm;

Respected Advisor
Posts: 3,799

Need help with reading in a text file

You can't.  The OUTPUT statement accepts only name(s) not expressions.  You need output METHOD to use expressions and the associated HASH bits.

Super User
Posts: 10,048

Need help with reading in a text file

you just need to change  the code a little.

data have;
  input thedata $20.;
  retain tname;
  if anyalpha(thedata) eq 1 then do; tname=thedata; delete;end; 
  cards;
   ABC XYZ
04885423
04886522T
0488G09876
4666324557677879
DEF
4321234567897654
5434456434565211
GHI #CVD
0466643
089876665
0987643349
ABC
097457738
73638933
03988833
;
run;

proc sort data=have;by tname;run;
data _null_;
 declare hash ha (hashexp:10);
  ha.definekey('tname','count');
  ha.definedata('thedata');
  ha.definedone();
  do until(last.tname);
   set have;
   by tname;
   count+1;
   ha.add();
  end;
 ha.output(dataset: scan(tname,1)||strip(count));
run;

Ksharp

Ask a Question
Discussion stats
  • 11 replies
  • 153 views
  • 0 likes
  • 4 in conversation