BookmarkSubscribeRSS Feed
ZoeyElle
Calcite | Level 5

Hi everyone!

I have a text file containing data as shown below:

ABC

04885423

04886522

048809876

4666324557677879

5754246789532123

DEF

4321234567897654

5434456434565211

GHI

0466643

089876665

0987643349

ABC

097457738

73638933

03988833

The string 'ABC', 'DEF', and 'GHI' is like a group header.

What I need to do is this:

I need to read in the data and separate them into different datasets based on the header the are under.

For example, when i read the 1st header which is 'ABC' the 5 observations under it will be written to a dataset, say data1.

Then when i hit the 2nd header which is 'DEF' the 2 observations under it will be written to another dataset, say data 2.

Same goes for 3rd header.

When i come to the 4th header, which is 'ABC', the observations to be added to dataset data1 if possible.

Thanks in advance!

11 REPLIES 11
art297
Opal | Level 21

I'm sure there is a more direct way, but the following will do what I think you are trying to accomplish:

data have;

  informat thedata $20.;

  input thedata;

  cards;

ABC

04885423

04886522

048809876

4666324557677879

5754246789532123

DEF

4321234567897654

5434456434565211

GHI

0466643

089876665

0987643349

ABC

097457738

73638933

03988833

;

proc sql noprint;

  select distinct thedata

    into : distinctfiles separated by " "

      from have

        where anyalpha(thedata) gt 0

;

  select distinct 'if filenm eq "'||strip(thedata)||'" then output '||thedata||";"

    into : ifstmt separated by " "

      from have

        where anyalpha(thedata) gt 0

;

quit;

data &distinctfiles.;

  set have;

  retain filenm;

  if anyalpha(thedata) then filenm=thedata;

  else do;

    &ifstmt.

  end;

run;

ZoeyElle
Calcite | Level 5

Hi Art,

Thanks for the reply.

Tried the code suggested. I'm hitting some syntax errors. I think because my actual data is more complicated than what i had posted originally.

1) some of the headers consist of two words, and some have special characters.

2) some of the data(non header) has alphabets.

I hit syntax errors for observations that hit above 2 scenarios.

ABC XYZ

04885423

04886522T

0488G09876

4666324557677879

DEF

4321234567897654

5434456434565211

GHI #CVD

0466643

089876665

0987643349

ABC

097457738

73638933

03988833

Would really appreciate your advice.

Ksharp
Super User

It is almost like Art.T 's code.

data have;
  input thedata $20.;
  retain tname;
  if anyalpha(thedata) then do; tname=thedata; delete;end;
  cards;
ABC
04885423
04886522
048809876
4666324557677879
5754246789532123
DEF
4321234567897654
5434456434565211
GHI
0466643
089876665
0987643349
ABC
097457738
73638933
03988833
;
run;
proc sort data=have;by tname;run;
data _null_;
 declare hash ha (hashexp:10);
  ha.definekey('tname','count');
  ha.definedata('thedata');
  ha.definedone();
  do until(last.tname);
   set have;
   by tname;
   count+1;
   ha.add();
  end;
 ha.output(dataset: tname);
run;

Ksharp

ZoeyElle
Calcite | Level 5

Hi ksharp,

Hitting some errors as well, if the headers and observations are more complicated as in my 2nd post.

Would appreciate your advice as well.

My apologies for not noticing all the patterns of the data.

art297
Opal | Level 21

Just add in some conditions to limit what you are looking for and ensure that valid SAS filenames are derived.  E.g.:

data have;

  informat thedata $20.;

  input thedata;

  cards;

ABC XYZ

04885423

04886522T

0488G09876

4666324557677879

DEF

4321234567897654

5434456434565211

GHI #CVD

0466643

089876665

0987643349

ABC

097457738

73638933

03988833

;

proc sql noprint;

  select distinct compress(thedata,,'kdf')

    into : distinctfiles separated by " "

      from have

        where anyalpha(substr(thedata,1,1)) gt 0

;

  select distinct 'if filenm eq "'||strip(compress(thedata,,'kdf'))||

   '" then output '||compress(thedata,,'kdf')||";"

    into : ifstmt separated by " "

      from have

        where anyalpha(substr(thedata,1,1)) gt 0

;

quit;

data &distinctfiles.;

  set have;

  retain filenm;

  if anyalpha(substr(thedata,1,1)) then

    filenm=compress(thedata,,'kdf');

  else do;

    &ifstmt.

  end;

run;

data_null__
Jade | Level 19

This can all be done in one step.  My TEST for "is it a GROUP row" may not be strict enough for your real data.  Also,  it looks like the data rows may contain more data than a simple string of numbers.  If that is the case the ELSE DO can be change to an INPUT statement to read the fields properly.

filename FT15F001 temp;

data groupinput;

   attrib group length=$20.;

   retain group;

   attrib value length=$64.;

   infile FT15F001 truncover;

   input @;

   if anyalpha(first(_infile_)) then do;

      group = _infile_;

      delete;

      end;

   else do;

      value = _infile_;

      output;

      end;

   parmcards;

ABC

04885423

04886522

048809876

4666324557677879

5754246789532123

DEF

4321234567897654

5434456434565211

GHI

0466643

089876665

0987643349

ABC

097457738

73638933

03988833

ABC XYZ

04885423

04886522T

0488G09876

4666324557677879

DEF

4321234567897654

5434456434565211

GHI #CVD

0466643

089876665

0987643349

ABC

097457738

73638933

03988833

;;;;

   run;

proc print;

   run;

art297
Opal | Level 21

DN, you missed something that the OP specified: "I need to read in the data and separate them into different datasets based on the header the are under."

However, I was hoping that you would look at this as I couldn't remember how to take a character string and apply it to an outfile statement.  I got around that by creating a macro variable via proc sql, but I am sure you could come up with something using inputc, putc or the like.

data_null__
Jade | Level 19

Yes I did miss that part about creating separate data sets. 

If the data are grouped properly then the new data sets could be created in one pass using the HASH method posted by ksharp and my imput method.  Some of the group names are not proper MEMBER names.  OR if not grouped then sort the output from my program by group and use the HASH.  Or the code gen method you posted.  Both are adequate for the task we don't have enough info to know which would be better depending on the number of groups or total obs etc.

For your question regarding outfile are you referring to the FILE statement option FILEVAR? 

art297
Opal | Level 21

No, I was referring to the following scenario:

data &distinctfiles.;

  set have;

  retain filenm;

  if anyalpha(substr(thedata,1,1)) then

    filenm=compress(thedata,,'kdf');

  else do;

    &ifstmt.

  end;

run;

in place of the &ifstmt. that I created in proc sql, how could one specify:  output filenm;

data_null__
Jade | Level 19

You can't.  The OUTPUT statement accepts only name(s) not expressions.  You need output METHOD to use expressions and the associated HASH bits.

Ksharp
Super User

you just need to change  the code a little.

data have;
  input thedata $20.;
  retain tname;
  if anyalpha(thedata) eq 1 then do; tname=thedata; delete;end; 
  cards;
   ABC XYZ
04885423
04886522T
0488G09876
4666324557677879
DEF
4321234567897654
5434456434565211
GHI #CVD
0466643
089876665
0987643349
ABC
097457738
73638933
03988833
;
run;

proc sort data=have;by tname;run;
data _null_;
 declare hash ha (hashexp:10);
  ha.definekey('tname','count');
  ha.definedata('thedata');
  ha.definedone();
  do until(last.tname);
   set have;
   by tname;
   count+1;
   ha.add();
  end;
 ha.output(dataset: scan(tname,1)||strip(count));
run;

Ksharp

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 1397 views
  • 0 likes
  • 4 in conversation