Hi everyone!
I have a text file containing data as shown below:
ABC
04885423
04886522
048809876
4666324557677879
5754246789532123
DEF
4321234567897654
5434456434565211
GHI
0466643
089876665
0987643349
ABC
097457738
73638933
03988833
The string 'ABC', 'DEF', and 'GHI' is like a group header.
What I need to do is this:
I need to read in the data and separate them into different datasets based on the header the are under.
For example, when i read the 1st header which is 'ABC' the 5 observations under it will be written to a dataset, say data1.
Then when i hit the 2nd header which is 'DEF' the 2 observations under it will be written to another dataset, say data 2.
Same goes for 3rd header.
When i come to the 4th header, which is 'ABC', the observations to be added to dataset data1 if possible.
Thanks in advance!
I'm sure there is a more direct way, but the following will do what I think you are trying to accomplish:
data have;
informat thedata $20.;
input thedata;
cards;
ABC
04885423
04886522
048809876
4666324557677879
5754246789532123
DEF
4321234567897654
5434456434565211
GHI
0466643
089876665
0987643349
ABC
097457738
73638933
03988833
;
proc sql noprint;
select distinct thedata
into : distinctfiles separated by " "
from have
where anyalpha(thedata) gt 0
;
select distinct 'if filenm eq "'||strip(thedata)||'" then output '||thedata||";"
into : ifstmt separated by " "
from have
where anyalpha(thedata) gt 0
;
quit;
data &distinctfiles.;
set have;
retain filenm;
if anyalpha(thedata) then filenm=thedata;
else do;
&ifstmt.
end;
run;
Hi Art,
Thanks for the reply.
Tried the code suggested. I'm hitting some syntax errors. I think because my actual data is more complicated than what i had posted originally.
1) some of the headers consist of two words, and some have special characters.
2) some of the data(non header) has alphabets.
I hit syntax errors for observations that hit above 2 scenarios.
ABC XYZ
04885423
04886522T
0488G09876
4666324557677879
DEF
4321234567897654
5434456434565211
GHI #CVD
0466643
089876665
0987643349
ABC
097457738
73638933
03988833
Would really appreciate your advice.
It is almost like Art.T 's code.
data have; input thedata $20.; retain tname; if anyalpha(thedata) then do; tname=thedata; delete;end; cards; ABC 04885423 04886522 048809876 4666324557677879 5754246789532123 DEF 4321234567897654 5434456434565211 GHI 0466643 089876665 0987643349 ABC 097457738 73638933 03988833 ; run; proc sort data=have;by tname;run; data _null_; declare hash ha (hashexp:10); ha.definekey('tname','count'); ha.definedata('thedata'); ha.definedone(); do until(last.tname); set have; by tname; count+1; ha.add(); end; ha.output(dataset: tname); run;
Ksharp
Hi ksharp,
Hitting some errors as well, if the headers and observations are more complicated as in my 2nd post.
Would appreciate your advice as well.
My apologies for not noticing all the patterns of the data.
Just add in some conditions to limit what you are looking for and ensure that valid SAS filenames are derived. E.g.:
data have;
informat thedata $20.;
input thedata;
cards;
ABC XYZ
04885423
04886522T
0488G09876
4666324557677879
DEF
4321234567897654
5434456434565211
GHI #CVD
0466643
089876665
0987643349
ABC
097457738
73638933
03988833
;
proc sql noprint;
select distinct compress(thedata,,'kdf')
into : distinctfiles separated by " "
from have
where anyalpha(substr(thedata,1,1)) gt 0
;
select distinct 'if filenm eq "'||strip(compress(thedata,,'kdf'))||
'" then output '||compress(thedata,,'kdf')||";"
into : ifstmt separated by " "
from have
where anyalpha(substr(thedata,1,1)) gt 0
;
quit;
data &distinctfiles.;
set have;
retain filenm;
if anyalpha(substr(thedata,1,1)) then
filenm=compress(thedata,,'kdf');
else do;
&ifstmt.
end;
run;
This can all be done in one step. My TEST for "is it a GROUP row" may not be strict enough for your real data. Also, it looks like the data rows may contain more data than a simple string of numbers. If that is the case the ELSE DO can be change to an INPUT statement to read the fields properly.
filename FT15F001 temp;
data groupinput;
attrib group length=$20.;
retain group;
attrib value length=$64.;
infile FT15F001 truncover;
input @;
if anyalpha(first(_infile_)) then do;
group = _infile_;
delete;
end;
else do;
value = _infile_;
output;
end;
parmcards;
ABC
04885423
04886522
048809876
4666324557677879
5754246789532123
DEF
4321234567897654
5434456434565211
GHI
0466643
089876665
0987643349
ABC
097457738
73638933
03988833
ABC XYZ
04885423
04886522T
0488G09876
4666324557677879
DEF
4321234567897654
5434456434565211
GHI #CVD
0466643
089876665
0987643349
ABC
097457738
73638933
03988833
;;;;
run;
proc print;
run;
DN, you missed something that the OP specified: "I need to read in the data and separate them into different datasets based on the header the are under."
However, I was hoping that you would look at this as I couldn't remember how to take a character string and apply it to an outfile statement. I got around that by creating a macro variable via proc sql, but I am sure you could come up with something using inputc, putc or the like.
Yes I did miss that part about creating separate data sets.
If the data are grouped properly then the new data sets could be created in one pass using the HASH method posted by ksharp and my imput method. Some of the group names are not proper MEMBER names. OR if not grouped then sort the output from my program by group and use the HASH. Or the code gen method you posted. Both are adequate for the task we don't have enough info to know which would be better depending on the number of groups or total obs etc.
For your question regarding outfile are you referring to the FILE statement option FILEVAR?
No, I was referring to the following scenario:
data &distinctfiles.;
set have;
retain filenm;
if anyalpha(substr(thedata,1,1)) then
filenm=compress(thedata,,'kdf');
else do;
&ifstmt.
end;
run;
in place of the &ifstmt. that I created in proc sql, how could one specify: output filenm;
You can't. The OUTPUT statement accepts only name(s) not expressions. You need output METHOD to use expressions and the associated HASH bits.
you just need to change the code a little.
data have; input thedata $20.; retain tname; if anyalpha(thedata) eq 1 then do; tname=thedata; delete;end; cards; ABC XYZ 04885423 04886522T 0488G09876 4666324557677879 DEF 4321234567897654 5434456434565211 GHI #CVD 0466643 089876665 0987643349 ABC 097457738 73638933 03988833 ; run; proc sort data=have;by tname;run; data _null_; declare hash ha (hashexp:10); ha.definekey('tname','count'); ha.definedata('thedata'); ha.definedone(); do until(last.tname); set have; by tname; count+1; ha.add(); end; ha.output(dataset: scan(tname,1)||strip(count)); run;
Ksharp
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.