Hi All,
I'm a newcomer to SAS programming, so apologies in advance if I'm asking obvious questions.
I have a project where I receive a fixed width text file that has differeing "columns" by row.
For example the layout of a file might be:
AAAFirstNameLastNameDob
BBBSomeDetails
BBBMoreDetails
BBBYetMoreDetails
ZZZDoneWithPerson
…and it can have multiple people captured in a single file. So multiple AAABBBZZZ sections repeat in the file.
Example data may be:
AAABobDobs2016-01-01
BBBBeer Pizza
BBBCoke Ham
BBBWater Chicken
ZZZBobDobs2016-01-01
AAASueDobs2016-02-01
BBBTea Crackers
BBBAcid Fingernails
BBBJenkemMudPies
ZZZSueDobs2016-02-01
I want to write a script to turn this into data set like this, where I split out the BBB records into columns, and attach to each BBB row, extra columns with informatino from the preceding AAA record:
FNAME LNAME DATE DRINK FOOD
Bob Dobs 2016-01-01 Beer Pizza
Bob Dobs 2016-01-01 Coke Ham
Bob Dobs 2016-01-01 Water Chicken
Sue Dobs 2016-01-01 Tea Crackers
Sue Dobs 2016-01-01 Acid Fingernails
Sue Dobs 2016-01-01 Jenkem Mudpies
I think I should be able to do this with the data step, but it's unclear to me how. So far I have the file imported with code like this:
DATA WORK.input;
LENGTH
F1 $ 512 ;
FORMAT
F1 $CHAR512. ;
INFORMAT
F1 $CHAR512. ;
INFILE '/path/file.txt'
LRECL=512
ENCODING="LATIN1"
TERMSTR=CRLF
TRUNCOVER ;
INPUT
@1 F1 $CHAR512. ;
RUN;
Can anyone suggest where I should go next to get this working?
Thanks!
You need to parse the input string somehow.
On approach is to use conditional INPUT.
Or scan the whole record (in a SAS variable, or by using the automatic variable _INFILE_).
Either way, this calls for competence from an intermediate programmer, so perhaps you might want to look into some Base SAS programming training.
Thanks for the reply. I'm more of a learn by doing person, but I do plan on doing a course once my company has funds to pay for it.
First a comment: I doubt that your data is actually fixed width unless every single name in your data is exactly 3 characters for first name and 4 characters for last name.
Also post Code and data in code boxes opened in the forurm with the {i} icon. The question box strips formatting and does proportional font things that may actually change your data. The Code box will prevent that.
Some questions, or possibly attach a TXT file with some more example data:
Does the line with the name information ALWAYS start with the same characters . Your post implies that if the first 3 characters are AAA then the line will be name information. Is that correct? And ZZZ always ends the record for the person?
data want; length code Fname $ 3 Lname $ 4 drink $ 6 food $ 11; informat date yymmdd10.; format date yymmdd10.; Retain fname lname date; input code 1-3 @; if code = 'AAA' then do; input fname 4-6 lname 7-10 date ; end; Else if code='BBB' then do; input drink 4-9 food 10-20 ; output; End; Else if code='ZZZ' then do; input; end; drop code; datalines; AAABobDobs2016-01-01 BBBBeer Pizza BBBCoke Ham BBBWater Chicken ZZZBobDobs2016-01-01 AAASueDobs2016-02-01 BBBTea Crackers BBBAcid Fingernails BBBJenkemMudPies ZZZSueDobs2016-02-01 ;
works for the example data.
@ballardw wrote:First a comment: I doubt that your data is actually fixed width unless every single name in your data is exactly 3 characters for first name and 4 characters for last name.
I was creating an example for purposes of this conversation. My actual data file is much more complicated, but if I knew how to do it for this, I could make what I need for the other one. Thanks for the info, still digesting the rest of your reply, but wanted to add that note to start with.
I'll be sure to use code tags / attachments for future examples.
Some questions, or possibly attach a TXT file with some more example data:
Does the line with the name information ALWAYS start with the same characters . Your post implies that if the first 3 characters are AAA then the line will be name information. Is that correct? And ZZZ always ends the record for the person?
The first 3 chars, are always the record type in my data. The actual content is a bit different, but it is fixed width. Just many more columns.
Hi
Back in the days when SAS was founded, this was a common type of file that would use different record types to indicate what the line was about. So basically in the code you figure out the record type and then do what is necessary.
See sample code below, please note that the data has fixed length for each field, are there any persons that have no record type BBB? What should haben with them?
data want;
infile cards;
/* determine the record type */
input
@1 recordType $3.
@
;
retain fname lname someDate;
/* do something based on the record type */
if recordType = "AAA" then do;
input
@4 fname $6.
@10 lname $10.
@21 someDate anydtdte10.
;
end;
/* only record type BBB needs to be written to output data set */
if recordType = "BBB" then do;
input
@4 drink $6.
@10 food $10.
;
output;
end;
/* we can ignone ZZZ */
drop recordType;
format someDate date9.;
cards;
AAABob Dobs 2016-01-01
BBBBeer Pizza
BBBCoke Ham
BBBWater Chicken
ZZZBob Dobs 2016-01-01
AAASue Dobs 2016-02-01
BBBTea Crackers
BBBAcid Fingernails
BBBJenkemMudPies
ZZZSue Dobs 2016-02-01
AAAAnna Dobs 2016-02-01
BBBTea Crackers
ZZZAnna Dobs 2016-02-01
;
Bruno
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.