Hi All,
I am facing a challenge while reading a csv file with the data spread in multiple lines.
Sample:
John,Doe
EW202207070053899 FOR RESOLUTION W260560364 JUNGE NAT POB 098 169644
EW202207070055230 FOR RESOLUTION W263027005 RACE NAT PPS 114 161087
Mitchelle, Stark
EW202205020079881 SEE NOTES W268135093 LEE SGB FAX 738 177390
EW202206270042457 FOR RESOLUTION W255586930 RIGGS NAT FAX 754 737493
Can you please help me to read think in the following way:
Mitchelle, Stark EW202207070053899 FOR RESOLUTION W260560364 JUNGE NAT POB 098 169644
Mitchelle, Stark EW202207070055230 FOR RESOLUTION W263027005 RACE NAT PPS 114 161087
Mitchelle, Stark EW202205020079881 SEE NOTES W268135093 LEE SGB FAX 738 177390
Mitchelle, Stark EW202206270042457 FOR RESOLUTION W255586930 RIGGS NAT FAX 754 737493
Thanks,
Have a look at the code below. It shows how to determine the "type" of record and process it accordingly.
See also Testing a Condition Before Creating an Observation
data want;
infile cards truncover;
/*
* define variables
*/
length
type $ 2
name $ 25
line $ 80
;
/*
* read a record
* and keep it in the buffer, the @ at the end
*/
input
@1 type $2.
@
;
/*
* check for type of record, the logic may vary depending
* on the data
*
* if EW, then process detail
* else process "header"
*/
if type = "EW" then do;
input @1 line $80.;
output;
end;
else do;
retain name;
input @1 name $25.;
end;
datalines;
John,Doe
EW202207070053899 FOR RESOLUTION W260560364 JUNGE NAT POB 098 169644
EW202207070055230 FOR RESOLUTION W263027005 RACE NAT PPS 114 161087
EW202207070055230 FOR RESOLUTION W263027005 RACE NAT PPS 114 161087
Mitchelle, Stark
EW202205020079881 SEE NOTES W268135093 LEE SGB FAX 738 177390
EW202206270042457 FOR RESOLUTION W255586930 RIGGS NAT FAX 754 737493
EW202206270042457 FOR RESOLUTION W255586930 RIGGS NAT FAX 754 737493
EW202206270042457 FOR RESOLUTION W255586930 RIGGS NAT FAX 754 737493
EW202206270042457 FOR RESOLUTION W255586930 RIGGS NAT FAX 754 737493
;
If every instance is composed of three lines (line1 for the name, line2 and line3 with other data), you could issue three INPUT sand two PUTs, as in:
data _null_;
input name $char25.;
do i=1 to 2;
input line $80.;
put name $25. line;
end;
datalines;
John,Doe
EW202207070053899 FOR RESOLUTION W260560364 JUNGE NAT POB 098 169644
EW202207070055230 FOR RESOLUTION W263027005 RACE NAT PPS 114 161087
Mitchelle, Stark
EW202205020079881 SEE NOTES W268135093 LEE SGB FAX 738 177390
EW202206270042457 FOR RESOLUTION W255586930 RIGGS NAT FAX 754 737493
run;
Of course, the PUT above writes to the log, You either want to make a new text file (2 lines per person) or a SAS dataset (2 obs per person). In the latter case, you have to modify the INPUT statement inside the loop to read each individual variable in the data lines, and then use OUTPUT instead of PUT, as in:
data want;
input name $char25.;
do i=1 to 2;
input ..... specify input variables here ...;
output;
end;
datalines;
John,Doe
EW202207070053899 FOR RESOLUTION W260560364 JUNGE NAT POB 098 169644
EW202207070055230 FOR RESOLUTION W263027005 RACE NAT PPS 114 161087
Mitchelle, Stark
EW202205020079881 SEE NOTES W268135093 LEE SGB FAX 738 177390
EW202206270042457 FOR RESOLUTION W255586930 RIGGS NAT FAX 754 737493
run;
Have a look at the code below. It shows how to determine the "type" of record and process it accordingly.
See also Testing a Condition Before Creating an Observation
data want;
infile cards truncover;
/*
* define variables
*/
length
type $ 2
name $ 25
line $ 80
;
/*
* read a record
* and keep it in the buffer, the @ at the end
*/
input
@1 type $2.
@
;
/*
* check for type of record, the logic may vary depending
* on the data
*
* if EW, then process detail
* else process "header"
*/
if type = "EW" then do;
input @1 line $80.;
output;
end;
else do;
retain name;
input @1 name $25.;
end;
datalines;
John,Doe
EW202207070053899 FOR RESOLUTION W260560364 JUNGE NAT POB 098 169644
EW202207070055230 FOR RESOLUTION W263027005 RACE NAT PPS 114 161087
EW202207070055230 FOR RESOLUTION W263027005 RACE NAT PPS 114 161087
Mitchelle, Stark
EW202205020079881 SEE NOTES W268135093 LEE SGB FAX 738 177390
EW202206270042457 FOR RESOLUTION W255586930 RIGGS NAT FAX 754 737493
EW202206270042457 FOR RESOLUTION W255586930 RIGGS NAT FAX 754 737493
EW202206270042457 FOR RESOLUTION W255586930 RIGGS NAT FAX 754 737493
EW202206270042457 FOR RESOLUTION W255586930 RIGGS NAT FAX 754 737493
;
data want; infile cards truncover; input col $200.; length name value $ 200; retain name; if prxmatch('/[a-z]+,/i',col) then do;name=col;delete;end; else value=col; drop col; datalines; John,Doe EW202207070053899 FOR RESOLUTION W260560364 JUNGE NAT POB 098 169644 EW202207070055230 FOR RESOLUTION W263027005 RACE NAT PPS 114 161087 Mitchelle, Stark EW202205020079881 SEE NOTES W268135093 LEE SGB FAX 738 177390 EW202206270042457 FOR RESOLUTION W255586930 RIGGS NAT FAX 754 737493 ; run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.