- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi All,
I am facing a challenge while reading a csv file with the data spread in multiple lines.
Sample:
John,Doe
EW202207070053899 FOR RESOLUTION W260560364 JUNGE NAT POB 098 169644
EW202207070055230 FOR RESOLUTION W263027005 RACE NAT PPS 114 161087
Mitchelle, Stark
EW202205020079881 SEE NOTES W268135093 LEE SGB FAX 738 177390
EW202206270042457 FOR RESOLUTION W255586930 RIGGS NAT FAX 754 737493
Can you please help me to read think in the following way:
Mitchelle, Stark EW202207070053899 FOR RESOLUTION W260560364 JUNGE NAT POB 098 169644
Mitchelle, Stark EW202207070055230 FOR RESOLUTION W263027005 RACE NAT PPS 114 161087
Mitchelle, Stark EW202205020079881 SEE NOTES W268135093 LEE SGB FAX 738 177390
Mitchelle, Stark EW202206270042457 FOR RESOLUTION W255586930 RIGGS NAT FAX 754 737493
Thanks,
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Have a look at the code below. It shows how to determine the "type" of record and process it accordingly.
See also Testing a Condition Before Creating an Observation
data want;
infile cards truncover;
/*
* define variables
*/
length
type $ 2
name $ 25
line $ 80
;
/*
* read a record
* and keep it in the buffer, the @ at the end
*/
input
@1 type $2.
@
;
/*
* check for type of record, the logic may vary depending
* on the data
*
* if EW, then process detail
* else process "header"
*/
if type = "EW" then do;
input @1 line $80.;
output;
end;
else do;
retain name;
input @1 name $25.;
end;
datalines;
John,Doe
EW202207070053899 FOR RESOLUTION W260560364 JUNGE NAT POB 098 169644
EW202207070055230 FOR RESOLUTION W263027005 RACE NAT PPS 114 161087
EW202207070055230 FOR RESOLUTION W263027005 RACE NAT PPS 114 161087
Mitchelle, Stark
EW202205020079881 SEE NOTES W268135093 LEE SGB FAX 738 177390
EW202206270042457 FOR RESOLUTION W255586930 RIGGS NAT FAX 754 737493
EW202206270042457 FOR RESOLUTION W255586930 RIGGS NAT FAX 754 737493
EW202206270042457 FOR RESOLUTION W255586930 RIGGS NAT FAX 754 737493
EW202206270042457 FOR RESOLUTION W255586930 RIGGS NAT FAX 754 737493
;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If every instance is composed of three lines (line1 for the name, line2 and line3 with other data), you could issue three INPUT sand two PUTs, as in:
data _null_;
input name $char25.;
do i=1 to 2;
input line $80.;
put name $25. line;
end;
datalines;
John,Doe
EW202207070053899 FOR RESOLUTION W260560364 JUNGE NAT POB 098 169644
EW202207070055230 FOR RESOLUTION W263027005 RACE NAT PPS 114 161087
Mitchelle, Stark
EW202205020079881 SEE NOTES W268135093 LEE SGB FAX 738 177390
EW202206270042457 FOR RESOLUTION W255586930 RIGGS NAT FAX 754 737493
run;
Of course, the PUT above writes to the log, You either want to make a new text file (2 lines per person) or a SAS dataset (2 obs per person). In the latter case, you have to modify the INPUT statement inside the loop to read each individual variable in the data lines, and then use OUTPUT instead of PUT, as in:
data want;
input name $char25.;
do i=1 to 2;
input ..... specify input variables here ...;
output;
end;
datalines;
John,Doe
EW202207070053899 FOR RESOLUTION W260560364 JUNGE NAT POB 098 169644
EW202207070055230 FOR RESOLUTION W263027005 RACE NAT PPS 114 161087
Mitchelle, Stark
EW202205020079881 SEE NOTES W268135093 LEE SGB FAX 738 177390
EW202206270042457 FOR RESOLUTION W255586930 RIGGS NAT FAX 754 737493
run;
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Have a look at the code below. It shows how to determine the "type" of record and process it accordingly.
See also Testing a Condition Before Creating an Observation
data want;
infile cards truncover;
/*
* define variables
*/
length
type $ 2
name $ 25
line $ 80
;
/*
* read a record
* and keep it in the buffer, the @ at the end
*/
input
@1 type $2.
@
;
/*
* check for type of record, the logic may vary depending
* on the data
*
* if EW, then process detail
* else process "header"
*/
if type = "EW" then do;
input @1 line $80.;
output;
end;
else do;
retain name;
input @1 name $25.;
end;
datalines;
John,Doe
EW202207070053899 FOR RESOLUTION W260560364 JUNGE NAT POB 098 169644
EW202207070055230 FOR RESOLUTION W263027005 RACE NAT PPS 114 161087
EW202207070055230 FOR RESOLUTION W263027005 RACE NAT PPS 114 161087
Mitchelle, Stark
EW202205020079881 SEE NOTES W268135093 LEE SGB FAX 738 177390
EW202206270042457 FOR RESOLUTION W255586930 RIGGS NAT FAX 754 737493
EW202206270042457 FOR RESOLUTION W255586930 RIGGS NAT FAX 754 737493
EW202206270042457 FOR RESOLUTION W255586930 RIGGS NAT FAX 754 737493
EW202206270042457 FOR RESOLUTION W255586930 RIGGS NAT FAX 754 737493
;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data want; infile cards truncover; input col $200.; length name value $ 200; retain name; if prxmatch('/[a-z]+,/i',col) then do;name=col;delete;end; else value=col; drop col; datalines; John,Doe EW202207070053899 FOR RESOLUTION W260560364 JUNGE NAT POB 098 169644 EW202207070055230 FOR RESOLUTION W263027005 RACE NAT PPS 114 161087 Mitchelle, Stark EW202205020079881 SEE NOTES W268135093 LEE SGB FAX 738 177390 EW202206270042457 FOR RESOLUTION W255586930 RIGGS NAT FAX 754 737493 ; run;