Hi All,
I have a requirement where I need to merge multiples lines records into single. Each row length in 80 bytes.
The challenge is that that each record that is spread in multiple lines is dynamic in number of lines i.e.
1. a new record is identified by a line having '=============….'
2. After the above record, there are 2 lines which has same structure for every record
3. From 4th record, there can be multiple records with same structure and it will be dynamic (1 or 2 or 3 lines...)
So I need to ignore the 1st line indicating new record (i.e. one with '========...'), merge line 2 and line 3 for each line in the next lines...
e.g.
===============================================================================
UNIQUEID_RECORD00001 2018-06-05 01:00:00 USER001 FNAME1 MNAME2 LNAME1
00001 PERSONAL_RECORD_00000000001 00000000000000000001 JOBPRFL1
CITY0001 CODE1 ADDRESS1 ADDRESS2 ADDRESS3 PINCODE1
CITY0002 CODE2 ADDRESS1 ADDRESS2 ADDRESS3 PINCODE1
===============================================================================
UNIQUEID_RECORD00002 2018-06-05 01:00:00 USER002 FNAME1 MNAME2 LNAME1
00002 PERSONAL_RECORD_00000000001 00000000000000000002 JOBPRFL2
CITY0003 CODE3 ADDRESS1 ADDRESS2 ADDRESS3 PINCODE1
CITY0004 CODE4 ADDRESS1 ADDRESS2 ADDRESS3 PINCODE1
CITY0005 CODE5 ADDRESS1 ADDRESS2 ADDRESS3 PINCODE1
CITY0006 CODE6 ADDRESS1 ADDRESS2 ADDRESS3 PINCODE1
===============================================================================
UNIQUEID_RECORD00003 2018-06-05 01:00:00 USER001 FNAME1 MNAME2 LNAME1
00003 PERSONAL_RECORD_00000000003 00000000000000000031 JOBPRFL3
CITY0002 CODE2 ADDRESS1 ADDRESS2 ADDRESS3 PINCODE1
output should be
UNIQUEID_RECORD00001 2018-06-05 01:00:00 USER001 FNAME1 MNAME2 LNAME1 00001 PERSONAL_RECORD_00000000001 00000000000000000001 JOBPRFL1 CITY0001 CODE1 ADDRESS1 ADDRESS2 ADDRESS3 PINCODE1
UNIQUEID_RECORD00001 2018-06-05 01:00:00 USER001 FNAME1 MNAME2 LNAME1 00001 PERSONAL_RECORD_00000000001 00000000000000000001 JOBPRFL1 CITY0002 CODE2 ADDRESS1 ADDRESS2 ADDRESS3 PINCODE1
UNIQUEID_RECORD00002 2018-06-05 01:00:00 USER002 FNAME1 MNAME2 LNAME1 00002 PERSONAL_RECORD_00000000001 00000000000000000002 JOBPRFL2 CITY0003 CODE3 ADDRESS1 ADDRESS2 ADDRESS3 PINCODE1
UNIQUEID_RECORD00002 2018-06-05 01:00:00 USER002 FNAME1 MNAME2 LNAME1 00002 PERSONAL_RECORD_00000000001 00000000000000000002 JOBPRFL2 CITY0004 CODE4 ADDRESS1 ADDRESS2 ADDRESS3 PINCODE1
UNIQUEID_RECORD00002 2018-06-05 01:00:00 USER002 FNAME1 MNAME2 LNAME1 00002 PERSONAL_RECORD_00000000001 00000000000000000002 JOBPRFL2 CITY0005 CODE5 ADDRESS1 ADDRESS2 ADDRESS3 PINCODE1
UNIQUEID_RECORD00002 2018-06-05 01:00:00 USER002 FNAME1 MNAME2 LNAME1 00002 PERSONAL_RECORD_00000000001 00000000000000000002 JOBPRFL2 CITY0006 CODE6 ADDRESS1 ADDRESS2 ADDRESS3 PINCODE1
UNIQUEID_RECORD00003 2018-06-05 01:00:00 USER001 FNAME1 MNAME2 LNAME1 00003 PERSONAL_RECORD_00000000003 00000000000000000031 JOBPRFL3 CITY0002 CODE2 ADDRESS1 ADDRESS2 ADDRESS3 PINCODE1
Something like this should do it:
data want;
length record_id $40 date time 8 user fname mname lname $20 recordno 8 person_id $40 x 8
jobprofile $40 city code adr1 adr2 adr3 pincode $20;
informat date yymmdd. time time.;
format date yymmdd10. time time8.;
infile cards firstobs=2 eof=done; /* firstobs=2 to drop the first separator line */
input record_id date time user fname mname lname /
recordno person_id x jobprofile
/ @
;
do while(_infile_ ne: '=========');
input city code adr1 adr2 adr3 pincode /@;
output;
end;
done:
cards;
===============================================================================
UNIQUEID_RECORD00001 2018-06-05 01:00:00 USER001 FNAME1 MNAME2 LNAME1
00001 PERSONAL_RECORD_00000000001 00000000000000000001 JOBPRFL1
CITY0001 CODE1 ADDRESS1 ADDRESS2 ADDRESS3 PINCODE1
CITY0002 CODE2 ADDRESS1 ADDRESS2 ADDRESS3 PINCODE1
===============================================================================
UNIQUEID_RECORD00002 2018-06-05 01:00:00 USER002 FNAME1 MNAME2 LNAME1
00002 PERSONAL_RECORD_00000000001 00000000000000000002 JOBPRFL2
CITY0003 CODE3 ADDRESS1 ADDRESS2 ADDRESS3 PINCODE1
CITY0004 CODE4 ADDRESS1 ADDRESS2 ADDRESS3 PINCODE1
CITY0005 CODE5 ADDRESS1 ADDRESS2 ADDRESS3 PINCODE1
CITY0006 CODE6 ADDRESS1 ADDRESS2 ADDRESS3 PINCODE1
===============================================================================
UNIQUEID_RECORD00003 2018-06-05 01:00:00 USER001 FNAME1 MNAME2 LNAME1
00003 PERSONAL_RECORD_00000000003 00000000000000000031 JOBPRFL3
CITY0002 CODE2 ADDRESS1 ADDRESS2 ADDRESS3 PINCODE1
;run;
Of course, you will probably want to use an external file as your infile, and drop the CARDS section.
The EOF=DONE thing is a workaround in case there is no separator line after the last data line.
Hi Lassen,
Sorry for the delay in coming back on this.
Thanks a lot for your help here and it is working fine as expected.
Only case it has not handled is that the last line is not picked in the output.
There will not be a separator at the EOF.
So if you can further help to handle this condition, it would be a complete solution.
Thanks,
Rinku
Sorry, do you have any problems with the code I submitted? I put the EOF=DONE and the DONE label in to handle the case where there was no delimiter line in the end. If that gives an error, you will have to be more specific, maybe show a sample file and the log.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.