Hi:
As I explained earlier, the code that I posted was for the scenario where you WANTED to have each record type or "seg" data line in a different data set.
If you need to have all the information for one person/employee/patient assembled from multiple data lines into 1 observation, then the technique
I illustrated is NOT the program you want.
Let's start with some simpler data. I have this data in a file called FAMDATA.TXT:
[pre]
e1,"Doug Jones",M,35
s1,"Mandy Jones",F,35
c1,"son",6,"David",M
c2,"daughter",3,"Melissa",F
a1,"1234 Some St.","Dallas","TX",75020
e1,"Anne Austin",F,37
s1,"Jack Austin",M,37
c1,"step-daughter",10,"Andrea",F
c2,"daughter",8,"Jeanine",F
e1,"Bud Hollis",M,37
s1,"Suzie Queue",F,36
c1,"step-son",10,"Sam",M
a1,"3456 Cricket Lane","Dallas","TX",75021
[/pre]
Note that I have 3 families: the Jones, Austin and Hollis families. In this data, the 'e1' data line is the employee record; the 's1' data line is the spouse record;
the 'c1' and 'c2' lines are the child records (in my data, people can have 0, 1 or 2 children); finally, the 'a1' data line holds the street, city, state and zip code.
So, I am going to need a different INPUT statement for each record indicator, and I am not going to have a whole family assembled until I either read
the 'a1' record for the family or I read the next 'e1' record or reach the end of the file (I know that the Jones family observation is ready to output when I read the Austin 'e1' record). In my
data, it is possible to have an employee in the file without an address (as with the Austin family), so I cannot simply output an observation every time I read an 'a1' line.
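One way to give each record indicator its own INPUT statement is the trailing @, which the program at the end of this post uses. Here is a stripped-down sketch of just that idea. It assumes in-stream DATALINES instead of FAMDATA.TXT and uses only the 'e1' and 'c1' layouts; the data set name HELD and the variables NAME, GENDER, REL and AGE are made up for the example. The first INPUT reads only the record indicator and the trailing @ holds the line, so whichever INPUT the IF picks keeps reading that same line:

[pre]
data held;
  infile datalines dsd;
  length recind $2 name $20 gender $1 rel $15;
  ** read ONLY the record indicator, then hold the line with the trailing @;
  input recind $ @;
  ** the second INPUT continues on the SAME held line, with a layout chosen by recind;
  if recind = 'e1' then input name $ gender $ age;
  else if recind = 'c1' then input rel $ age name $ gender $;
  ** no OUTPUT statement here, so every data line becomes its own obs;
datalines;
e1,"Doug Jones",M,35
c1,"son",6,"David",M
;
run;
[/pre]

If you take the @ off the first INPUT, SAS releases the line after reading RECIND, so the second INPUT reads the NEXT data line instead and the values land in the wrong variables.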
From these individual records, I want to assemble a single observation for each employee in a SAS dataset. The assembled record will have these variables:
employee emp_gender emp_age (from e1)
spouse sp_gender sp_age (from s1)
child1 gender1 rel1 age1 (from c1)
child2 gender2 rel2 age2 (from c2)
street city state zip (from a1)
If a record indicator is not found for a family (no child 2, no address or no spouse), then the variables from that line need to be set to missing. So, for the above
FAMDATA.TXT file, I want to assemble this final SAS data set:
[pre]
use trailing @ and conditional output to assemble the whole obs from multiple input lines

Obs  employee     emp_gender  emp_age  spouse       sp_gender  sp_age  child1  gender1  rel1           age1  child2   gender2  rel2      age2  street             city    state  zip
  1  Doug Jones   M                35  Mandy Jones  F              35  David   M        son               6  Melissa  F        daughter     3  1234 Some St.      Dallas  TX     75020
  2  Anne Austin  F                37  Jack Austin  M              37  Andrea  F        step-daughter    10  Jeanine  F        daughter     8
  3  Bud Hollis   M                37  Suzie Queue  F              36  Sam     M        step-son         10                                 .  3456 Cricket Lane  Dallas  TX     75021
[/pre]
In order for this final observation to be assembled, I needed to explicitly RETAIN the employee, spouse, child and address variables, so that the values read from each
record indicator line are still available when I write out the final "assembled" observation.
Think of it like this. SAS can only handle one raw data line at a time, so your program has one chance to "catch and save" the information while the record is
held in the input buffer. The INPUT statement does the "catching" by reading the information from the raw data line in the buffer. The RETAIN statement does
the "saving": without it, every variable read by INPUT is reset to missing at the start of each DATA step iteration, so the values from earlier data lines would be gone by the time you're ready to do the OUTPUT.
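Here is a small sketch of the "try it with and without RETAIN" experiment. It assumes in-stream DATALINES holding just one 'e1' line and one 's1' line; the data set names NO_RETAIN and WITH_RETAIN are made up for illustration. Both steps only OUTPUT when they read the 's1' line, and the only difference between them is the RETAIN statement:

[pre]
** without RETAIN: EMPLOYEE is set back to missing when the s1 iteration starts;
data no_retain;
  infile datalines dsd;
  length recind $2 employee spouse $20;
  input recind $ @;
  if recind = 'e1' then input employee $;
  else if recind = 's1' then input spouse $;
  if recind = 's1' then output;
datalines;
e1,"Doug Jones"
s1,"Mandy Jones"
;
run;

** with RETAIN: EMPLOYEE keeps the value caught from the e1 line;
data with_retain;
  infile datalines dsd;
  length recind $2 employee spouse $20;
  retain employee spouse;
  input recind $ @;
  if recind = 'e1' then input employee $;
  else if recind = 's1' then input spouse $;
  if recind = 's1' then output;
datalines;
e1,"Doug Jones"
s1,"Mandy Jones"
;
run;
[/pre]

In NO_RETAIN the one observation has SPOUSE='Mandy Jones' but a blank EMPLOYEE; in WITH_RETAIN the same observation has both names.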
Usually, with a simpler set of raw data, there is one (implied) OUTPUT at the end of the DATA step for every data line that is read. But in the program
that's needed to read the FAMDATA.TXT data, I have only one explicit OUTPUT statement for every group of data lines (one family), and it runs when the next 'e1' line is read. Then I have one final
OUTPUT statement for when I read the last line of the raw data file. Because the DATA step contains explicit OUTPUT statements, SAS no longer does the implied OUTPUT at the end of each iteration.
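A minimal sketch of the implied versus explicit OUTPUT difference, assuming in-stream DATALINES; the data set names IMPLIED and EXPLICIT and the variable VALUE are made up for the example. The first step has no OUTPUT statement at all; the second has one explicit OUTPUT inside an IF, so only the 's1' lines reach the output data set:

[pre]
** implied OUTPUT: one obs for every data line read;
data implied;
  infile datalines dsd;
  length recind $2;
  input recind $ value;
datalines;
e1,1
s1,2
e1,3
s1,4
;
run;

** explicit OUTPUT: only the lines that reach the OUTPUT statement are written;
data explicit;
  infile datalines dsd;
  length recind $2;
  input recind $ value;
  if recind = 's1' then output;
datalines;
e1,1
s1,2
e1,3
s1,4
;
run;
[/pre]

IMPLIED ends up with 4 observations and EXPLICIT with 2 (the rows where VALUE is 2 and 4). Moving the OUTPUT statement around in the second step is an easy way to do the experiment suggested below.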
The SAS program that read FAMDATA.TXT and produced the above output is shown below. You may need to really study how my program is working
before you can change your program successfully. I suggest you start with some very SIMPLE data. Try the program with and without RETAIN.
Experiment with having the OUTPUT statement in different places and see what happens. Until you understand the DATA step and how SAS processes
raw data and how the RETAIN statement works, you will have problems writing a program to read your HL7 data successfully. You can always contact
Tech Support for more one-on-one help with your specific data and your specific program.
cynthia
[pre]
data allrec
     (keep=employee emp_gender emp_age spouse sp_gender sp_age
           child1 gender1 rel1 age1
           child2 gender2 rel2 age2
           street city state zip);
  length employee $20 emp_gender $1 emp_age 8
         spouse $20 sp_gender $1 sp_age 8
         child1 $20 gender1 $1 rel1 $15 age1 8
         child2 $20 gender2 $1 rel2 $15 age2 8
         street $20 city $15 state $2 zip $5;
  retain employee emp_gender emp_age spouse sp_gender sp_age
         child1 gender1 rel1 age1
         child2 gender2 rel2 age2
         street city state zip;
  infile 'c:\temp\famdata.txt' dsd dlm = ',' end=eof;
  ** read in the record indicator and hold the line with trailing @;
  input recind $ @;
  ** now read the rest of the held data line, depending on the record indicator value;
  if recind = 'e1' then do;
    if _n_ gt 1 then do;
      ** this OUTPUT is for ALL the employee information except the last employee.;
      ** each DATA step iteration reads exactly one data line, so _N_ = 1 means the first line.;
      ** we do NOT want to output when _N_ = 1 because at that point we have only read the first;
      ** employee data line and still need to read the rest of his info before doing the output.;
      ** the logic is that every time we read an "e1" data line, we are reading in a NEW employee and must;
      ** output the info for the PREVIOUS employee.;
      output;
      ** initialize all variables to missing to erase previously retained information;
      emp_gender=' '; sp_gender = ' '; emp_age = .; sp_age = .;
      age1=.; age2=.; gender1=' '; gender2 = ' '; rel1 = ' '; rel2 = ' ';
      child1 = ' '; child2 = ' '; spouse = ' '; street=' '; city=' '; state = ' '; zip = ' ';
    end;
    ** read in the employee record;
    input employee $ emp_gender $ emp_age;
  end;
  else if recind = 's1' then do;
    input spouse $ sp_gender $ sp_age;
  end;
  else if recind = 'c1' then do;
    input rel1 $ age1 child1 $ gender1 $;
  end;
  else if recind = 'c2' then do;
    input rel2 $ age2 child2 $ gender2 $;
  end;
  else if recind = 'a1' then do;
    input street $ city $ state $ zip $;
  end;
  if eof = 1 then do;
    ** output the LAST assembled obs for the last employee because we have reached the end of the file;
    output;
  end;
run;
options nodate nonumber nocenter ls=200;
proc print data=allrec;
title 'use trailing @ and conditional output to assemble the whole obs from multiple input lines';
run;
[/pre]
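If you want to run the program above without first creating c:\temp\famdata.txt, one option is to write the sample lines to a temporary file. This is a sketch, assuming the FILENAME TEMP device; the fileref FAMDATA is a name picked for this example, not something the program above requires. The step copies each in-stream line unchanged to the temporary file:

[pre]
** FAMDATA is a fileref picked for this example, pointing to a temporary file;
filename famdata temp;

data _null_;
  infile datalines;
  file famdata;
  ** read each in-stream line and copy it unchanged to the temporary file;
  input;
  put _infile_;
datalines;
e1,"Doug Jones",M,35
s1,"Mandy Jones",F,35
c1,"son",6,"David",M
c2,"daughter",3,"Melissa",F
a1,"1234 Some St.","Dallas","TX",75020
e1,"Anne Austin",F,37
s1,"Jack Austin",M,37
c1,"step-daughter",10,"Andrea",F
c2,"daughter",8,"Jeanine",F
e1,"Bud Hollis",M,37
s1,"Suzie Queue",F,36
c1,"step-son",10,"Sam",M
a1,"3456 Cricket Lane","Dallas","TX",75021
;
run;
[/pre]

After that, changing the INFILE statement in the program to infile famdata dsd dlm = ',' end=eof; should produce the same ALLREC data set. As an aside on the design, the three lines of reset assignments after the first OUTPUT could also be written as a single call missing(of employee--zip); statement, because the LENGTH statement makes EMPLOYEE the first and ZIP the last of those variables in the program data vector. That also resets EMPLOYEE, which is harmless because the 'e1' INPUT that follows reads a new value into it.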