Re: How to read in HL7 data into sas

deleted_user · Posted 12-02-2008 09:31 AM

I have data coming in as an HL7 pipe-delimited text file. How do I read it in a SAS DATA step using the INFILE statement?

Although the file contains more than one record, my INFILE statement reads it as a single record, and I am not getting a data dump.

thanks,

Cynthia_sas · Posted 12-02-2008 10:36 AM

Hi:
I remember reading about HL7 when I was doing some research on XML and CDISC. If the file is XML-based, your solution may be to use the SAS XML LIBNAME engine. If the data file is NOT XML-based, then you should be able to use standard INFILE syntax, possibly with the DLM= option, to parse the data and read it into SAS datasets.

It would help to see what your data looks like. If you cannot post it because it is lab or pathology data, and you have no dummy data to post instead, then you should consider contacting SAS Technical Support for help reading it.

Here are some misc links I had from previous research.
http://www.listserv.uga.edu/cgi-bin/wa?A2=ind0506d&L=sas-l&P=62837
http://www.lexjansen.com/pharmasug/2004/sasinstitute/sas3.pdf
http://www.xml4pharma.com/HL7-XML/HL7-XML_for_CDISC_Standards.pdf

cynthia

deleted_user · Posted 12-02-2008 12:44 PM

Hey Cynthia,

I tried reading the data with the INFILE statement, using pipe as the DLM and a record format of Fixed, and outputting each segment's data into a different dataset.

The data is not read properly and shows up as only 1 record. Since our data is an ordinary text file, I may not need XML etc.

Here is sample data:

seg1|^~\&|LMS|LMS|sgx|igx|20080101135444||||20080101135444
seg2|^~\&|LMS|LMS|sgx|sgx|||ORU^R01|200810241354441|P|2.3
seg3|R01|20080101135444540
seg4|1|123456789|66273||Linda^Chambers^L||1968-01-21|F|||23 Sargent Dr^^The Woodlands^OH^12345^^H||2031239876||||||123456789
seg2|^~\&|LMS|LMS|sgx|sgx|||ORU^R01|200810241354442|P|2.3
seg3|R01|20081024135444556
seg4|1|472663806|138710||Dough^Thomas^M||1947-01-27|M|||2345 Sea View Dr^^Sugar Land^MA^77789^^H||7815468998||||||5645671930

These seg* lines repeat in turn. My program should write out each segment's info into a different dataset.

Any insight into this kind of data is much appreciated!

thanks,

Cynthia_sas · Posted 12-02-2008 02:39 PM

Hi: I have a few questions:
what does the ^~\& represent? what does the single ^ represent (as in ORU^R01 in your line 2 of data)? Do you have some kind of data layout that tells you what you might find between each set of delimiters? You'll have to use that documentation in order to build your INPUT statements.

Also, generally record format of fixed only applies to mainframe files -- where do these data files live? Usually Windows and Unix files are Variable length. And, as Doc says, you may have CR or CR+LF issues with the system that's sending you the data. You may need to investigate this.

Do you want all the SEG1 obs into a dataset; all the SEG2 into a dataset, all the SEG3 into a dataset???
OR
You want SEG1 through SEG4 in one dataset and SEG2-SEG4 in a second dataset??
What is a SEG*?? Is each number a separate observation or is a group of seg numbers one observation?

Looking at this, you'd probably be beyond a simple INFILE/INPUT. It looks like each SEG data line has a different number of variables on it. Which points to a trailing @ to hold the line, while your program decides which SEG it has. Something along the lines of this code snippet (not tested):
[pre]
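input @1 segind $4. @;   /* the trailing @ holds the input buffer for a further read */
if segind = 'seg1' then do;
   input funnychars $ var1 $ var2 $ var3 $;
   output work.seg1;
end;
else if segind = 'seg2' then do;
   /* LMS, sgx, etc. are text, so these are read as character */
   input funnychars $ x2 $ x3 $ x4 $ x5 $;
   output work.seg2;
end;
else if segind = 'seg3' then do;
   /* a 17-digit value is too long to hold exactly in a SAS numeric, so read it as character */
   input rnum $ bignum :$20.;
   output work.seg3;
end;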
[/pre]

But you get the idea. The '@' allows you to read something from the input buffer (like the seg #) and then decide which INPUT statement to use. If you needed to keep a group of SEG #s together, then you'd have to decide which variables belonged together and probably use RETAIN in order to assemble one observation from multiple SEG # data lines. The example snippet above would create one data set for every SEG #....so your DATA statement would look like:
[pre]
data seg1(keep=....)
seg2(keep=...)
seg3(keep=...);
[/pre]
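
As a rough illustration only, the two fragments above could be combined into one runnable step along these lines. This is a sketch, not tested against real HL7 files. It reads a few of the sample lines posted earlier from DATALINES, and every variable name (enc, app1, msgtype, msgid and so on) is a made-up placeholder, because the thread does not give a field layout. DSD is added here as an assumption, so that consecutive pipes (||) count as empty fields instead of collapsing into a single delimiter.
[pre]
data work.seg2 (keep=msgtype msgid procid version)
     work.seg3 (keep=rnum bignum);
   /* all names below are hypothetical; replace them using your HL7 layout document */
   length segind $4 enc app1 app2 fac1 fac2 skip1 skip2 $8
          msgtype $10 msgid $20 procid $1 version $5
          rnum $3 bignum $20;
   infile datalines dlm='|' dsd truncover;

   /* read the segment indicator and hold the line */
   input segind $ @;

   if segind = 'seg2' then do;
      input enc $ app1 $ app2 $ fac1 $ fac2 $ skip1 $ skip2 $
            msgtype $ msgid $ procid $ version $;
      output work.seg2;
   end;
   else if segind = 'seg3' then do;
      input rnum $ bignum $;
      output work.seg3;
   end;
   /* seg1 lines are not read further; the held line is released at the end of the iteration */
   datalines;
seg1|^~\&|LMS|LMS|sgx|igx|20080101135444||||20080101135444
seg2|^~\&|LMS|LMS|sgx|sgx|||ORU^R01|200810241354441|P|2.3
seg3|R01|20080101135444540
seg2|^~\&|LMS|LMS|sgx|sgx|||ORU^R01|200810241354442|P|2.3
seg3|R01|20081024135444556
;
run;
[/pre]
With these five lines, WORK.SEG2 and WORK.SEG3 should each end up with two observations.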

cynthia

deleted_user · Posted 12-02-2008 03:00 PM

Cynthia,

I did it exactly the way you described above. I used @ to retain the data to be read after each segment is read in. For some reason, my program works fine for the data file saved as .txt, but it does not work for the .raw file, which is what we use internally to read the raw data that comes from the source.

Also, the '^' characters are internal separators within the variables. For example, in the case of names, we will further separate first name, last name, middle initial, etc.
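
One way to split such a field into its components is the SCAN function with '^' as the delimiter. The sketch below uses two values copied from the sample data; the output variable names (name1-name3, addr3, addr7) are placeholders, and the 'm' modifier assumes a SAS release that supports SCAN modifiers (9.2 or later, as far as I know).
[pre]
data work.components;
   length name addr $60 name1-name3 $20 addr3 addr7 $20;
   name = 'Linda^Chambers^L';
   addr = '23 Sargent Dr^^The Woodlands^OH^12345^^H';

   /* the name has no empty components, so a plain SCAN is enough */
   name1 = scan(name, 1, '^');
   name2 = scan(name, 2, '^');
   name3 = scan(name, 3, '^');

   /* the address has empty components (^^); the M modifier keeps them */
   /* as zero-length words so that the positions stay aligned          */
   addr3 = scan(addr, 3, '^', 'm');   /* The Woodlands */
   addr7 = scan(addr, 7, '^', 'm');   /* H */
run;
[/pre]
Without the 'm' modifier, SCAN treats ^^ as a single delimiter, so the third word of the address would be OH instead of The Woodlands.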

thanks,

Cynthia_sas · Posted 12-02-2008 05:37 PM

Hi:
If the trailing @ is working correctly for an ASCII text file, but not for your "raw" file, then you need to investigate the difference in how the 2 files are created. If the "raw" file has carriage return/line feed issues, then you may need to get the creators of the raw file to make sure that the standard end of record/carriage return/line feed characters are put into the file.

BTW, this may seem like a fine point, but as an instructor, I have to make sure you understand that @ is not "retaining" data. @ is freezing the input pointer in the input buffer so you can go back and reread the input buffer with another INPUT statement -- another way to think about it is to say that @ is "holding" the input buffer open at the end of the INPUT statement. Usually you get one "read" of the input data file per INPUT statement. This is not what you want to do. You want your program to have several INPUT statements to read the same data line without having each INPUT statement cause a new data line to be read into the input buffer.

There's a difference between holding the input buffer for a subsequent INPUT statement and using the RETAIN statement to retain the value of a variable across executions of an INPUT or SET statement. If you continue to have issues reading the file, then you should consider contacting SAS Tech Support for more detailed help.

cynthia

Doc_Duke · Posted 12-02-2008 02:06 PM

If you have delimited files, you are looking at some detailed data step programming. There are several versions of the HL-7 standard and they have some differences in formatting so be sure the documentation matches the data.

As far as seeing one long data record, that may be a source data compatibility problem. Some operating systems do not separate "records" with a CR or a CR+LF pair. You may need to work more closely with your data provider on the details of the file format.
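
A hedged sketch of how one might check this from inside SAS: dump the first bytes of the file with LIST to see which terminator is really there, then tell INFILE about it with TERMSTR=. The fileref hl7raw and the test file are invented for the example (the first step writes two of the sample seg3 lines ending in a bare CR, hex 0D), and TERMSTR= is assumed to be available in your SAS release.
[pre]
/* build a small test file whose records end in a bare CR (hex 0D) */
filename hl7raw temp;

data _null_;
   file hl7raw recfm=n;
   put 'seg3|R01|20080101135444540' '0D'x
       'seg3|R01|20081024135444556' '0D'x;
run;

/* Step 1: read the first 40 bytes as one fixed-length chunk.          */
/* LIST writes the buffer to the log, with a hex dump when it contains */
/* unprintable characters, so the terminator bytes become visible.     */
data _null_;
   infile hl7raw recfm=f lrecl=40 obs=1;
   input;
   list;
run;

/* Step 2: read the file with the terminator stated explicitly */
data work.seg3_check;
   infile hl7raw termstr=cr dlm='|' truncover;
   input segind :$4. rnum :$3. bignum :$20.;
run;
[/pre]
If the hex dump shows 0D0A pairs instead, TERMSTR=CRLF would be the value to try; a lone 0A points to TERMSTR=LF.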

Doc_Duke · Posted 12-02-2008 03:05 PM

SASbase,

Cynthia just gave you the shell of a program to read the data. It's what I did some years ago when I had to work with HL7 data. Once you've determined the record type (the seg #), you can use the usual SAS input statements to get the data in. As I recall, the person identifier is on one of the early seg's, so you'll need a RETAIN to save that for each output record.
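
A minimal sketch of that RETAIN idea, using three lines modeled on the PID/NK1 sample posted later in this thread. The variable names, and the assumption that the person identifier is the third field after the segment name, are illustrative only; check them against your HL7 documentation.
[pre]
data work.nk1 (keep=patient_id nk_seq nk_name relation);
   length segid $3 patient_id $12 nk_name $40 relation $20;
   /* keep the identifier from the PID line for the lines that follow */
   retain patient_id;
   infile datalines dlm='|' dsd truncover;

   input segid $ @;
   if segid = 'PID' then do;
      /* set_id and ext_id are placeholders for fields we skip over */
      input set_id ext_id $ patient_id $;
   end;
   else if segid = 'NK1' then do;
      input nk_seq nk_name $ relation $;
      output;
   end;
   datalines;
PID|1||000395122||LEVERKUHN^ADRIAN^C
NK1|1|TALLIS^THOMAS^C|GRANDFATHER
NK1|2|WEBERN^ANTON|SON
;
run;
[/pre]
Each NK1 observation should carry the identifier read from the preceding PID line. Without the RETAIN statement, patient_id would be reset to missing at the start of every iteration.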

Doc Muhlbaier
Duke

deleted_user · Posted 12-29-2008 02:58 PM

Cynthia,

Our problem reading the raw data was resolved by keeping the DOS end-of-line characters in the file.

Now my question is: how do I write a few variables from each segment out to a file?

I have tried using RETAIN, and it outputs them as separate variables. But I need those particular variables written on one line, separated by a comma or pipe.

thanks,
sasbase

Cynthia_sas · Posted 12-29-2008 04:48 PM

Hi:
Once you have your data into a SAS dataset, then your choices are to use ODS to create a comma separated file or to use a DATA step program (with FILE and PUT statements) to write a "flat file" from the SAS dataset (without using ODS)
[pre]

**1a) Use ODS CSV;
ods csv file='classout1a.csv';

proc print data=sashelp.class noobs;
var name age sex height;
run;

ods csv close;

**2) Use DATA _NULL_;
data _null_;
   set sashelp.class;
   ** DLM= makes list-style PUT write the pipe between values with no padding blanks;
   file 'c:\temp\class_pipe.txt' dlm='|';
   if _n_ = 1 then do;
      put 'Name|Age|Sex|Height';
   end;
   put name age sex height;
run;

[/pre]

You could also use PROC EXPORT and/or get an update to the CSV tagset that allows you to specify the delimiter that you want to use when you create your output file.
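
For the PROC EXPORT route, a minimal sketch might look like the following. The output path is a hypothetical one modeled on the example above, and DBMS=DLM with the DELIMITER= statement is what selects the pipe.
[pre]
proc export data=sashelp.class
            outfile='c:\temp\class_pipe2.txt'
            dbms=dlm replace;
   delimiter='|';
run;
[/pre]
Unlike the DATA _NULL_ approach, this writes every variable in the dataset, so use a KEEP= dataset option on DATA= if you only need a few of them.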

RETAIN will only be useful when you want to retain a variable value across multiple iterations of the DATA step. For example, you might have some material on a SEG1 input record (like a name) and want to retain that name value across all iterations of the "seg" input lines for that name. So I don't understand how RETAIN is doing any outputting for you.

It almost sounds like you want to read in data using INFILE/INPUT and then write it out again using FILE/PUT without ever having a SAS dataset for any other kind of analysis. If this is what you need to do, then you may want to work with Tech Support, as they can look at all your code and your data and help you with the most appropriate techniques.

cynthia

deleted_user · Posted 01-10-2009 02:38 PM

Cynthia,

Please see this data:

MSH|^~\&||.|||199908180016||ADT^A04|ADT.1.1698593|P|2.5
PID|1||000395122||LEVERKUHN^ADRIAN^C^^^||19880517180606|M|^^^^^||6 66TH AVE NE^^WEIMAR^DL^98052||(157)983-3296|||S||12354768|87654321
NK1|1|TALLIS^THOMAS^C|GRANDFATHER|12914 SPEM ST^^ALIUM^IN^98052|(157)883-6176
NK1|2|WEBERN^ANTON|SON|12 STRASSE MUSIK^^VIENNA^AUS^11212|(123)456-7890
PV1|1|E|EMG-W^^|1||||||||||ER||||ER||H|||||||||||||||||||OVL||REG|||199908180015
GT1|1||SMITH^JAMES^M||12914 164TH AVE NE^^RICHMOND^ON^98052|(157)883-6176|||||F|535-52-9776||||WEISS JENSON|.^^WELLINGTON^ON^.|(206)340-9577
IN1|1|PRE2|001|LIFE PRUDENT BUYER|PO BOX 23523^WELLINGTON^ON^98111|||19601||||||||THOMAS^JAMES^M|F|||||||||||||||||||ZKA535529776
ZLM|1|

My data is as given above, and I want to take a few of the variables from each segment, such as PID, PV1, IN1, etc., and write them out to a file. How can I do this in SAS?

The PID to IN1 segments then repeat. The data in the segments from PID to IN1 represents one person and makes up a complete record for that person.

The way I am doing it now: I read the data using the first 3 bytes, write to different datasets and also to a common dataset, and then write the data from the common dataset to a file using the FILE statement. The data is not written out properly and is spread all over. How can this issue be resolved?

thanks a lot in advance,
sasbase9

Cynthia_sas · Posted 01-11-2009 11:31 AM

Hi:
As I explained earlier -- the code that I posted was for the scenario where you WANTED to have each record type or "seg" dataline in a different data set.
If you need to have all the information for one person/employee/patient assembled together from multiple data lines into 1 observation, then the technique
I illustrated would NOT be the program you wanted.

Let's start with some simpler data. I have this data in a file called FAMDATA.TXT:
[pre]
e1,"Doug Jones",M,35
s1,"Mandy Jones",F,35
c1,"son",6,"David",M
c2,"daughter",3,"Melissa",F
a1,"1234 Some St.","Dallas","TX",75020
e1,"Anne Austin",F,37
s1,"Jack Austin",M,37
c1,"step-daughter",10,"Andrea",F
c2,"daughter",8,"Jeanine",F
e1,"Bud Hollis",M,37
s1,"Suzie Queue",F,36
c1,"step-son",10,"Sam",M
a1,"3456 Cricket Lane","Dallas","TX",75021
[/pre]

Note that I have 3 families -- the Jones, Austin and Hollis families. In this data, the 'e1' data line is the employee record; the 's1' dataline is the spouse record;
the 'c1', 'c2' lines are for the child records (in my data, people can have 0, 1 or 2 children); finally, the 'a1' data line holds the street, city, state and zip code.

So, I am going to need a different INPUT statement for each record indicator, and I am not going to have a whole family assembled until I either read
the 'a1' record for the family or I read the 'e1' record (I know that Mr. Jones family observation is ready to output when I read the Austin 'e1' record.) In my
data, it is possible to have an employee in the file without an address (as with the Austin family), so I cannot use the output on 'a1' logic in my program.

From these individual records, I want to assemble a single observation for each employee in a SAS dataset. The assembled record will have these variables:
employee emp_gender emp_age (from e1)
spouse sp_gender sp_age (from s1)
child1 gender1 rel1 age1 (from c1)
child2 gender2 rel2 age2 (from c2)
street city state zip (from a1)

If any of the record indicators are not found (no child 2 or no address or no spouse), then the variables need to be set to missing. So, for the above
FAMDATA.TXT file, I want to assemble this final SAS dataset:
[pre]
use trailing @ and conditional output to assemble the whole obs from multiple input lines

                  emp_
Obs  employee     gender  emp_age  spouse       sp_gender  sp_age  child1  gender1  rel1           age1  child2   gender2  rel2      age2  street             city    state  zip

  1  Doug Jones   M            35  Mandy Jones  F              35  David   M        son               6  Melissa  F        daughter     3  1234 Some St.      Dallas  TX     75020
  2  Anne Austin  F            37  Jack Austin  M              37  Andrea  F        step-daughter    10  Jeanine  F        daughter     8
  3  Bud Hollis   M            37  Suzie Queue  F              36  Sam     M        step-son         10                                 .  3456 Cricket Lane  Dallas  TX     75021
[/pre]

In order for this final observation to be assembled, I needed to explicitly RETAIN the employee and spouse and child information every time I read a specific
record indicator in order to make sure that I had all the information available when I wrote out the final "assembled" observation.

Think of it like this. SAS can only handle one raw data line at a time. So your program has one chance to "catch and save" the information as you are
holding the record in the input buffer. The INPUT statement does the reading from the buffer. That's the "catching" or reading the information from the raw
data line. But the RETAIN statement does the "saving" of the information until you're ready to do the OUTPUT.

Usually, with a simpler set of raw data, you have one OUTPUT (implied) at the end of the DATA step program for every INPUT statement. But in the program
that's needed to read the FAMDATA.TXT data, I only have one OUTPUT statement (explicit) for every group of INPUT statements. And, then I have one final
OUTPUT statement for when I read the end of the raw data file.

The SAS program that read FAMDATA.TXT and produced the above output is shown below. You may need to really study how my program is working
before you can change your program successfully. I suggest you start with some very SIMPLE data. Try the program with and without RETAIN.
Experiment with having the OUTPUT statement in different places and see what happens. Until you understand the DATA step and how SAS processes
raw data and how the RETAIN statement works, you will have problems writing a program to read your HL7 data successfully. You can always contact
Tech Support for more one-on-one help with your specific data and your specific program.

cynthia
[pre]
data allrec
   (keep=employee emp_gender emp_age spouse sp_gender sp_age
         child1 gender1 rel1 age1
         child2 gender2 rel2 age2
         street city state zip);
   length employee $20 emp_gender $1 emp_age 8
          spouse $20 sp_gender $1 sp_age 8
          child1 $20 gender1 $1 rel1 $15 age1 8
          child2 $20 gender2 $1 rel2 $15 age2 8
          street $20 city $15 state $2 zip $5;

   retain employee emp_gender emp_age spouse sp_gender sp_age
          child1 gender1 rel1 age1
          child2 gender2 rel2 age2
          street city state zip;

   infile 'c:\temp\famdata.txt' dsd dlm = ',' end=eof;
   ** read in record indicator and hold the line with trailing @;
   input recind $ @;

   ** now, read in the other data lines, depending on the record indicator value;
   if recind = 'e1' then do;
      if _n_ gt 1 then do;
         ** this OUTPUT is for ALL the employee information except the last employee.;
         ** also, do NOT want to output when _N_= 1 because we have only read the first employee data line;
         ** at this point and we want to read the rest of his info before doing the output.;
         ** the logic is that every time we read an "e1" data line, we are reading in a NEW employee and must;
         ** output the info for the PREVIOUS employee.;
         output;

         ** initialize all variables to missing to erase previously retained information;
         emp_gender=' '; sp_gender = ' '; emp_age = .; sp_age = .;
         age1=.; age2=.; gender1=' '; gender2 = ' '; rel1 = ' '; rel2 = ' ';
         child1 = ' '; child2 = ' '; spouse = ' '; street=' '; city=' '; state = ' '; zip = ' ';
      end;
      ** read in the employee record;
      input employee $ emp_gender $ emp_age;
   end;
   else if recind = 's1' then do;
      input spouse $ sp_gender $ sp_age;
   end;
   else if recind = 'c1' then do;
      input rel1 $ age1 child1 $ gender1 $;
   end;
   else if recind = 'c2' then do;
      input rel2 $ age2 child2 $ gender2 $;
   end;
   else if recind = 'a1' then do;
      input street $ city $ state $ zip $;
   end;
   if eof = 1 then do;
      ** output LAST assembled obs for last employee because have reached end of file;
      output;
   end;
run;

options nodate nonumber nocenter ls=200;
proc print data=allrec;
title 'use trailing @ and conditional output to assemble the whole obs from multiple input lines';
run;

[/pre]

deleted_user · Posted 01-12-2009 01:04 PM

Thanks Cynthia. This worked for me. You are gr8.

regards,
sasbase

deleted_user · Posted 06-26-2009 11:20 AM

Hi Cyndy,

I need your help again with respect to this layout.

If any of the segments repeat within some of the records, how do I get hold of the values for the repeated segments?

For ex:

e1|abc|1123|09889
a1|Dan|Peter|||
dx1|1234|1
dx1|33455|2
dx1|34556|3
c1|pete|||
e1|

thanks,
sasbase9

sbb · Posted 06-26-2009 11:39 AM

Consider that this is a general forum with individuals other than Cynthia.

Also, what's unclear is the desired output result - please list both INPUT and OUTPUT (desired) data samples, for a meaningful reply by any of the subscribers on the forum. I suspect you're interested in how to handle the "dx1" data (segment) strings, correct? On the input side, these appear as separate records/segments -- how do you expect them to be represented, and how are they being represented with the current SAS program you are using (suggestion to post what code you have working today).

As a courtesy, also, consider creating a new post next time with a reference to the prior post (hyperlink pasted in the new forum post works), given that it's pretty much a new question/query.

Scott Barry
SBBWorks, Inc.
