Hi,
I am dealing with a dataset that has a field containing a so-called EDIFACT message, meaning one long string containing different pieces of information. I want to parse the long text string and split it into SAS variables. The split should be based on the separator characters in the string; see the example below.
Any ideas or code for a nice and easy solution in SAS?
Thanks in advance,
Input data: one long string
NAD+PERSONnumber+3456789010'DTM+091:170126:101'NAD+ENTITY+027'NAD+KLA+11' NAD+YNR+'NAD+MOK+405'TAF+55:45:53:52:51:61:62:63:64:65:85:84:83:82:81:71:72:73:74:75' TOB+85:OKK:0'TOB+75:OKK:0’TOB+63:MES:4'QTY+DYB:X'
I want to create these new variables:
PERSONnumber=3456789010
DTM=091:170126
ENTITY=027
KLA=11
YNR=+
MOK=405
TAF=55:45:53:52:51:61:62:63:64:65:85:84:83:82:81:71:72:73:74:75'
TOB85=OKK:0
TOB75=OKK:0
TOB63= MES:4'QTY+DYB:X'
It looks to me like you can get very close with something like this:
data test ;
str=
"NAD+PERSONnumber+3456789010'DTM+091:170126:101'"
||"NAD+ENTITY+027'NAD+KLA+11'"
||"NAD+YNR+'NAD+MOK+405'"
||"TAF+55:45:53:52:51:61:62:63:64:65:85:84:83:82:81:71:72:73:74:75'"
||"TOB+85:OKK:0'TOB+75:OKK:0.TOB+63:MES:4'QTY+DYB:X'"
;
length i 8 term $100 ;
* Remove NAD+ and then Split on ticks and periods ;
str=compress(tranwrd(str,'NAD+',' '),' ');
do i=1 by 1 until (term=' ');
term=scan(str,i,"'.");
output;
put i= term= ;
end;
drop str;
run;
i=1 term=PERSONnumber+3456789010
i=2 term=DTM+091:170126:101
i=3 term=ENTITY+027
i=4 term=KLA+11
i=5 term=YNR+
i=6 term=MOK+405
i=7 term=TAF+55:45:53:52:51:61:62:63:64:65:85:84:83:82:81:71:72:73:74:75
i=8 term=TOB+85:OKK:0
i=9 term=TOB+75:OKK:0
i=10 term=TOB+63:MES:4
i=11 term=QTY+DYB:X
i=12 term=
Try SCAN() with the + and ' as your delimiters.
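A minimal sketch of that idea, for the case where the number and names of the variables are not known in advance: split the message on ' into segments, split each segment on +, and let PROC TRANSPOSE turn the resulting name/value rows into variables. The naming rules are assumptions made for illustration (NAD segments are taken to be NAD+name+value, TOB segments are named TOB plus the first component, every other segment is named after its tag), and the dataset names long and wide are made up. Empty elements such as NAD+YNR+ come out as blank values here.

```sas
/* Step 1: one row per segment, holding a variable name and a value. */
data long;
  length str $400 seg elem value $200 tag $8 name $32;
  str = "NAD+PERSONnumber+3456789010'DTM+091:170126:101'"
     || "NAD+ENTITY+027'NAD+KLA+11'NAD+YNR+'NAD+MOK+405'"
     || "TOB+85:OKK:0'TOB+75:OKK:0'TOB+63:MES:4'";
  do i = 1 by 1;
    /* Segments end with a tick; LEFT() drops blanks between segments. */
    seg = left(scan(str, i, "'"));
    if missing(seg) then leave;
    tag  = scan(seg, 1, '+');
    elem = scan(seg, 2, '+');
    if tag = 'NAD' then do;
      /* Assumed layout: NAD+name+value */
      name  = elem;
      value = scan(seg, 3, '+');
    end;
    else if tag = 'TOB' then do;
      /* Assumed layout: TOB+nn:rest, named TOBnn so repeats stay unique */
      name  = cats(tag, scan(elem, 1, ':'));
      value = substr(elem, index(elem, ':') + 1);
    end;
    else do;
      /* Any other segment: the tag itself is used as the name */
      name  = tag;
      value = elem;
    end;
    output;
  end;
  keep name value;
run;

/* Step 2: one variable per name, however many names there are. */
proc transpose data=long out=wide(drop=_name_);
  id name;
  var value;
run;
```

With several messages in one dataset, the same steps would presumably need a message identifier carried through step 1 and used in a BY statement in PROC TRANSPOSE. Names must be unique within a message, otherwise the ID statement will complain about duplicates.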
I used this solution. This works very well. Thanks a lot for your input.
If your values are pretty fixed, here is another approach. In the first step we use prxchange to pick out each pattern and extract the part we want. For example, for the variable PERSONnumber:
PERSONnumber =prxchange('s/.+?PERSONnumber(.+?)''.+/$1/i', -1, str);
Here (.+?) is the value we capture: everything after PERSONnumber and before the next single quote. This captured value is $1, and it replaces the whole string. The same approach is used for every other variable. Because the capture starts right after the name, it still includes the leading +. In the next step that leading + is removed, except when the value consists of only +, in which case it is kept. I hope the explanation is clear.
data test ;
* Build the string by concatenation so no line break ends up inside the quoted literal ;
str=
"NAD+PERSONnumber+3456789010'DTM+091:170126:101'NAD+ENTITY+027'NAD+KLA+11'"
||"NAD+YNR+'NAD+MOK+405'TAF+55:45:53:52:51:61:62:63:64:65:85:84:83:82:81:71:72:73:74:75'"
||"TOB+85:OKK:0'TOB+75:OKK:0'TOB+63:MES:4'QTY+DYB:X'"
;
run;
data test2(drop =str);
set test;
PERSONnumber =prxchange('s/.+?PERSONnumber(.+?)''.+/$1/i', -1, str);
DTM =prxchange('s/.+?DTM(.+?)''.+/$1/i', -1, str);
ENTITY=prxchange('s/.+?ENTITY(.+?)''.+/$1/i', -1, str);
KLA=prxchange('s/.+?KLA(.+?)''.+/$1/i', -1, str);
YNR=prxchange('s/.+?YNR(.+?)''.+/$1/i', -1, str);
MOK=prxchange('s/.+?MOK(.+?)''.+/$1/i', -1, str);
TAF=prxchange('s/.+?TAF(.+?)''.+/$1/i', -1, str);
TOB85=prxchange('s/.+?TOB\+85\:(.+?)''.+/$1/i', -1, str);
TOB75=prxchange('s/.+?TOB\+75\:(.+?)''.+/$1/i', -1, str);
TOB63=prxchange('s/.+?TOB\+63\:(.+)/$1/i', -1, str);
run;
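data test3;
set test2;
array vars{*} _character_;
* Strip the leading + from each value, but keep a value that is only + ;
* The TOB values were captured without a leading +, so they are left unchanged ;
do i=1 to dim(vars);
if vars{i} ne '+' and substr(vars{i},1,1) = '+' then vars{i} = substr(vars{i},2);
end;
drop i;
run;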
My values are NOT fixed. The string and its content are variable. So somehow I need to find out how many variables there are.
Any smart way of doing that?
I will try the solutions later. They look very interesting.
Thanks.
The content, and the number and names of the variables, will be different. So somehow I need to use the text separators in the original string to name the output variables in my final dataset.
Instead of trying to write your own parser, you could also search the Internet to see whether there is already something out there that does this job for you, and then spend the time figuring out how to interface with such a parser.
A quick Internet search suggests that such parsers exist, for example ones that convert EDIFACT to XML. If so, you could use the SAS XMLV2 engine together with automap to read such an XML file.
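As a rough sketch of that last step, assuming an external converter has already written the message to an XML file: the file paths and the names edixml and edimap below are placeholders, and which tables appear depends entirely on the XML the converter produces.

```sas
/* Placeholder paths: point these at the converted XML and a map file to create. */
filename edixml 'C:\temp\edifact_message.xml';
filename edimap 'C:\temp\edifact_message.map';

/* AUTOMAP=REPLACE asks the XMLV2 engine to generate the XMLMap file itself. */
libname edixml xmlv2 xmlmap=edimap automap=replace;

/* List the tables the generated map exposes before reading any of them. */
proc contents data=edixml._all_ nods;
run;
```

The generated map usually yields one table per repeating XML element, so the EDIFACT segments would probably arrive as several related tables rather than one wide row.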