Solved: Convert long string into variable EDIFACT

ANLYNG · Posted 10-07-2017 12:27 PM

Hi,

I am dealing with a dataset which has a field containing a so-called EDIFACT message, meaning one long string that holds several different pieces of information. I want to parse the long text string and split it into SAS variables. The split should be based on the separator characters in the string; see the example below.

Any ideas or code for a nice and easy solution in SAS?

Thanks in advance,

Input data: one long string

NAD+PERSONnumber+3456789010'DTM+091:170126:101'NAD+ENTITY+027'NAD+KLA+11'
NAD+YNR+'NAD+MOK+405'TAF+55:45:53:52:51:61:62:63:64:65:85:84:83:82:81:71:72:73:74:75'
TOB+85:OKK:0'TOB+75:OKK:0’TOB+63:MES:4'QTY+DYB:X'

I want to create these new variables:

PERSONnumber=3456789010
DTM=091:170126
ENTITY=027
KLA=11
YNR=+
MOK=405
TAF=55:45:53:52:51:61:62:63:64:65:85:84:83:82:81:71:72:73:74:75'
TOB85=OKK:0
TOB75=OKK:0
TOB63= MES:4'QTY+DYB:X'

Tom · Posted 10-07-2017 02:15 PM

It looks to me like you can get very close by:

1. Removing the NAD+ from the input string.
2. Splitting the string on single quotes and periods.

data test ;
  str=
   "NAD+PERSONnumber+3456789010'DTM+091:170126:101'"
||"NAD+ENTITY+027'NAD+KLA+11'"
||"NAD+YNR+'NAD+MOK+405'"
||"TAF+55:45:53:52:51:61:62:63:64:65:85:84:83:82:81:71:72:73:74:75'"
||"TOB+85:OKK:0'TOB+75:OKK:0.TOB+63:MES:4'QTY+DYB:X'"
  ;
  length i 8 term $100 ;
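 * Remove NAD+ and then Split on ticks and periods ;
  str=compress(tranwrd(str,'NAD+',' '),' ');
 * Write one row per term. The loop ends after the first blank term, so the last row written is blank ;
  do i=1 by 1 until (term=' ');
    term=scan(str,i,"'.");
    output;
    put i= term= ;
  end;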
  drop str;
run;

i=1 term=PERSONnumber+3456789010
i=2 term=DTM+091:170126:101
i=3 term=ENTITY+027
i=4 term=KLA+11
i=5 term=YNR+
i=6 term=MOK+405
i=7 term=TAF+55:45:53:52:51:61:62:63:64:65:85:84:83:82:81:71:72:73:74:75
i=8 term=TOB+85:OKK:0
i=9 term=TOB+75:OKK:0
i=10 term=TOB+63:MES:4
i=11 term=QTY+DYB:X
i=12 term=


Reeza · Posted 10-07-2017 01:47 PM

Try SCAN() with the + and ' as your delimiters.
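
One way to read this suggestion, as a rough sketch only: split the message into segments on the single quote first, then split each segment on the first +. The dataset and variable names (segments, seg, tag, value) are made up for illustration, and the sample string is a shortened version of the one in the question.

```sas
data segments;
  length str $200 seg $100 tag $32 value $100;
  str = "NAD+KLA+11'NAD+MOK+405'TOB+85:OKK:0'";
  /* STRIP keeps the trailing blanks of str from being counted as an extra word */
  do i = 1 to countw(strip(str), "'");
    /* Outer split: one segment per single quote */
    seg   = scan(str, i, "'");
    /* Inner split: the piece before the first + is the segment tag */
    tag   = scan(seg, 1, '+');
    /* Everything after the first + is kept together as the value */
    value = substr(seg, index(seg, '+') + 1);
    output;
  end;
  keep i tag value;
run;
```

For the shortened sample this should give three rows, with tag=NAD and value=KLA+11 in the first one. Splitting in two passes keeps the + characters inside a value intact, which a single SCAN call with both delimiters would not.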

ANLYNG · Posted 10-11-2017 02:49 PM

I used this solution. This works very well. Thanks a lot for your input.

kiranv_ · Posted 10-07-2017 04:13 PM

If your values are pretty fixed, here is another approach. In the first step we use prxchange to pick out a pattern and extract what we want. For example, for the variable PERSONnumber:

PERSONnumber =prxchange('s/.+?PERSONnumber(.+?)''.+/$1/i', -1, str);

(.+?) is the value we capture: anything after PERSONnumber and before the next single quote.

This captured value is $1, and it replaces everything else. The same approach is used for every other variable. In the next step, a value that consists only of + is left unchanged; otherwise the leading + is removed. I hope my explanation is clear.

data test ;
  /* Concatenate the literals so that no line breaks end up inside the string */
  str=
   "NAD+PERSONnumber+3456789010'DTM+091:170126:101'NAD+ENTITY+027'NAD+KLA+11'"
||"NAD+YNR+'NAD+MOK+405'TAF+55:45:53:52:51:61:62:63:64:65:85:84:83:82:81:71:72:73:74:75'"
||"TOB+85:OKK:0'TOB+75:OKK:0'TOB+63:MES:4'QTY+DYB:X'"
  ;
run;

data test2(drop=str);
  set test;
  /* Each pattern keeps only the text between the tag and the next single quote */
  PERSONnumber=prxchange('s/.+?PERSONnumber(.+?)''.+/$1/i', -1, str);
  DTM=prxchange('s/.+?DTM(.+?)''.+/$1/i', -1, str);
  ENTITY=prxchange('s/.+?ENTITY(.+?)''.+/$1/i', -1, str);
  KLA=prxchange('s/.+?KLA(.+?)''.+/$1/i', -1, str);
  YNR=prxchange('s/.+?YNR(.+?)''.+/$1/i', -1, str);
  MOK=prxchange('s/.+?MOK(.+?)''.+/$1/i', -1, str);
  TAF=prxchange('s/.+?TAF(.+?)''.+/$1/i', -1, str);
  TOB85=prxchange('s/.+?TOB\+85\:(.+?)''.+/$1/i', -1, str);
  TOB75=prxchange('s/.+?TOB\+75\:(.+?)''.+/$1/i', -1, str);
  /* No closing quote in this pattern, so TOB63 captures through the end of the string */
  TOB63=prxchange('s/.+?TOB\+63\:(.+)/$1/i', -1, str);
run;

data test3;
  set test2;
  array vars{*} _character_;
  do i=1 to dim(vars);
    /* Strip a leading + only. A value that is just + stays, and the TOB values have no + to strip */
    if length(vars{i}) > 1 and substr(vars{i},1,1) = '+' then vars{i} = substr(vars{i},2);
  end;
  drop i;
run;

ANLYNG · Posted 10-08-2017 04:16 AM

My values are NOT fixed. The string and its content are variable, so somehow I need to find out how many variables there are.

Any smart way of doing that?

I will try the solutions later. They look very interesting.

Thanks.

ANLYNG · Posted 10-08-2017 04:20 AM

The content, number and names of the variables will be different. So somehow I need to use the text separators in the original string to name the output variables in my final dataset.
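
A possible way to handle names that are not known in advance, sketched under some assumptions: first build a long table with one row per segment (name and value taken from the string itself), then let PROC TRANSPOSE turn the names into variables. The key msg_id, the dataset names long and wide, and the special handling of TOB are illustrative choices, not something given in the thread. The sketch also assumes that names are valid SAS variable names and are unique within one message after the TOB handling.

```sas
data long;
  length str $200 seg $200 name $32 value $200;
  str = "NAD+PERSONnumber+3456789010'DTM+091:170126:101'NAD+KLA+11'"
     || "NAD+YNR+'TOB+85:OKK:0'TOB+75:OKK:0'";
  msg_id = 1;  /* hypothetical key identifying the source record */
  /* Drop the NAD+ qualifier so every segment starts with its own name */
  str = compress(tranwrd(str, 'NAD+', ' '), ' ');
  do i = 1 to countw(strip(str), "'");
    seg   = scan(str, i, "'");
    name  = scan(seg, 1, '+');
    value = substr(seg, index(seg, '+') + 1);
    /* TOB repeats, so fold its first sub-element into the name (TOB85, TOB75) */
    if name = 'TOB' then do;
      name  = cats(name, scan(value, 1, ':'));
      value = substr(value, index(value, ':') + 1);
    end;
    output;
  end;
  keep msg_id name value;
run;

/* One row per message, one column per segment name found in the string */
proc transpose data=long out=wide(drop=_name_);
  by msg_id;
  id name;
  var value;
run;
```

With this layout the number of variables never has to be counted up front: whatever names occur in the data become columns. A segment such as YNR+ ends up with a blank value here rather than the + shown in the question.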

Patrick · Posted 10-08-2017 09:07 PM

@ANLYNG

Instead of trying to write your own parser, you could also search the Internet to see whether there is already something out there that does this job for you, and then spend the time figuring out how to interface with such a parser.

A quick Internet search suggests that such parsers exist, for example ones that convert EDIFACT to XML. If so, you could then use the SAS XMLV2 engine together with automap to read such an XML file.
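
If the message has already been converted to XML by some external tool, reading it could look roughly like the sketch below. Both file paths are placeholders, and the table names that appear depend entirely on the XML the converter produces.

```sas
/* Hypothetical paths: the XML written by the converter and a map file to be generated */
filename edixml '/path/to/message.xml';
filename edimap '/path/to/message.map';

/* The libref matches the fileref, so the engine reads that XML file. */
/* AUTOMAP=REPLACE asks the engine to generate the XMLMap itself.     */
libname edixml xmlv2 automap=replace xmlmap=edimap;

/* Copy whatever tables the generated map describes into WORK for inspection */
proc copy in=edixml out=work;
run;
```

The generated map file can be opened afterwards to see how the engine decided to split the XML into tables.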
