ANLYNG
Pyrite | Level 9

Hi,

I am working with a dataset that has a field containing a so-called EDIFACT message, that is, one long string holding several pieces of information. I want to parse the string and split it into SAS variables, based on the delimiter characters in the string. See the example below.

 

Any ideas or code for a nice and easy solution in SAS?

 

Thanks in advance,

 

 Input data: one long string

 

NAD+PERSONnumber+3456789010'DTM+091:170126:101'NAD+ENTITY+027'NAD+KLA+11'
NAD+YNR+'NAD+MOK+405'TAF+55:45:53:52:51:61:62:63:64:65:85:84:83:82:81:71:72:73:74:75'
TOB+85:OKK:0'TOB+75:OKK:0’TOB+63:MES:4'QTY+DYB:X'

 

I want to create these new variables:

PERSONnumber=3456789010
DTM=091:170126
ENTITY=027
KLA=11
YNR=+
MOK=405
TAF=55:45:53:52:51:61:62:63:64:65:85:84:83:82:81:71:72:73:74:75'
TOB85=OKK:0
TOB75=OKK:0
TOB63= MES:4'QTY+DYB:X'

 

ACCEPTED SOLUTION

Tom
Super User

It looks to me like you can get very close by:

  1. Removing the NAD+ from the input string.
  2. Splitting the string on single quotes and periods.
data test ;
  str=
   "NAD+PERSONnumber+3456789010'DTM+091:170126:101'"
||"NAD+ENTITY+027'NAD+KLA+11'"
||"NAD+YNR+'NAD+MOK+405'"
||"TAF+55:45:53:52:51:61:62:63:64:65:85:84:83:82:81:71:72:73:74:75'"
||"TOB+85:OKK:0'TOB+75:OKK:0.TOB+63:MES:4'QTY+DYB:X'"
  ;
  length i 8 term $100 ;
 * Remove NAD+ and then Split on ticks and periods ;
  str=compress(tranwrd(str,'NAD+',' '),' ');
  do i=1 by 1 until (term=' ');
    term=scan(str,i,"'.");
    output;
    put i= term= ;
  end;
  drop str;
run;

The PUT statement writes the following to the log:
i=1 term=PERSONnumber+3456789010
i=2 term=DTM+091:170126:101
i=3 term=ENTITY+027
i=4 term=KLA+11
i=5 term=YNR+
i=6 term=MOK+405
i=7 term=TAF+55:45:53:52:51:61:62:63:64:65:85:84:83:82:81:71:72:73:74:75
i=8 term=TOB+85:OKK:0
i=9 term=TOB+75:OKK:0
i=10 term=TOB+63:MES:4
i=11 term=QTY+DYB:X
i=12 term=
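
The loop above also outputs the final blank term (i=12). A minimal sketch of one way to avoid that extra row, assuming the same delimiters, is to bound the loop with COUNTW. The shortened sample string and the dataset name terms are placeholders for illustration only.

```sas
data terms;
  length term $100;
  /* Shortened sample with the NAD+ prefixes already removed (illustration only) */
  str = "PERSONnumber+3456789010'DTM+091:170126:101'KLA+11'";
  /* COUNTW counts the non-empty pieces between the delimiters ' and . */
  do i = 1 to countw(str, "'.");
    term = scan(str, i, "'.");
    output;
  end;
  drop str;
run;
```

With this sample the loop runs three times, so the dataset holds one row per term and no trailing blank row.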

9 REPLIES 9
Reeza
Super User

Try SCAN() with the + and ' as your delimiters. 
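
One possible reading of this suggestion is a two-level split: first on the single quote to get segments, then on + to get the fields inside each segment. The sketch below assumes, based only on the example in the question, that NAD segments carry the variable name in the second field and that every other segment uses its tag as the name. The dataset name fields and the shortened sample string are placeholders, and the TOB85-style naming is not handled here.

```sas
data fields(keep=seg_no tag name value);
  length seg $200 tag name $32 value $200;
  /* Shortened sample message (illustration only) */
  str = "NAD+PERSONnumber+3456789010'DTM+091:170126:101'NAD+YNR+'TOB+85:OKK:0'";
  do seg_no = 1 to countw(str, "'");
    seg = scan(str, seg_no, "'");      /* outer split: one segment per single quote */
    tag = scan(seg, 1, '+');           /* inner split: the first field is the segment tag */
    if tag = 'NAD' then do;
      name  = scan(seg, 2, '+');       /* assumed rule: NAD+name+value */
      value = scan(seg, 3, '+');       /* blank when nothing follows the second + */
    end;
    else do;
      name  = tag;                     /* assumed rule: tag+value */
      value = scan(seg, 2, '+');
    end;
    output;
  end;
run;
```

This gives one row per segment with a name and a value, which is a convenient shape when the set of variables is not known in advance.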

ANLYNG
Pyrite | Level 9

I used this solution. This works very well. Thanks a lot for your input.

 

kiranv_
Rhodochrosite | Level 12

If your values are fairly fixed, here is another approach. In the first step we use prxchange to pick out a pattern and extract the part we want.

For example, for the variable PERSONnumber:

 

PERSONnumber =prxchange('s/.+?PERSONnumber(.+?)''.+/$1/i', -1, str);

 

In the pattern above, (.+?) is the value we capture: everything after PERSONnumber and before the next single quote.

The captured value is $1, and it replaces everything else. The same approach is used for every other variable. In the next step, the leading + is removed from each value, except when the value consists only of +, which is kept. I hope the explanation is clear.
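
One point worth knowing about this technique: when the pattern does not match, PRXCHANGE returns the source string unchanged, so a missing tag would leave the whole message in the variable. A minimal sketch of an alternative, using PRXPARSE, PRXMATCH and PRXPOSN so that a missing tag yields a blank, is shown here. The pattern, the shortened sample string and the variable names rx and value are assumptions made for illustration.

```sas
data _null_;
  length str $200 value $50;
  /* Shortened sample message (illustration only) */
  str = "NAD+PERSONnumber+3456789010'DTM+091:170126:101'NAD+KLA+11'";
  /* \+ consumes the separator, so the capture group excludes the leading + */
  rx = prxparse("/PERSONnumber\+([^']*)'/i");
  if prxmatch(rx, str) then value = prxposn(rx, 1, str);
  else value = ' ';                    /* tag not present: leave the variable blank */
  put value=;
run;
```

For this sample the log shows value=3456789010.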

 

 

 

data test ;
  /* Build the sample message by concatenation so that no line breaks end up inside the literal */
  str=
   "NAD+PERSONnumber+3456789010'DTM+091:170126:101'NAD+ENTITY+027'NAD+KLA+11'"
||"NAD+YNR+'NAD+MOK+405'TAF+55:45:53:52:51:61:62:63:64:65:85:84:83:82:81:71:72:73:74:75'"
||"TOB+85:OKK:0'TOB+75:OKK:0'TOB+63:MES:4'QTY+DYB:X'"
  ;
run;

/* Step 1: for each tag, keep only the text between the tag and the next single quote */
data test2(drop=str);
  set test;
  PERSONnumber =prxchange('s/.+?PERSONnumber(.+?)''.+/$1/i', -1, str);
  DTM =prxchange('s/.+?DTM(.+?)''.+/$1/i', -1, str);
  ENTITY=prxchange('s/.+?ENTITY(.+?)''.+/$1/i', -1, str);
  KLA=prxchange('s/.+?KLA(.+?)''.+/$1/i', -1, str);
  YNR=prxchange('s/.+?YNR(.+?)''.+/$1/i', -1, str);
  MOK=prxchange('s/.+?MOK(.+?)''.+/$1/i', -1, str);
  TAF=prxchange('s/.+?TAF(.+?)''.+/$1/i', -1, str);
  TOB85=prxchange('s/.+?TOB\+85\:(.+?)''.+/$1/i', -1, str);
  TOB75=prxchange('s/.+?TOB\+75\:(.+?)''.+/$1/i', -1, str);
  /* TOB63 is the last variable, so it captures everything up to the end of the string */
  TOB63=prxchange('s/.+?TOB\+63\:(.+)/$1/i', -1, str);
run;

/* Step 2: strip the leading + from each value, but keep a value that is only + */
data test3(drop=i);
  set test2;
  array vars{*} _character_;
  do i=1 to dim(vars);
    /* =: is a "starts with" comparison; the TOB values have no leading + and are left as they are */
    if vars{i} ne '+' and vars{i} =: '+' then vars{i} = substr(vars{i},2);
  end;
run;

 

 

ANLYNG
Pyrite | Level 9

My values are NOT fixed. The string and its content vary, so somehow I need to find out how many variables there are.

Is there a smart way of doing that?

I will try the solutions later. They look very interesting.

Thanks.

ANLYNG
Pyrite | Level 9

The content, the number, and the names of the variables will be different. So somehow I need to use the text separators in the original string to name the output variables in my final dataset.
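
When the variable names are only known at run time, one common pattern is to keep the parsed result in a long name/value table and let PROC TRANSPOSE create the columns. The sketch below assumes such a table already exists. The dataset names long and wide, the key msg_id, and the rows for message 2 are hypothetical and only there to show messages with different sets of names.

```sas
/* Hypothetical long table: one row per message and parsed name/value pair */
data long;
  length msg_id 8 name $32 value $200;
  infile datalines dlm='|' truncover;
  input msg_id name $ value $;
  datalines;
1|PERSONnumber|3456789010
1|ENTITY|027
1|KLA|11
1|MOK|405
2|PERSONnumber|1234567890
2|KLA|12
;

proc sort data=long;
  by msg_id;
run;

/* ID turns each distinct name into a variable; VAR supplies its value */
proc transpose data=long out=wide(drop=_name_);
  by msg_id;
  id name;
  var value;
run;
```

A name that is absent from a message simply ends up missing in that row. Two caveats apply: every name must be a valid SAS variable name, and a name that occurs twice within the same message makes PROC TRANSPOSE report an error, so repeated segments would need a suffix first.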

Patrick
Opal | Level 21

@ANLYNG

Instead of trying to write your own parser, you could also search the Internet to see whether something already exists that does this job for you, and then spend the time figuring out how to interface with such a parser.

A quick Internet search suggests that such parsers exist, e.g. for converting EDIFACT to XML. If so, you could use the SAS XMLV2 engine together with automap to read such an XML file.
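
As a rough sketch of that last step, assuming an external tool has already converted the message to XML, the XMLV2 engine can generate a map and expose the content as SAS tables. Both file paths are hypothetical placeholders, and the tables you get depend entirely on the structure of the converted XML.

```sas
/* Hypothetical paths: the converted XML file and the map file SAS will generate */
filename edi '/path/to/message.xml';
filename map '/path/to/message.map';

/* The libref matches the fileref of the XML document; AUTOMAP=REPLACE (re)creates the map */
libname edi xmlv2 automap=replace xmlmap=map;

/* Copy the generated tables into WORK for inspection */
proc copy in=edi out=work;
run;
```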
