Hello,
Please find below the Have and Want data sets.
Data Have ;
input data $100. ;
cards ;
data/dataflow/BPS.STQR/EZR/1.0/RER_QWE_MED.D.AUD.MED, RER_QWA_MED.D.CAD.WOW.WOW.OF00, QWZ_PLO_WOW.D.POP
;
Run ;
Data Want ;
input x1 $60. ;
cards ;
RER_QWE_MED , RER_QWA_MED , QWZ_PLO_WOW
;
Run ;
We would like to extract x1 from DATA according to the following rules for Type A and Type B:
Type A
1. The word appears after the last slash ("/") in the string,
2. and before the first dot that follows it (if there is a dot),
3. and it contains only capital letters or underscores (no lowercase letters).
4. If no such word is found, write 'NOWORD'.
Type B
Here we have to extract more than one word, using the following logic:
1. Start after the last slash in the string.
2. Continue along the same row until a comma is found, then take the word between that comma and the next dot.
3. Repeat this for every comma-to-dot word, as long as we are after the last slash in the row.
The purpose is to separate the relevant words from the garbage; for example, '.CAD.' is garbage.
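A minimal sketch of the Type B rule on its own, assuming that a Type B word is the text between a comma and the next dot after the last slash, and that it must pass the same capitals-and-underscores test as Type A. It writes one row per word instead of one row per record; the data set TypeB_long and the variables tail, piece and word are hypothetical names, and the example row is read with TRUNCOVER into a $200 variable so it is not cut off.

```sas
/* Example row from the post, read into a wider $200 variable. */
data Have;
  infile datalines truncover;
  input data $200.;
  datalines;
data/dataflow/BPS.STQR/EZR/1.0/RER_QWE_MED.D.AUD.MED, RER_QWA_MED.D.CAD.WOW.WOW.OF00, QWZ_PLO_WOW.D.POP
;
run;

/* Type B only: one output row per word found after a comma. */
data TypeB_long (keep=data word);
  set Have;
  length tail piece $200 word $60;
  tail = scan(data, -1, '/');            /* everything after the last slash    */
  do i = 2 to countw(tail, ',');         /* piece 1 is the Type A candidate    */
    piece = left(scan(tail, i, ','));    /* drop the blank after the comma     */
    word  = scan(piece, 1, '.');         /* text up to the first dot           */
    if word ne ' ' and verify(trim(word), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ_') = 0
      then output;                       /* keep capitals/underscore words     */
  end;
run;
```

For the example row this should give two rows, RER_QWA_MED and QWZ_PLO_WOW; starting the loop at 1 instead of 2 picks up the first (Type A) piece as well.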
For Type A we already have a solution:
x1 = scan(scan(data,-1,"/"),1,'.') ; *after the last slash and before the first period ;
if verify(trim(x1), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ_') then x1 = 'NOWORD' ; *NOWORD unless only capitals and underscores ;
We need a solution for Type B.
Ideally, a single solution would handle both Type A and Type B in the same row.
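One possible single-step approach, sketched under the assumptions below rather than a tested production solution: split the text after the last slash on commas, treat the first piece as the Type A candidate and the remaining pieces as Type B candidates, keep the part before the first dot, and join the valid words with the ' , ' separator used in Want. Writing NOWORD in place of an invalid Type A word, and silently dropping invalid Type B words, is an assumption about how the two rule sets combine; the helper variables tail, word and valid are hypothetical.

```sas
/* Example row from the post, read into a wider $200 variable. */
data Have;
  infile datalines truncover;
  input data $200.;
  datalines;
data/dataflow/BPS.STQR/EZR/1.0/RER_QWE_MED.D.AUD.MED, RER_QWA_MED.D.CAD.WOW.WOW.OF00, QWZ_PLO_WOW.D.POP
;
run;

/* Type A (piece 1) and Type B (pieces 2..n) in one row. */
data Want (keep=data x1);
  set Have;
  length tail $200 word $60 x1 $200;
  tail = scan(data, -1, '/');                       /* text after the last slash */
  do i = 1 to countw(tail, ',');                    /* one pass per comma piece  */
    word  = scan(left(scan(tail, i, ',')), 1, '.'); /* text before the first dot */
    valid = (word ne ' ')
            and (verify(trim(word), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ_') = 0);
    if valid then x1 = catx(' , ', x1, word);       /* append, Want-style        */
    else if i = 1 then x1 = 'NOWORD';               /* Type A rule 4 (assumed)   */
  end;
  if x1 = ' ' then x1 = 'NOWORD';                   /* nothing usable at all     */
run;
```

For the example row, x1 should come out as RER_QWE_MED , RER_QWA_MED , QWZ_PLO_WOW, which matches the Want row and keeps the comma separators that the split step relies on.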
------------------------------------------------------
BTW, in the next step we split the Want data by the delimiters:
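Data Audit_Split ;
set Audit_short ;
delims = '+,' ; *split on plus signs and commas ;
Array s_ [40] $55 s1-s40 ; *up to 40 pieces, each up to 55 characters ;
do i = 1 to 40 ;
s_[i] = left(scan(string, i, delims)) ; *i-th piece, leading blanks removed ;
end ;
drop delims i ;
Run ;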
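If the number of pieces varies from row to row, a variant of the split step can count them with COUNTW first, fill only that many array elements, and warn when a row has more pieces than the 40 array slots. This is a sketch only; the demo input Audit_short_demo and the variable n_words are hypothetical.

```sas
/* Hypothetical demo input: one delimited string shaped like the Want value. */
data Audit_short_demo;
  length string $200;
  string = 'RER_QWE_MED , RER_QWA_MED , QWZ_PLO_WOW';
run;

data Audit_Split (drop=delims i);
  set Audit_short_demo;
  length delims $2;
  delims = '+,';
  array s_ [40] $55 s1-s40;
  n_words = countw(string, delims);          /* how many pieces are present  */
  if n_words > dim(s_) then
    put 'WARNING: more than 40 pieces, extra ones ignored: ' string=;
  do i = 1 to min(n_words, dim(s_));         /* never index past s40         */
    s_[i] = left(scan(string, i, delims));   /* strip the leading blank      */
  end;
run;
```

Unused slots (here s4 to s40) simply stay blank, as in the original step.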
----------------------------------------------------------------------------------------------------------
Thanks in advance.