Hi!
I am trying to use an array to efficiently create a set of new variables. But SAS doesn't seem to accept an array reference as the second argument in the substr function. Is there another way to do this without having to just type separate code for each of the new variables?
My data consists of two character string variables of letter codes indicating a status over 48 periods. I want to know the letter code from string1 that corresponds to the start of each instance of at least 3 repeated "I"s from string2. I have already used some regex coding to determine the position of each of those instances (intpos1-intpos4). Now I would like to create a new set of variables (intcode1-intcode4) that extracts the character from string1 that is at each of those same positions. If intposX is missing then the corresponding intcodeX should also be missing.
Obviously, with only 4 new variables, typing out the code for each would not be that onerous, but I would like to do it as elegantly as possible and learn something in the process. (4 is currently the maximum number of instances of the pattern in any record, but that could increase with new data.)
I'm using SAS 9.4 through SAS EG 7.12.
data work.test;
    set work.test;
    array intcode $1. intcode1-intcode4;
    array intpos intpos1-intpos4;
    do b=1 to 4;
        intcode(b)=substr(string1,intpos(b),1); /* generates error for invalid second argument */
    end;
run;
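One possible way around this, sketched under the assumption that the "invalid second argument" message is raised for the rows where intposX is missing (SUBSTR needs a valid start position) rather than by the array reference itself: call SUBSTR only when the position is non-missing. The data set names work.demo and work.demo_codes and the two sample rows are hypothetical, chosen to mirror the layout of the posted data.

```sas
/* Hypothetical two-row sample in the same layout as the posted data */
data work.demo;
    infile datalines truncover;
    input string1 :$48. string2 :$48. intpos1-intpos4;
    datalines;
AAAFFFFFFFFFAAAFFFFFFFFFNSSSFFFFFFFNNSSSFFFFFFFN IIIFFFFFFFFFIIIFFFFFFFFFIIIIFFFFFFFIIIIIFFFFFFFI 1 13 25 36
PPPPPPPPPNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAA PPPPPPPPPIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 10 . . .
;
run;

data work.demo_codes;
    set work.demo;
    /* Character array, one byte per element; the variables are created here */
    array intcode {4} $ 1 intcode1-intcode4;
    array intpos  {4} intpos1-intpos4;
    do b = 1 to dim(intpos);
        /* Extract only when a position is present; otherwise leave intcode blank */
        if not missing(intpos{b}) then intcode{b} = substr(string1, intpos{b}, 1);
        else intcode{b} = ' ';
    end;
    drop b;
run;
```

Looping to dim(intpos) instead of a hard-coded 4 means only the two array declarations would need to change if new data produce more instances per record.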
Data:
data WORK.test;
infile datalines truncover; /* no DSD: the values below are separated by blanks, not commas */
input string1:$48. string2:$48. intpos1:8. intpos2:8. intpos3:8. intpos4:8.;
datalines;
FAAAAFFFFFFFFFFFUUUUUFFFFFFFFFFFFFFFFFFFFFFFFFFF FIIIIFFFFFFFFFFFIIIIIFFFFFFFFFFFFFFFFFFFFFFFFFFF 2 17 .
777777777777777777777777NNNNNNNNNSPPPPPNNNNNNNNN 777777777777777777777777IIIIIIIIIIPPPPPIIIIIIIII 25 40 . .
NNNNNNNNNNNNNNNNNNPPPPPPFFFFFFFFFFFFFFFFFFFFFFFF IIIIIIIIIIIIIIIIIIPPPPPPFFFFFFFFFFFFFFFFFFFFFFFF 1 . . .
SSSSSSSNNSSSSSSSSSSSSSSSNNNNNNNNNNNNNNNNNNNNNNNN IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 1 . . .
NNNNNNNNNNNNNNNNNNNNPPPPPPPPPPPPPPPPPPPPPPPPPPPP IIIIIIIIIIIIIIIIIIIIPPPPPPPPPPPPPPPPPPPPPPPPPPPP 1 . . .
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 1 . . .
PPPPPPPPPNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAA PPPPPPPPPIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 10 . . .
NNNNNNNNNFFFFFFFFFFFRRRRRRRRRRRRRRRRRRRRRRRRRRRR IIIIIIIIIFFFFFFFFFFFRRRRRRRRRRRRRRRRRRRRRRRRRRRR 1 . . .
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFAAAF FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFIIIF 45 . . .
AAAFFFFFFFFFAAAFFFFFFFFFNSSSFFFFFFFNNSSSFFFFFFFN IIIFFFFFFFFFIIIFFFFFFFFFIIIIFFFFFFFIIIIIFFFFFFFI 1 13 25 36
FFFFFFFFFFFUUUUFFFFFFFUUFFFFFFFFFFSSNNNNFFFFFFFN FFFFFFFFFFFIIIIFFFFFFFIIFFFFFFFFFFIIIIIIFFFFFFFI 12 35 . .
PPPPPPPPNNNNFFFFFFFFFFFFPPPPPPPPPPPPPPPPPPPPNSSN PPPPPPPPIIIIFFFFFFFFFFFFPPPPPPPPPPPPPPPPPPPPIIII 9 45 . .
;
run;