Solved: Re: Computing a new variable based on character range

bretthouston · Posted 12-20-2019 12:31 PM

Hello,

I am trying to compute a new variable based on the character values in the 6th and 7th position of an existing column. For example, I'd like to try to compute the column Newcode from CCIcode:

PatientID CCIcode Newcode

1 1GR89QB 1GR89O

2 1AB76EA 1AB76E

3 1CG57KZ 1CG57O

4 1CG57KS 1CG57E

Essentially, if position 6 & 7 of the variable CCIcode are AA-KS, then I'd like position 6 of Newcode to be "E". If position 6 & 7 of the variable CCIcode are KZ-XY (there is nothing inbetween KS to KZ) then I'd like position 6 of Newcode to be "O".

I've tried a few variations of substr code, but I don't know how to specify the character ranges without having to type the letters individually. Any help here would be much much appreciated!

Thanks!

PeterClemmensen · Posted 12-20-2019 12:36 PM

How is the range AA-KS defined?

The DATA to DATA Step Macro
Blog: SASnrd

View solution in original post

PeterClemmensen · Posted 12-20-2019 12:36 PM

How is the range AA-KS defined?

The DATA to DATA Step Macro
Blog: SASnrd

novinosrin · Posted 12-20-2019 12:43 PM


data have;
input PatientID   CCIcode :$15.;
cards;
1                1GR89QB    1GR89O
2                1AB76EA     1AB76E
3                1CG57KZ    1CG57O
4                1CG57KS    1CG57E
;

data want;
set have;
k=substr(CCIcode,6,2);
if  'AA'<=k<='KS' then newcode=cats(substr(CCIcode,1,5),'E');
else if 'KZ'<=k<='XY' then newcode=cats(substr(CCIcode,1,5),'O');
drop k;
run;

bretthouston · Posted 12-20-2019 12:47 PM

Thank-you so much - this is exactly what I was looking for! Works perfectly!

ed_sas_member · Posted 12-20-2019 12:57 PM

Hi @bretthouston

You can also use a PRX function to look for specific patterns:

data want;
	set have;
	if prxmatch('/^.{5}[A-J][A-Z]$|^.{5}K[A-S]$/',strip(CCIcode))
		then Newcode = prxchange('s/[A-J][A-Z]$|K[A-S]$/E/',1,strip(CCIcode));
	else if prxmatch('/^.{5}KZ$|^.{5}[L-W][A-Z]$|^.{5}X[A-Y]$/',strip(CCIcode))
		then Newcode = prxchange('s/KZ$|[L-W][A-Z]$|X[A-Y]$/O/',1,strip(CCIcode));
run;

Computing a new variable based on character range

Re: Computing a new variable based on character range

Re: Computing a new variable based on character range

Re: Computing a new variable based on character range

Re: Computing a new variable based on character range

Re: Computing a new variable based on character range

Registration is open