I have a dataset with a character variable called "oldvar" that contains a string of numeric values such as "4 6 6 4 5". I would like to create a new variable that has one of the "4" values removed. For example, I would like my new variable to be "6 6 4 5" with the first 4 deleted or alternatively the new variable could be "4 6 6 5". Either one would work for me. I've tried using a combination of the substr and index functions but none have produced what I'm looking for.
Example of values that "oldvar" takes on:
4 5 4 5
4 6 4
6 6 4 5
3 4 3
Thank you in advance for any help!
What are the rules?
How do you decide which "4" to delete and which to keep?
I'm creating control cards for another program. The rule is literally "delete one '4' from the string of values." There is always at least one "4" in the variable; sometimes there are 2 or 3 of them. Thanks for asking.
So what is the final desired product from your data set?
Is this what you want?
have | want |
4 5 4 5 | 5 4 5 |
4 6 4 | 6 4 |
6 6 4 5 | 6 6 5 |
3 4 3 | 3 3 |
Yes, this is exactly what I want to achieve.
One approach:
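data want;
    set have;
    /* position of the first standalone 4 (0 if there is none) */
    found4 = indexw(oldvar, '4');
    select (found4);
        when (0);    /* no 4 present: leave the value unchanged */
        when (1) oldvar = substr(oldvar, 3);    /* 4 is the first word: skip "4 " */
        /* otherwise join the text before and after the 4; catx trims the extra blanks */
        otherwise oldvar = catx(' ', substr(oldvar, 1, found4 - 1), substr(oldvar, found4 + 1));
    end;
    drop found4;    /* note: oldvar itself is overwritten; no new variable is created */
run;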
I'm sure there are ways to do this with parsing functions, but I'm not as familiar with them.
************ EDITED:
See the @HB solution below. There's a good reason to learn parsing functions.
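For anyone who wants a parsing-function version of the same idea, here is a minimal sketch rather than a tested replacement for the code above. It assumes the values are separated by blanks and that $20 is long enough, as in the sample data in this thread. The names newvar, word, i, and removed are made up for illustration.

```sas
data want;
    set have;
    length newvar $20 word $20;   /* assumed lengths; adjust to your data */
    removed = 0;                  /* flag: has one 4 been dropped yet? */
    do i = 1 to countw(oldvar, ' ');
        word = scan(oldvar, i, ' ');
        /* skip the first word that is exactly 4, keep everything else */
        if word = '4' and not removed then removed = 1;
        else newvar = catx(' ', newvar, word);
    end;
    drop i word removed;
run;
```

Because newvar is rebuilt word by word, a 4 that is part of a longer value such as 14 would not be touched.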
Here's another approach just for options.
data have;
    input oldvar $20.;
    datalines;
4 5 4 5
4 6 4
6 6 4 5
3 4 3
;
run;
data want;
    set have;
    /* delete the first "4 " (a 4 plus the space after it); the 1 limits it to one replacement */
    newvar = prxchange('s/4 //', 1, oldvar);
run;
This yields:
oldvar | newvar |
4 5 4 5 | 5 4 5 |
4 6 4 | 6 4 |
6 6 4 5 | 6 6 5 |
3 4 3 | 3 3 |
The documentation gives this example:
prxchange('s/world/planet/', 1, 'Hello world!');
where
s | specifies the metacharacter for substitution. |
world | specifies the regular expression. |
planet | specifies the replacement value for world. |
1 | specifies that the search ends when one match is found. |
Hello world! | specifies the source string to be searched. |
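In this case the pieces are: s (substitute), the pattern "4 " (a 4 followed by a space), an empty replacement (nothing between the last two slashes), 1 (replace only the first match), and oldvar as the source string.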
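One caveat: the pattern "4 " would also match the 4 inside a value like 14, and it relies on a blank following the 4. If that ever matters for your control cards, a slightly stricter pattern may help. The following is only a sketch under those assumptions; drop_first and drop_last are hypothetical variable names, and the $20 length is assumed from the sample data. As far as I know, a PRXCHANGE result assigned to a variable with no declared length defaults to $200, which is why the LENGTH statement is there.

```sas
data want;
    set have;
    length drop_first drop_last $20;   /* assumed length; avoids the default result length */
    /* \b4\b matches only a standalone 4; \s* also absorbs the blank(s) after it */
    drop_first = prxchange('s/\b4\b\s*//', 1, oldvar);
    /* the greedy (.*) runs up to the last standalone 4, so that one is removed */
    drop_last = prxchange('s/(.*)\b4\b\s*/$1/', 1, oldvar);
run;
```

For "4 5 4 5" this should give "5 4 5" in drop_first and "4 5 5" in drop_last, which covers both of the alternatives mentioned in the original question.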
Your solution worked - thank you!
Thank you very much! Your solution also worked just fine!