I have a very large dataset which contains a string a number formatted as characters and I would like to
add additional characters to the beginning of each number. Is there a way to accomplish this without
having to write conditional statements?
Below is an example of the dataset:
string | grp |
1 | 150,160,252,582,360 |
2 | 225,587,331,452,589 |
And this is what I am trying to achieve:
string | grp |
1 | AX150, AX160, AX252, AX582, AX360 |
2 | YL225, YL587, YL331, YL452, YL589 |
You still fail to answer the question "How do you know per string (or substring) which characters you need to add?".
Below some code that assumes you've got a lookup table which defines which characters to add to which string of digits. If that's not how thing need to work then please explain/show in detail how this needs to work.
data work.have;
infile datalines truncover;
input grp $50.;
datalines;
150,160,252,582,360
225,587,331,452,589
150,999,225
;
data work.cntlin;
infile datalines truncover;
retain fmtname '$suffix' type 'C';
input start $ label $;
datalines;
150 AX
160 AX
252 AX
582 AX
360 AX
225 YL
587 YL
331 YL
452 YL
589 YL
other
;
proc format cntlin=work.cntlin;
run;
data want(drop=_:);
set have;
length new_grp $80 _wrd $3;
do _i=1 to countw(grp,',');
_wrd=scan(grp,_i,',');
if _i=1 then new_grp=cats(put(_wrd,$suffix.),_wrd);
else new_grp=cats(new_grp,',',put(_wrd,$suffix.),_wrd);
end;
run;
proc print data=want;
run;
And when would you add AX and when YL? Or is there only a single value per observation? Like variable group for a single observation would for example have value 150 and in this case you would want the value to become AX150?
Is the variable that holds the source string already long enough to store two additional characters?
Some sample representative sample data with desired result would be nice. The sample data provided via working SAS datastep code with datalines that creates a SAS table.
What is the defined length of the variable Grp? What is the length of the longest actual value?
Since you are inserting, at least in your examples 10 characters, it is quite likely that you do not have enough length for all those additional characters to fit in the existing variable name grp. Which means creating a new variable, long enough to hole the longest expected resulting length before adding those characters.
Note: what advantage do you get by having multiple values in a single variable? Almost any operation, including this adding characters, is much harder to do when there are multiple values in one variable. If you did something to combine values to get this current result perhaps add the characters at that time.
The dataset is received in the manner and adding characters to the string is an additional request. The existing grp variable is not
long enough therefore a new variable would need to be created. I would be interested in seeing if I could do this without having to list
every combination of the grp variable in a conditional statement.
@luvscandy27 wrote:
The dataset is received in the manner and adding characters to the string is an additional request. The existing grp variable is not
long enough therefore a new variable would need to be created. I would be interested in seeing if I could do this without having to list
every combination of the grp variable in a conditional statement.
So your variable actually contains strings like "150,160,252,582,360"
How do you know per string (or substring) which characters you need to add?
Yes, the dataset "150,160,252,582,360" actually contains strings. But o course this variable is not used
in any type of analysis its an internal variable.
You still fail to answer the question "How do you know per string (or substring) which characters you need to add?".
Below some code that assumes you've got a lookup table which defines which characters to add to which string of digits. If that's not how thing need to work then please explain/show in detail how this needs to work.
data work.have;
infile datalines truncover;
input grp $50.;
datalines;
150,160,252,582,360
225,587,331,452,589
150,999,225
;
data work.cntlin;
infile datalines truncover;
retain fmtname '$suffix' type 'C';
input start $ label $;
datalines;
150 AX
160 AX
252 AX
582 AX
360 AX
225 YL
587 YL
331 YL
452 YL
589 YL
other
;
proc format cntlin=work.cntlin;
run;
data want(drop=_:);
set have;
length new_grp $80 _wrd $3;
do _i=1 to countw(grp,',');
_wrd=scan(grp,_i,',');
if _i=1 then new_grp=cats(put(_wrd,$suffix.),_wrd);
else new_grp=cats(new_grp,',',put(_wrd,$suffix.),_wrd);
end;
run;
proc print data=want;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.