I have a dataset where the participant ID is 11 characters long and starts with the letters ABCDEFG followed by 4 numbers. Current variable format is $CHAR11. The IDs count up by one beginning with 1, so ABCDEFG0001, ABCDEFG002....ABCDEFG0875, etc. However, some of the IDs were entered in error without the letters and leading zeros (see below). How do I add the letters and correct number of leading zeros to the rows where they are missing while leaving the properly formatted IDs as-is?
ID |
2 |
6 |
8 |
12 |
13 |
14 |
20 |
21 |
358 |
439 |
589 |
ABCDEFG0001 |
ABCDEFG0003 |
@lh50 You could exclude any string longer than n-characters.
data want;
set have;
id=strip(id);
if length(id)<=11-length('ABCDEFG') then
do;
if put(input(id,?? 11.),11.-l)=strip(id) then
do;
id=cats('ABCDEFG',put(input(id,11.),z4.));
end;
end;
run;
Below should work.
data have;
infile datalines truncover;
input ID :$char11.;
datalines;
2
6
8
12
13
14
20
21
358
439
589
ABCDEFG0001
ABCDEFG0003
;
data want;
set have;
if put(input(id,?? 11.),11.-l)=strip(id) then
do;
id=cats('ABCDEFG',put(input(id,11.),z4.));
end;
run;
@lh50 You could exclude any string longer than n-characters.
data want;
set have;
id=strip(id);
if length(id)<=11-length('ABCDEFG') then
do;
if put(input(id,?? 11.),11.-l)=strip(id) then
do;
id=cats('ABCDEFG',put(input(id,11.),z4.));
end;
end;
run;
One way that seems to work with the example:
data want; set have; select (length (id)); when (1,2,3,4) Id=cats('ABCDEFG',put(input(id,4.),z4.)); when (11); /*nothing to do*/ otherwise put "WARNING: Rule for handling this case not provided " id=; end; run;
The Othewise would execute when you have an unexpected number of characters such as 5. Just in case, this puts a warning in the log.
The Select statement evaluates the code in the parentheses and finds the value returned, in this case by the Length function and will execute the statement following the When with the result. 1 through 4 are expected to be up to a 4 digit number. 11 expects to be a complete Id with no need to fix. Any other result is likely a bit more complicated to fix, if they occur. Such as a number like 12345. No zeroes, too many digits for your rule so likely needs manual intervention.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.