BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
lh50
Fluorite | Level 6

I have a dataset where the participant ID is 11 characters long and starts with the letters ABCDEFG followed by 4 numbers. Current variable format is $CHAR11. The IDs count up by one beginning with 1, so ABCDEFG0001, ABCDEFG002....ABCDEFG0875, etc. However, some of the IDs were entered in error without the letters and leading zeros (see below). How do I add the letters and correct number of leading zeros to the rows where they are missing while leaving the properly formatted IDs as-is?

 

ID
2
6
8
12
13
14
20
21
358
439
589
ABCDEFG0001
ABCDEFG0003
1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

@lh50 You could exclude any string longer than n-characters.

data want;
  set have;
  id=strip(id);
  if length(id)<=11-length('ABCDEFG') then 
    do;
      if put(input(id,?? 11.),11.-l)=strip(id) then 
        do;
          id=cats('ABCDEFG',put(input(id,11.),z4.));
        end;
    end;
run;

View solution in original post

5 REPLIES 5
Patrick
Opal | Level 21

Below should work.

data have;
  infile datalines truncover;
  input ID :$char11.;
  datalines;
2
  6
8
 12
13
14
20
21
358
439
589
ABCDEFG0001
ABCDEFG0003
;

data want;
  set have;
  if put(input(id,?? 11.),11.-l)=strip(id) then 
    do;
      id=cats('ABCDEFG',put(input(id,11.),z4.));
    end;
run;

Patrick_0-1647292592308.png

 

lh50
Fluorite | Level 6
This worked great, thanks. One caveat I forgot to mention: there are also some IDs that are a string of 7 or 8 numbers that don't follow the numerical pattern (1549817 for example). Is there any way to correct the IDs as you did in your solution while leaving these unchanged? When I apply your solution, it makes these IDs look like ABCDEFG17E7 (using the above example). I would rather leave those IDs unchanged if possible.
Patrick
Opal | Level 21

@lh50 You could exclude any string longer than n-characters.

data want;
  set have;
  id=strip(id);
  if length(id)<=11-length('ABCDEFG') then 
    do;
      if put(input(id,?? 11.),11.-l)=strip(id) then 
        do;
          id=cats('ABCDEFG',put(input(id,11.),z4.));
        end;
    end;
run;
ballardw
Super User

One way that seems to work with the example:

data want;
   set have;
   select (length (id));
      when (1,2,3,4) Id=cats('ABCDEFG',put(input(id,4.),z4.));
      when (11); /*nothing to do*/
      otherwise put "WARNING: Rule for handling this case not provided " id=;
   end;
run;

The Othewise would execute when you have an unexpected number of characters such as 5. Just in case, this puts a warning in the log.

The Select statement evaluates the code in the parentheses and finds the value returned, in this case by the Length function and will execute the statement following the When with the result. 1 through 4 are expected to be up to a 4 digit number. 11 expects to be a complete Id with no need to fix. Any other result is likely a bit more complicated to fix, if they occur. Such as a number like 12345. No zeroes, too many digits for your rule so likely needs manual intervention.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1267 views
  • 3 likes
  • 3 in conversation