BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
archibald
Obsidian | Level 7

Hi all, 

I have a character variable ID is as follows :

ID

1002-004567-UC

1002-000062-UC

1002-239874-UC

 I would like to transform the  ID variable such as: 

ID

4567

62

239874

What would be the appropriate syntax to use?

Thank you.

 

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

You could also use pattern matching

 

data ids;
input idStr :$16.;
datalines;
1002-004567-UC
1002-000062-UC
1002-239874-UC
;

data want;
set ids;
id = prxchange("s/.*-0*([1-9]\d*)-.*/\1/o", 1, idStr);
run;

proc print data=want noobs; run;

The pattern reads: Find any string, followed by a dash, followed by any number of zeros, ( followed by a digit between 1 and 9, followed by any number of digits ), followed by a dash, followed by anything. Keep the string part matched within the parentheses.

PG

View solution in original post

9 REPLIES 9
stat_sas
Ammonite | Level 13

data have;
input ID $20.;
datalines;
1002-004567-UC
1002-000062-UC
1002-239874-UC
;

data want(drop=id rename=id1=id);
set have;
ID1=input(scan(ID,2),8.);
run;

archibald
Obsidian | Level 7
sat_sas,
the code provided does not work for me. I got the following error message :Invalid argument to function INPUT
Reeza
Super User

Do you want the variable to be character or numeric?

 

The solution above generates a numeric ID.  In general ID's should be character to avoid accidental mathematical issues and precision issues in merging. 

SCAN() isolates the middle term

INPUT() converts to a number, so it removes the leading zero's

PUT() converts it back to a character, -l, left aligns the variable.

 

ID_VAR = put(input(scan(var, 2, "-"), 8.), 8. -l);
archibald
Obsidian | Level 7
Thanks Reeza for your reply.
I am not sure this is on my end, but I was not successful using this code. I got an error message Invalid argument to function INPUT
perhaps I am missing something...
PGStats
Opal | Level 21

You could also use pattern matching

 

data ids;
input idStr :$16.;
datalines;
1002-004567-UC
1002-000062-UC
1002-239874-UC
;

data want;
set ids;
id = prxchange("s/.*-0*([1-9]\d*)-.*/\1/o", 1, idStr);
run;

proc print data=want noobs; run;

The pattern reads: Find any string, followed by a dash, followed by any number of zeros, ( followed by a digit between 1 and 9, followed by any number of digits ), followed by a dash, followed by anything. Keep the string part matched within the parentheses.

PG
archibald
Obsidian | Level 7
PGStats,
This works great! Thanks!!
Also say the ID variable was as follows:
1002-004567UC
1002-000062UC
1002-239874UC
which part of the code should I modify to get the same result
ID
4567
62
239874
PGStats
Opal | Level 21

Replace the pattern with "s/.*-0*([1-9]\d*)\D.*/\1/o"

 

It would actually work also with your original question. It says that the number field ends when a non number character is found.

PG
PGStats
Opal | Level 21

I guess you could just as well use "s/.*-0*([1-9]\d*).*/\1/o" which would work even if the number goes to the end of the string. Smiley Happy

PG
archibald
Obsidian | Level 7
great! thanks again!

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 2527 views
  • 1 like
  • 4 in conversation