BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
archibald
Obsidian | Level 7

Hi all, 

I have a character variable ID is as follows :

ID

1002-004567-UC

1002-000062-UC

1002-239874-UC

 I would like to transform the  ID variable such as: 

ID

4567

62

239874

What would be the appropriate syntax to use?

Thank you.

 

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

You could also use pattern matching

 

data ids;
input idStr :$16.;
datalines;
1002-004567-UC
1002-000062-UC
1002-239874-UC
;

data want;
set ids;
id = prxchange("s/.*-0*([1-9]\d*)-.*/\1/o", 1, idStr);
run;

proc print data=want noobs; run;

The pattern reads: Find any string, followed by a dash, followed by any number of zeros, ( followed by a digit between 1 and 9, followed by any number of digits ), followed by a dash, followed by anything. Keep the string part matched within the parentheses.

PG

View solution in original post

9 REPLIES 9
stat_sas
Ammonite | Level 13

data have;
input ID $20.;
datalines;
1002-004567-UC
1002-000062-UC
1002-239874-UC
;

data want(drop=id rename=id1=id);
set have;
ID1=input(scan(ID,2),8.);
run;

archibald
Obsidian | Level 7
sat_sas,
the code provided does not work for me. I got the following error message :Invalid argument to function INPUT
Reeza
Super User

Do you want the variable to be character or numeric?

 

The solution above generates a numeric ID.  In general ID's should be character to avoid accidental mathematical issues and precision issues in merging. 

SCAN() isolates the middle term

INPUT() converts to a number, so it removes the leading zero's

PUT() converts it back to a character, -l, left aligns the variable.

 

ID_VAR = put(input(scan(var, 2, "-"), 8.), 8. -l);
archibald
Obsidian | Level 7
Thanks Reeza for your reply.
I am not sure this is on my end, but I was not successful using this code. I got an error message Invalid argument to function INPUT
perhaps I am missing something...
PGStats
Opal | Level 21

You could also use pattern matching

 

data ids;
input idStr :$16.;
datalines;
1002-004567-UC
1002-000062-UC
1002-239874-UC
;

data want;
set ids;
id = prxchange("s/.*-0*([1-9]\d*)-.*/\1/o", 1, idStr);
run;

proc print data=want noobs; run;

The pattern reads: Find any string, followed by a dash, followed by any number of zeros, ( followed by a digit between 1 and 9, followed by any number of digits ), followed by a dash, followed by anything. Keep the string part matched within the parentheses.

PG
archibald
Obsidian | Level 7
PGStats,
This works great! Thanks!!
Also say the ID variable was as follows:
1002-004567UC
1002-000062UC
1002-239874UC
which part of the code should I modify to get the same result
ID
4567
62
239874
PGStats
Opal | Level 21

Replace the pattern with "s/.*-0*([1-9]\d*)\D.*/\1/o"

 

It would actually work also with your original question. It says that the number field ends when a non number character is found.

PG
PGStats
Opal | Level 21

I guess you could just as well use "s/.*-0*([1-9]\d*).*/\1/o" which would work even if the number goes to the end of the string. Smiley Happy

PG
archibald
Obsidian | Level 7
great! thanks again!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1880 views
  • 1 like
  • 4 in conversation