BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
zimcom
Pyrite | Level 9

 

 

Hi all,

 

I have a variable called subject_ID composed of Site ID (first one or two number are Site and the last 4 digits are subject ID) and  saved as text and I want to add a "-" in between, like 12-0006, 9-0011 or 22-0001 for example, any idea is greatly appreciated!

 

Subject_ID

120006
120004
120003
120002
120005
90011
90010
90006
90001
90009
220001
220019
220017
1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

If the variable is character what is the current defined length of the variable? What is the longest current value in the variable?

If the two numbers above match then you will need to do something to increase the length of the variable because inserting would make the variable too long to fit in the current definition.

 

Something like:

data want;
   length subject_id $ 7;
   set have;
   if length(subject_id)=6 then subject_id = catx('-',substr(subject_id,1,2),substr(subject_id,3) ;
else if length(subject_id)=5 then subject_id = catx('-',substr(subject_id,1,1),substr(subject_id,2) ; run;

I use 7 above as the longest value you showed was 6 characters. If the actual defined length is longer than the longest value the length statement would not be needed.

Edited to address the 5 character length option I missed.

 

If your variable is NUMERIC then you will need to create an entirely new variable as you cannot have - as you would intend. I suspect I might be able to create a format that would display the current numeric as desired but that sounds fragile.

 

Who combined these values? For a large number of purposes it may be easier to use Site and the 4 digits (or other identifier) as separate variables (of course if they are already in the data let us know).

View solution in original post

12 REPLIES 12
PaigeMiller
Diamond | Level 26

Is Subject_ID a numeric or character variable?

--
Paige Miller
zimcom
Pyrite | Level 9

@PaigeMiller character

PaigeMiller
Diamond | Level 26

So is there a rule that would allow us to separate the cases where you have a single digit before the dash, and the cases where you have two digits before the dash? 

--
Paige Miller
zimcom
Pyrite | Level 9

@PaigeMiller the last 4 digits are the patient ID, it always has 4 didgits, the first 1 or 2 digits are the site iD, when the sites run over 10, it becomes 2 digits. So basically it is to insert a "-" in the 5th place from the right, hope this is clear

PaigeMiller
Diamond | Level 26

@zimcom wrote:

@PaigeMiller the last 4 digits are the patient ID, it always has 4 didgits, the first 1 or 2 digits are the site iD, when the sites run over 10, it becomes 2 digits. So basically it is to insert a "-" in the 5th place from the right, hope this is clear


data want;
    set have;
    length new_subject_id $ 8;
    len=length(subject_id);
    new_subject_id=catx('-',substr(subject_id,1,len-4),substr(subject_id,len-3));
run;
--
Paige Miller
Patrick
Opal | Level 21

@zimcom 

Or code as below. Eventually add some check on the length of the source string to deal with incomplete source data.

data have(drop=_:);
  infile datalines truncover;
  input Subject_ID $;
  length want $ 100;
  _len=lengthn(Subject_ID);
  want=substrn(Subject_ID,1,_len-4)||'-'||substrn(Subject_ID,_len-3);
  cards;
87654321
7654321
654321
54321
4321
321
21
1
 
;

Capture.JPG

novinosrin
Tourmaline | Level 20
data have;
input Subject_ID $;
cards;
120006
120004
120003
120002
120005
90011
90010
90006
90001
90009
220001
220019
220017
;

data want;
set have;
temp=indexc(Subject_ID,'0');
want=catx('-',substr(Subject_ID,1,temp-1),substr(subject_id,temp));
drop temp;
run;

I made some silly assumptions though 🙂 

PaigeMiller
Diamond | Level 26

@novinosrin That works if the 0 is the delimiter where the dash goes. Otherwise not.

--
Paige Miller
novinosrin
Tourmaline | Level 20

@PaigeMiller Yes you are right.

@zimcom  The following works with no assumptions

 

data have;
input Subject_ID $;
cards;
120006
120004
120003
120002
120005
90011
90010
90006
90001
90009
220001
220019
220017
;

data want;
set have;
temp=substr(Subject_ID,length(Subject_ID)-4+1,4);
temp1=substr(Subject_ID,1,length(Subject_ID)-4);
want=catx('-',temp1,temp);
drop temp:;
run;
ballardw
Super User

If the variable is character what is the current defined length of the variable? What is the longest current value in the variable?

If the two numbers above match then you will need to do something to increase the length of the variable because inserting would make the variable too long to fit in the current definition.

 

Something like:

data want;
   length subject_id $ 7;
   set have;
   if length(subject_id)=6 then subject_id = catx('-',substr(subject_id,1,2),substr(subject_id,3) ;
else if length(subject_id)=5 then subject_id = catx('-',substr(subject_id,1,1),substr(subject_id,2) ; run;

I use 7 above as the longest value you showed was 6 characters. If the actual defined length is longer than the longest value the length statement would not be needed.

Edited to address the 5 character length option I missed.

 

If your variable is NUMERIC then you will need to create an entirely new variable as you cannot have - as you would intend. I suspect I might be able to create a format that would display the current numeric as desired but that sounds fragile.

 

Who combined these values? For a large number of purposes it may be easier to use Site and the 4 digits (or other identifier) as separate variables (of course if they are already in the data let us know).

PaigeMiller
Diamond | Level 26

The solution from @ballardw fails for the case where the Subject_ID is 90011

--
Paige Miller
Ksharp
Super User
data have;
input Subject_ID $;
length want $ 100;
want=prxchange('s/(\d{4})$/-$1/',1,strip(Subject_ID));
cards;
120006
120004
120003
120002
120005
90011
90010
90006
90001
90009
220001
220019
220017
;

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 3116 views
  • 2 likes
  • 6 in conversation