BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Job04
Quartz | Level 8

This is a subset of a larger data. The data was generated by a biochemist and I have an issue with the subjects' IDs. 

 

"METAPH" is the name of the study. Subjects are  measured at 3 occasions (S1-1, S3-1 and S5-1). The subject's ID is as follow: occasion-study'name-subject. For example the first subject has 3 IDs corresponding to the 3 measurements:

S1-1-METAPH-002

S3-1-METAPH-002

S5-1-METAPH-002

 

Data looks like it was manually entered: 

 

SUBJECT
S1-1-METAPH-002
S1-1-METAPH-019
S1-1-METAPH-089
S1-1-METAPH-027
S1-1-METAPH-006
S1-1-METAPH-007
S1-1-METAPH-008
S1-1-METAPH-003
S3-1-METAPH-027
S3-1-METAPH-008
S3-1-METAPH-089
S3-1-METAPH-003
S3-1-METAPH-002
S3-1-METAPH-006
S3-1-METAPH-007
S3-1-METAPH-019
S5-1-METAPH-002
S5-1-METAPH-003
S5-1-METAPH-027
S5-1-METAPH-089
S5-1-METAPH-006
S5-1-METAPH-019
S5-1-METAPH-008
S5-1-METAPH-007

 

I would like to replace  these 3 IDs by only one, which is 1, and the data to look like:  

 

1

1

1

2

2

2

...

 

Thank you

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
s_lassen
Meteorite | Level 14

If I understand you correctly, you want to get the last 3 digits from the subject variable, which identifies the patient. One way is this:

data want;
  set have;
  length id $3;
  id=scan(subject,-1,'-');
run;

View solution in original post

7 REPLIES 7
PeterClemmensen
Tourmaline | Level 20

I don't understand this. Why 1, 1, 1 and 2, 2, 2 ?

Job04
Quartz | Level 8

As you can see the IDs are not sequential perhaps some subjects were exluded from the  study. I would like to  identify subjects by their numbers in the study and not by this long and confusing ID. I have another column that specifies the time of measurement (S1,S3  and S5) which I can use in my modeling. For example these 3 measurements of patient 002 I want to have 3  records called 002. In other word, I want to leave only the three digits to the far right of this long ID. I hope I answered your question.

 

maguiremq
SAS Super FREQ

Is that the only pattern that we need to be aware of? In other words, is it S[digit]-[number]? If so, this is a bit clunky but works for this instance. 

data have;
input subject $20.;
datalines;
S1-1-METAPH-002
S1-1-METAPH-019
S1-1-METAPH-089
S1-1-METAPH-027
S1-1-METAPH-006
S1-1-METAPH-007
S1-1-METAPH-008
S1-1-METAPH-003
S3-1-METAPH-027
S3-1-METAPH-008
S3-1-METAPH-089
S3-1-METAPH-003
S3-1-METAPH-002
S3-1-METAPH-006
S3-1-METAPH-007
S3-1-METAPH-019
S5-1-METAPH-002
S5-1-METAPH-003
S5-1-METAPH-027
S5-1-METAPH-089
S5-1-METAPH-006
S5-1-METAPH-019
S5-1-METAPH-008
S5-1-METAPH-007
;
run;

proc sort
	data = have;
		by subject;
run;

data have_2;
	set have;
		id = catx("-", scan(subject, 1, '-'), scan(subject, 2, '-'));
run;

proc sort
	data = have_2;
		by id;
run;

data want;
	set have_2;
	by id;
	retain new_id;
		if first.id then new_id + 1;
run;
Obs subject id new_id 
1 S1-1-METAPH-002 S1-1 1 
2 S1-1-METAPH-003 S1-1 1 
3 S1-1-METAPH-006 S1-1 1 
4 S1-1-METAPH-007 S1-1 1 
5 S1-1-METAPH-008 S1-1 1 
6 S1-1-METAPH-019 S1-1 1 
7 S1-1-METAPH-027 S1-1 1 
8 S1-1-METAPH-089 S1-1 1 
9 S3-1-METAPH-002 S3-1 2 
10 S3-1-METAPH-003 S3-1 2 
11 S3-1-METAPH-006 S3-1 2 
12 S3-1-METAPH-007 S3-1 2 
13 S3-1-METAPH-008 S3-1 2 
14 S3-1-METAPH-019 S3-1 2 
15 S3-1-METAPH-027 S3-1 2 
16 S3-1-METAPH-089 S3-1 2 
17 S5-1-METAPH-002 S5-1 3 
18 S5-1-METAPH-003 S5-1 3 
19 S5-1-METAPH-006 S5-1 3 
20 S5-1-METAPH-007 S5-1 3 
21 S5-1-METAPH-008 S5-1 3 
22 S5-1-METAPH-019 S5-1 3 
23 S5-1-METAPH-027 S5-1 3 
24 S5-1-METAPH-089 S5-1 3 
Job04
Quartz | Level 8

I didn't well explain what I needed. As you put it: S[digit]-[number]: 

I needed only to keep the "number".  I have another column in the dataset to control when measurement was taken. This is longitudinal study  so I should have 3 identical ID number per subjet. 

 

 

maguiremq
SAS Super FREQ

Alright, I misunderstood your question. Do you mind replacing the `.` in the id field in this DATALINES statement with exactly what you want?

data want;
input subject :$20. id;
datalines;
S1-1-METAPH-002 .
S1-1-METAPH-019 .
S1-1-METAPH-089 .
S1-1-METAPH-027 .
S1-1-METAPH-006 .
S1-1-METAPH-007 .
S1-1-METAPH-008 .
S1-1-METAPH-003 .
S3-1-METAPH-027 .
S3-1-METAPH-008 .
S3-1-METAPH-089 .
S3-1-METAPH-003 .
S3-1-METAPH-002 .
S3-1-METAPH-006 .
S3-1-METAPH-007 .
S3-1-METAPH-019 .
S5-1-METAPH-002 .
S5-1-METAPH-003 .
S5-1-METAPH-027 .
S5-1-METAPH-089 .
S5-1-METAPH-006 .
S5-1-METAPH-019 .
S5-1-METAPH-008 .
S5-1-METAPH-007 .
;
run;

You can copy this code, and in your reply use the `insert SAS code` button and replace the `.` with your desired values.

 

This will help us figure out the solution.

Job04
Quartz | Level 8

Sorry for not explaining straight to the point what I needed.  It was so simple to say I want to keep the last 3 digits. 

s_lassen
Meteorite | Level 14

If I understand you correctly, you want to get the last 3 digits from the subject variable, which identifies the patient. One way is this:

data want;
  set have;
  length id $3;
  id=scan(subject,-1,'-');
run;

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1264 views
  • 1 like
  • 4 in conversation