This is a subset of a larger data. The data was generated by a biochemist and I have an issue with the subjects' IDs.
"METAPH" is the name of the study. Subjects are measured at 3 occasions (S1-1, S3-1 and S5-1). The subject's ID is as follow: occasion-study'name-subject. For example the first subject has 3 IDs corresponding to the 3 measurements:
S1-1-METAPH-002
S3-1-METAPH-002
S5-1-METAPH-002
Data looks like it was manually entered:
SUBJECT
S1-1-METAPH-002
S1-1-METAPH-019
S1-1-METAPH-089
S1-1-METAPH-027
S1-1-METAPH-006
S1-1-METAPH-007
S1-1-METAPH-008
S1-1-METAPH-003
S3-1-METAPH-027
S3-1-METAPH-008
S3-1-METAPH-089
S3-1-METAPH-003
S3-1-METAPH-002
S3-1-METAPH-006
S3-1-METAPH-007
S3-1-METAPH-019
S5-1-METAPH-002
S5-1-METAPH-003
S5-1-METAPH-027
S5-1-METAPH-089
S5-1-METAPH-006
S5-1-METAPH-019
S5-1-METAPH-008
S5-1-METAPH-007
I would like to replace these 3 IDs by only one, which is 1, and the data to look like:
1
1
1
2
2
2
...
Thank you
If I understand you correctly, you want to get the last 3 digits from the subject variable, which identifies the patient. One way is this:
data want;
set have;
length id $3;
id=scan(subject,-1,'-');
run;
I don't understand this. Why 1, 1, 1 and 2, 2, 2 ?
As you can see the IDs are not sequential perhaps some subjects were exluded from the study. I would like to identify subjects by their numbers in the study and not by this long and confusing ID. I have another column that specifies the time of measurement (S1,S3 and S5) which I can use in my modeling. For example these 3 measurements of patient 002 I want to have 3 records called 002. In other word, I want to leave only the three digits to the far right of this long ID. I hope I answered your question.
Is that the only pattern that we need to be aware of? In other words, is it S[digit]-[number]? If so, this is a bit clunky but works for this instance.
data have;
input subject $20.;
datalines;
S1-1-METAPH-002
S1-1-METAPH-019
S1-1-METAPH-089
S1-1-METAPH-027
S1-1-METAPH-006
S1-1-METAPH-007
S1-1-METAPH-008
S1-1-METAPH-003
S3-1-METAPH-027
S3-1-METAPH-008
S3-1-METAPH-089
S3-1-METAPH-003
S3-1-METAPH-002
S3-1-METAPH-006
S3-1-METAPH-007
S3-1-METAPH-019
S5-1-METAPH-002
S5-1-METAPH-003
S5-1-METAPH-027
S5-1-METAPH-089
S5-1-METAPH-006
S5-1-METAPH-019
S5-1-METAPH-008
S5-1-METAPH-007
;
run;
proc sort
data = have;
by subject;
run;
data have_2;
set have;
id = catx("-", scan(subject, 1, '-'), scan(subject, 2, '-'));
run;
proc sort
data = have_2;
by id;
run;
data want;
set have_2;
by id;
retain new_id;
if first.id then new_id + 1;
run;
Obs subject id new_id 1 S1-1-METAPH-002 S1-1 1 2 S1-1-METAPH-003 S1-1 1 3 S1-1-METAPH-006 S1-1 1 4 S1-1-METAPH-007 S1-1 1 5 S1-1-METAPH-008 S1-1 1 6 S1-1-METAPH-019 S1-1 1 7 S1-1-METAPH-027 S1-1 1 8 S1-1-METAPH-089 S1-1 1 9 S3-1-METAPH-002 S3-1 2 10 S3-1-METAPH-003 S3-1 2 11 S3-1-METAPH-006 S3-1 2 12 S3-1-METAPH-007 S3-1 2 13 S3-1-METAPH-008 S3-1 2 14 S3-1-METAPH-019 S3-1 2 15 S3-1-METAPH-027 S3-1 2 16 S3-1-METAPH-089 S3-1 2 17 S5-1-METAPH-002 S5-1 3 18 S5-1-METAPH-003 S5-1 3 19 S5-1-METAPH-006 S5-1 3 20 S5-1-METAPH-007 S5-1 3 21 S5-1-METAPH-008 S5-1 3 22 S5-1-METAPH-019 S5-1 3 23 S5-1-METAPH-027 S5-1 3 24 S5-1-METAPH-089 S5-1 3
I didn't well explain what I needed. As you put it: S[digit]-[number]:
I needed only to keep the "number". I have another column in the dataset to control when measurement was taken. This is longitudinal study so I should have 3 identical ID number per subjet.
Alright, I misunderstood your question. Do you mind replacing the `.` in the id field in this DATALINES statement with exactly what you want?
data want;
input subject :$20. id;
datalines;
S1-1-METAPH-002 .
S1-1-METAPH-019 .
S1-1-METAPH-089 .
S1-1-METAPH-027 .
S1-1-METAPH-006 .
S1-1-METAPH-007 .
S1-1-METAPH-008 .
S1-1-METAPH-003 .
S3-1-METAPH-027 .
S3-1-METAPH-008 .
S3-1-METAPH-089 .
S3-1-METAPH-003 .
S3-1-METAPH-002 .
S3-1-METAPH-006 .
S3-1-METAPH-007 .
S3-1-METAPH-019 .
S5-1-METAPH-002 .
S5-1-METAPH-003 .
S5-1-METAPH-027 .
S5-1-METAPH-089 .
S5-1-METAPH-006 .
S5-1-METAPH-019 .
S5-1-METAPH-008 .
S5-1-METAPH-007 .
;
run;
You can copy this code, and in your reply use the `insert SAS code` button and replace the `.` with your desired values.
This will help us figure out the solution.
Sorry for not explaining straight to the point what I needed. It was so simple to say I want to keep the last 3 digits.
If I understand you correctly, you want to get the last 3 digits from the subject variable, which identifies the patient. One way is this:
data want;
set have;
length id $3;
id=scan(subject,-1,'-');
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.