BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Job04
Quartz | Level 8

This is a subset of a larger data. The data was generated by a biochemist and I have an issue with the subjects' IDs. 

 

"METAPH" is the name of the study. Subjects are  measured at 3 occasions (S1-1, S3-1 and S5-1). The subject's ID is as follow: occasion-study'name-subject. For example the first subject has 3 IDs corresponding to the 3 measurements:

S1-1-METAPH-002

S3-1-METAPH-002

S5-1-METAPH-002

 

Data looks like it was manually entered: 

 

SUBJECT
S1-1-METAPH-002
S1-1-METAPH-019
S1-1-METAPH-089
S1-1-METAPH-027
S1-1-METAPH-006
S1-1-METAPH-007
S1-1-METAPH-008
S1-1-METAPH-003
S3-1-METAPH-027
S3-1-METAPH-008
S3-1-METAPH-089
S3-1-METAPH-003
S3-1-METAPH-002
S3-1-METAPH-006
S3-1-METAPH-007
S3-1-METAPH-019
S5-1-METAPH-002
S5-1-METAPH-003
S5-1-METAPH-027
S5-1-METAPH-089
S5-1-METAPH-006
S5-1-METAPH-019
S5-1-METAPH-008
S5-1-METAPH-007

 

I would like to replace  these 3 IDs by only one, which is 1, and the data to look like:  

 

1

1

1

2

2

2

...

 

Thank you

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
s_lassen
Meteorite | Level 14

If I understand you correctly, you want to get the last 3 digits from the subject variable, which identifies the patient. One way is this:

data want;
  set have;
  length id $3;
  id=scan(subject,-1,'-');
run;

View solution in original post

7 REPLIES 7
PeterClemmensen
Tourmaline | Level 20

I don't understand this. Why 1, 1, 1 and 2, 2, 2 ?

Job04
Quartz | Level 8

As you can see the IDs are not sequential perhaps some subjects were exluded from the  study. I would like to  identify subjects by their numbers in the study and not by this long and confusing ID. I have another column that specifies the time of measurement (S1,S3  and S5) which I can use in my modeling. For example these 3 measurements of patient 002 I want to have 3  records called 002. In other word, I want to leave only the three digits to the far right of this long ID. I hope I answered your question.

 

maguiremq
SAS Super FREQ

Is that the only pattern that we need to be aware of? In other words, is it S[digit]-[number]? If so, this is a bit clunky but works for this instance. 

data have;
input subject $20.;
datalines;
S1-1-METAPH-002
S1-1-METAPH-019
S1-1-METAPH-089
S1-1-METAPH-027
S1-1-METAPH-006
S1-1-METAPH-007
S1-1-METAPH-008
S1-1-METAPH-003
S3-1-METAPH-027
S3-1-METAPH-008
S3-1-METAPH-089
S3-1-METAPH-003
S3-1-METAPH-002
S3-1-METAPH-006
S3-1-METAPH-007
S3-1-METAPH-019
S5-1-METAPH-002
S5-1-METAPH-003
S5-1-METAPH-027
S5-1-METAPH-089
S5-1-METAPH-006
S5-1-METAPH-019
S5-1-METAPH-008
S5-1-METAPH-007
;
run;

proc sort
	data = have;
		by subject;
run;

data have_2;
	set have;
		id = catx("-", scan(subject, 1, '-'), scan(subject, 2, '-'));
run;

proc sort
	data = have_2;
		by id;
run;

data want;
	set have_2;
	by id;
	retain new_id;
		if first.id then new_id + 1;
run;
Obs subject id new_id 
1 S1-1-METAPH-002 S1-1 1 
2 S1-1-METAPH-003 S1-1 1 
3 S1-1-METAPH-006 S1-1 1 
4 S1-1-METAPH-007 S1-1 1 
5 S1-1-METAPH-008 S1-1 1 
6 S1-1-METAPH-019 S1-1 1 
7 S1-1-METAPH-027 S1-1 1 
8 S1-1-METAPH-089 S1-1 1 
9 S3-1-METAPH-002 S3-1 2 
10 S3-1-METAPH-003 S3-1 2 
11 S3-1-METAPH-006 S3-1 2 
12 S3-1-METAPH-007 S3-1 2 
13 S3-1-METAPH-008 S3-1 2 
14 S3-1-METAPH-019 S3-1 2 
15 S3-1-METAPH-027 S3-1 2 
16 S3-1-METAPH-089 S3-1 2 
17 S5-1-METAPH-002 S5-1 3 
18 S5-1-METAPH-003 S5-1 3 
19 S5-1-METAPH-006 S5-1 3 
20 S5-1-METAPH-007 S5-1 3 
21 S5-1-METAPH-008 S5-1 3 
22 S5-1-METAPH-019 S5-1 3 
23 S5-1-METAPH-027 S5-1 3 
24 S5-1-METAPH-089 S5-1 3 
Job04
Quartz | Level 8

I didn't well explain what I needed. As you put it: S[digit]-[number]: 

I needed only to keep the "number".  I have another column in the dataset to control when measurement was taken. This is longitudinal study  so I should have 3 identical ID number per subjet. 

 

 

maguiremq
SAS Super FREQ

Alright, I misunderstood your question. Do you mind replacing the `.` in the id field in this DATALINES statement with exactly what you want?

data want;
input subject :$20. id;
datalines;
S1-1-METAPH-002 .
S1-1-METAPH-019 .
S1-1-METAPH-089 .
S1-1-METAPH-027 .
S1-1-METAPH-006 .
S1-1-METAPH-007 .
S1-1-METAPH-008 .
S1-1-METAPH-003 .
S3-1-METAPH-027 .
S3-1-METAPH-008 .
S3-1-METAPH-089 .
S3-1-METAPH-003 .
S3-1-METAPH-002 .
S3-1-METAPH-006 .
S3-1-METAPH-007 .
S3-1-METAPH-019 .
S5-1-METAPH-002 .
S5-1-METAPH-003 .
S5-1-METAPH-027 .
S5-1-METAPH-089 .
S5-1-METAPH-006 .
S5-1-METAPH-019 .
S5-1-METAPH-008 .
S5-1-METAPH-007 .
;
run;

You can copy this code, and in your reply use the `insert SAS code` button and replace the `.` with your desired values.

 

This will help us figure out the solution.

Job04
Quartz | Level 8

Sorry for not explaining straight to the point what I needed.  It was so simple to say I want to keep the last 3 digits. 

s_lassen
Meteorite | Level 14

If I understand you correctly, you want to get the last 3 digits from the subject variable, which identifies the patient. One way is this:

data want;
  set have;
  length id $3;
  id=scan(subject,-1,'-');
run;

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 4226 views
  • 1 like
  • 4 in conversation