BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jcooper3
Calcite | Level 5

I currently have a dataset that looks like this:

pat_idtimesequence
11M
12MS
13M
14M
15M
16M
17M
18M
19M
110M
21M
22MST
23ST
24MT
25M
26MS
27S
28S
29S
210S

I am trying to make it so that whenever there is a repeat observation from line to line it only records the first observation of that value. I'm looking speciifcally at the sequence variable. For example, I would like to keep only the first three rows of data for patient 1, so I would like to basically get rid of all the 'M' observations for time 4-10.

Ultimately I would like to transpose the data and make a string variable, so I'd like to get my final dataset to look like this, but I'm hung up on the above part....

pat_idsequence
1M-MS-M
2M-MST-ST-MT-M-MS-S

Any thoughts? Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

Use the notsorted option on your by statement and append the first time you see a new sequence. Output on the last patient ID.

data have;

input pat_id    time    sequence $;

cards;

1    1    M

1    2    MS

1    3    M

1    4    M

1    5    M

1    6    M

1    7    M

1    8    M

1    9    M

1    10    M

2    1    M

2    2    MST

2    3    ST

2    4    MT

2    5    M

2    6    MS

2    7    S

2    8    S

2    9    S

2    10    S

;

run;

data want;

set have;

by pat_id sequence notsorted;

length full_sequence $200.;

retain full_sequence;

if first.pat_id then full_sequence='';

if first.sequence then full_sequence=catx("-", full_sequence, sequence);

if last.pat_id then output;

keep pat_id full_sequence;

run;

View solution in original post

4 REPLIES 4
sashelpwanted
Calcite | Level 5

So do you need the time field at all?  If not then you can do this very simple by doing a nodupkey:

proc sort data=yourfile nodupkey out=yourfile2;

     by pat_id sequence;

run;

If you need to keep that time then you could do something like this:

proc sort data=yourfile;

     by sequence pat_id time;

data yourfile2;

     set yourfile;

     by sequence pat_id time;

     if first.pat_id;

run;

jcooper3
Calcite | Level 5

thanks, sashelpwanted!

I don't really care about the exact time (ie. what exact time value), but I'm trying to keep the order. So if you look at patient 1, when I try to nodupkey option, I lose the piece when the patient starts taking 'M' again, meaning I lose what happens during time 3-10. When I try the other option you recommend, it only gives me the first observation per patient. Ideally, I would like the first observation whenever there's a change in the sequence, if that makes any sense. Thank you.

Reeza
Super User

Use the notsorted option on your by statement and append the first time you see a new sequence. Output on the last patient ID.

data have;

input pat_id    time    sequence $;

cards;

1    1    M

1    2    MS

1    3    M

1    4    M

1    5    M

1    6    M

1    7    M

1    8    M

1    9    M

1    10    M

2    1    M

2    2    MST

2    3    ST

2    4    MT

2    5    M

2    6    MS

2    7    S

2    8    S

2    9    S

2    10    S

;

run;

data want;

set have;

by pat_id sequence notsorted;

length full_sequence $200.;

retain full_sequence;

if first.pat_id then full_sequence='';

if first.sequence then full_sequence=catx("-", full_sequence, sequence);

if last.pat_id then output;

keep pat_id full_sequence;

run;

jcooper3
Calcite | Level 5

thank you, Reeza! This worked perfectly. much appreciated! Smiley Happy

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 883 views
  • 3 likes
  • 3 in conversation