multiple duplicate observations in a column of a char var.- want to condense this

Occasional Contributor
Posts: 12

I currently have a dataset that looks like this:

pat_id  time  sequence
1       1     M
1       2     MS
1       3     M
1       4     M
1       5     M
1       6     M
1       7     M
1       8     M
1       9     M
1       10    M
2       1     M
2       2     MST
2       3     ST
2       4     MT
2       5     M
2       6     MS
2       7     S
2       8     S
2       9     S
2       10    S

I am trying to keep only the first observation whenever a value repeats from row to row, looking specifically at the sequence variable. For example, for patient 1 I would like to keep only the first three rows, i.e., drop the repeated 'M' observations for times 4-10.

Ultimately I would like to transpose the data into a single string variable per patient, so my final dataset would look like this, but I'm hung up on the step above:

pat_id  sequence
1       M-MS-M
2       M-MST-ST-MT-M-MS-S

Any thoughts? Thank you!



All Replies
Occasional Contributor
Posts: 5

Re: multiple duplicate observations in a column of a char var.- want to condense this

So do you need the time field at all?  If not, you can do this very simply with a NODUPKEY sort:

proc sort data=yourfile nodupkey out=yourfile2;
    by pat_id sequence;
run;

If you need to keep that time then you could do something like this:

proc sort data=yourfile;
    by sequence pat_id time;
run;

data yourfile2;
    set yourfile;
    by sequence pat_id time;
    if first.pat_id;
run;

Occasional Contributor
Posts: 12

Re: multiple duplicate observations in a column of a char var.- want to condense this

thanks, sashelpwanted!

I don't really care about the exact time value, but I am trying to keep the order. If you look at patient 1, the NODUPKEY option loses the point where the patient starts taking 'M' again, i.e., I lose what happens during times 3-10. The other option you recommend only gives me the first observation per patient. Ideally, I would like the first observation whenever there's a change in the sequence, if that makes sense. Thank you.

Solution
04-15-2014 03:43 PM
Super User
Posts: 17,819

Re: multiple duplicate observations in a column of a char var.- want to condense this

Use the NOTSORTED option on the BY statement and append the sequence value the first time you see it. Output on the last record for each patient ID.

data have;
    input pat_id time sequence $;
    cards;
1 1 M
1 2 MS
1 3 M
1 4 M
1 5 M
1 6 M
1 7 M
1 8 M
1 9 M
1 10 M
2 1 M
2 2 MST
2 3 ST
2 4 MT
2 5 M
2 6 MS
2 7 S
2 8 S
2 9 S
2 10 S
;
run;

data want;
    set have;
    by pat_id sequence notsorted;
    length full_sequence $200.;
    retain full_sequence;
    if first.pat_id then full_sequence='';
    if first.sequence then full_sequence=catx("-", full_sequence, sequence);
    if last.pat_id then output;
    keep pat_id full_sequence;
run;
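For readers outside SAS, the same logic (collapse consecutive duplicates within each patient, then join the survivors with "-") can be sketched in Python with `itertools.groupby`, which groups adjacent equal values just like `first.sequence` does under NOTSORTED. The `rows` data below mirrors the sample dataset; everything else is an illustrative sketch, not part of the SAS solution.

```python
from itertools import groupby

# (pat_id, sequence) pairs, already in time order within each patient
rows = [
    (1, "M"), (1, "MS"), (1, "M"), (1, "M"), (1, "M"),
    (1, "M"), (1, "M"), (1, "M"), (1, "M"), (1, "M"),
    (2, "M"), (2, "MST"), (2, "ST"), (2, "MT"), (2, "M"),
    (2, "MS"), (2, "S"), (2, "S"), (2, "S"), (2, "S"),
]

def condense(rows):
    """Collapse consecutive duplicate sequence values within each patient,
    then join the survivors with '-' (analogue of first.sequence + catx)."""
    out = {}
    for pat_id, group in groupby(rows, key=lambda r: r[0]):
        # inner groupby over adjacent values keeps one copy per run
        seqs = [seq for seq, _ in groupby(s for _, s in group)]
        out[pat_id] = "-".join(seqs)
    return out

print(condense(rows))
# {1: 'M-MS-M', 2: 'M-MST-ST-MT-M-MS-S'}
```

Note that, like the NOTSORTED approach, this only removes *adjacent* repeats, so the second 'M' run for patient 1 still contributes one entry.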

Occasional Contributor
Posts: 12

Re: multiple duplicate observations in a column of a char var.- want to condense this

Thank you, Reeza! This worked perfectly. Much appreciated! :)
