A question about selecting certain rows of data

Solved
Occasional Contributor
Posts: 14

A question about selecting certain rows of data

Hi all,

I have a question about selecting certain rows of data. The data involve people who have come several times within a period for repeated treatments. Each time they come they were tested several times, ranging from 1 to 10 times. The data is structured as follow:

 participantID Treatment_Start_Date Assessment_Date Scores 1 13JAN2001 13JAN2001 5 1 13JAN2001 24MAR2001 6 1 13JAN2001 07MAY2001 8 1 15DEC2001 15DEC2001 9 2 01FEB2008 01FEB2008 5 2 01FEB2008 15MAY2008 2 2 01FEB2008 06JAN2009 1 2 15DEC2009 15DEC2009 3 2 15DEC2009 15JAN2010 5 2 26MAY2010 26MAY2010 4

For participant # 1, for example, he received two sets of treatments in different periods: one period starting at 13JAN2001, and the second period starting at 15DEC2001. During the first period, he received assessment concerning the effectivess of treatments on 13Jan2001, 24Mar2001, and 7May2001. The scores of his treatment are 5, 6, and 8 respectively. The second set of treatment is on 15DEC2001, and he was assessed on 15DEC2001.

My question is, if I am only interested in the first set of treatment for each participant, is there a way I can retain only the first set of treatment and delete the second, third, fourth sets of treatment (i.e., retaining only 13Jan2001 for the first participant, 01Feb2008 for the second participant, and so on)?

Thanks,

Chester

Accepted Solutions
Solution
‎08-03-2011 11:35 AM
PROC Star
Posts: 8,164

A question about selecting certain rows of data

Posted in reply to PsycResearcher

I think something like the following will accomplish what you want to do:

data have;

informat Treatment_Start_Date Assessment_Date date9.;

format Treatment_Start_Date Assessment_Date date9.;

input participantID Treatment_Start_Date Assessment_Date Scores;

cards;

1 13JAN2001 13JAN2001 5

1 13JAN2001 24MAR2001 6

1 13JAN2001 07MAY2001 8

1 15DEC2001 15DEC2001 9

2 01FEB2008 01FEB2008 5

2 01FEB2008 15MAY2008 2

2 01FEB2008 06JAN2009 1

2 15DEC2009 15DEC2009 3

2 15DEC2009 15JAN2010 5

2 26MAY2010 26MAY2010 4

;

proc sort data=have out=want;

by participantID Treatment_Start_Date;

run;

data want (drop=start_date);

set want;

retain Start_Date;

by participantID Treatment_Start_Date;

if first.participantID then Start_Date=Treatment_Start_Date;

if Start_Date eq Treatment_Start_Date then output;

run;

All Replies
Solution
‎08-03-2011 11:35 AM
PROC Star
Posts: 8,164

A question about selecting certain rows of data

Posted in reply to PsycResearcher

I think something like the following will accomplish what you want to do:

data have;

informat Treatment_Start_Date Assessment_Date date9.;

format Treatment_Start_Date Assessment_Date date9.;

input participantID Treatment_Start_Date Assessment_Date Scores;

cards;

1 13JAN2001 13JAN2001 5

1 13JAN2001 24MAR2001 6

1 13JAN2001 07MAY2001 8

1 15DEC2001 15DEC2001 9

2 01FEB2008 01FEB2008 5

2 01FEB2008 15MAY2008 2

2 01FEB2008 06JAN2009 1

2 15DEC2009 15DEC2009 3

2 15DEC2009 15JAN2010 5

2 26MAY2010 26MAY2010 4

;

proc sort data=have out=want;

by participantID Treatment_Start_Date;

run;

data want (drop=start_date);

set want;

retain Start_Date;

by participantID Treatment_Start_Date;

if first.participantID then Start_Date=Treatment_Start_Date;

if Start_Date eq Treatment_Start_Date then output;

run;

Occasional Contributor
Posts: 14

Re: A question about selecting certain rows of data

Hi! I am using the syntax by art297 above.

i.e.,

--------------------------------------

proc sort data=have out=want;

by participantID Treatment_Start_Date;

run;

data want (drop=start_date);

set want;

retain Start_Date;

by participantID Treatment_Start_Date;

if first.participantID then Start_Date=Treatment_Start_Date;

if Start_Date eq Treatment_Start_Date then output;

run;

---------------------------------------

However, SAS gave me this error message:

ERROR: BY variables are not properly sorted on datset want.

Following this error message, the whole list of variables in the entire dataset is listed in the output window.

In the sample dataset that I gave at the top of this thread, I did not show that there is an entire list of other variables in the same dataset. I wonder what cause the error message in the dataset?

Thanks again!

Chester

PROC Star
Posts: 8,164

Re: A question about selecting certain rows of data

Posted in reply to PsycResearcher

Check your log.  With out seeing it, I would guess that the proc sort didn't run correctly because there was a misspelling in one of the fields.

Valued Guide
Posts: 765

Re: A question about selecting certain rows of data

Posted in reply to PsycResearcher

Hi ... you can use Art's data and try PROC SQL ...

proc sql;

create table want as

select * from have

group by participantID

having Treatment_Start_Date eq min(Treatment_Start_Date);

quit;

Super User
Posts: 10,784

Re: A question about selecting certain rows of data

Posted in reply to PsycResearcher

How about:

```data have;
informat Treatment_Start_Date Assessment_Date date9.;
format Treatment_Start_Date Assessment_Date date9.;
input participantID Treatment_Start_Date Assessment_Date Scores;
cards;
1 13JAN2001 13JAN2001 5
1 13JAN2001 24MAR2001 6
1 13JAN2001 07MAY2001 8
1 15DEC2001 15DEC2001 9
2 01FEB2008 01FEB2008 5
2 01FEB2008 15MAY2008 2
2 01FEB2008 06JAN2009 1
2 15DEC2009 15DEC2009 3
2 15DEC2009 15JAN2010 5
2 26MAY2010 26MAY2010 4
;
run;
data want(where=(count eq 1));
set have;
by participantID Treatment_Start_Date notsorted;
if first.participantID then count=0;
if first.Treatment_Start_Date then count+1;
run;
```

Ksharp

Occasional Contributor
Posts: 14

Re: A question about selecting certain rows of data

You guys are great! I used to use SPSS a lot and now found that SAS is more powerful in dealing with an extremely large and complicated dataset.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
• 6 replies
• 150 views
• 6 likes
• 4 in conversation