BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
PsycResearcher
Calcite | Level 5

Hi all,

I have a question about selecting certain rows of data. The data involve people who have come several times within a period for repeated treatments. Each time they come they were tested several times, ranging from 1 to 10 times. The data is structured as follow:

participantID

Treatment_Start_Date

Assessment_Date

Scores

1

13JAN2001

13JAN2001

5

1

13JAN2001

24MAR2001

6

1

13JAN2001

07MAY2001

8

1

15DEC2001

15DEC2001

9

2

01FEB2008

01FEB2008

5

2

01FEB2008

15MAY2008

2

2

01FEB2008

06JAN2009

1

2

15DEC2009

15DEC2009

3

2

15DEC2009

15JAN2010

5

2

26MAY2010

26MAY2010

4

For participant # 1, for example, he received two sets of treatments in different periods: one period starting at 13JAN2001, and the second period starting at 15DEC2001. During the first period, he received assessment concerning the effectivess of treatments on 13Jan2001, 24Mar2001, and 7May2001. The scores of his treatment are 5, 6, and 8 respectively. The second set of treatment is on 15DEC2001, and he was assessed on 15DEC2001.

My question is, if I am only interested in the first set of treatment for each participant, is there a way I can retain only the first set of treatment and delete the second, third, fourth sets of treatment (i.e., retaining only 13Jan2001 for the first participant, 01Feb2008 for the second participant, and so on)?

Thanks,

Chester

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

I think something like the following will accomplish what you want to do:

data have;

  informat Treatment_Start_Date Assessment_Date date9.;

  format Treatment_Start_Date Assessment_Date date9.;

  input participantID Treatment_Start_Date Assessment_Date Scores;

  cards;

1 13JAN2001 13JAN2001 5

1 13JAN2001 24MAR2001 6

1 13JAN2001 07MAY2001 8

1 15DEC2001 15DEC2001 9

2 01FEB2008 01FEB2008 5

2 01FEB2008 15MAY2008 2

2 01FEB2008 06JAN2009 1

2 15DEC2009 15DEC2009 3

2 15DEC2009 15JAN2010 5

2 26MAY2010 26MAY2010 4

;

proc sort data=have out=want;

  by participantID Treatment_Start_Date;

run;

data want (drop=start_date);

  set want;

  retain Start_Date;

  by participantID Treatment_Start_Date;

  if first.participantID then Start_Date=Treatment_Start_Date;

  if Start_Date eq Treatment_Start_Date then output;

run;

View solution in original post

6 REPLIES 6
art297
Opal | Level 21

I think something like the following will accomplish what you want to do:

data have;

  informat Treatment_Start_Date Assessment_Date date9.;

  format Treatment_Start_Date Assessment_Date date9.;

  input participantID Treatment_Start_Date Assessment_Date Scores;

  cards;

1 13JAN2001 13JAN2001 5

1 13JAN2001 24MAR2001 6

1 13JAN2001 07MAY2001 8

1 15DEC2001 15DEC2001 9

2 01FEB2008 01FEB2008 5

2 01FEB2008 15MAY2008 2

2 01FEB2008 06JAN2009 1

2 15DEC2009 15DEC2009 3

2 15DEC2009 15JAN2010 5

2 26MAY2010 26MAY2010 4

;

proc sort data=have out=want;

  by participantID Treatment_Start_Date;

run;

data want (drop=start_date);

  set want;

  retain Start_Date;

  by participantID Treatment_Start_Date;

  if first.participantID then Start_Date=Treatment_Start_Date;

  if Start_Date eq Treatment_Start_Date then output;

run;

PsycResearcher
Calcite | Level 5

Hi! I am using the syntax by art297 above.

i.e.,

--------------------------------------

proc sort data=have out=want;

  by participantID Treatment_Start_Date;

run;

data want (drop=start_date);

  set want;

  retain Start_Date;

  by participantID Treatment_Start_Date;

  if first.participantID then Start_Date=Treatment_Start_Date;

  if Start_Date eq Treatment_Start_Date then output;

run;

---------------------------------------

However, SAS gave me this error message:

ERROR: BY variables are not properly sorted on datset want.

Following this error message, the whole list of variables in the entire dataset is listed in the output window.

In the sample dataset that I gave at the top of this thread, I did not show that there is an entire list of other variables in the same dataset. I wonder what cause the error message in the dataset?

Thanks again!

Chester

art297
Opal | Level 21

Check your log.  With out seeing it, I would guess that the proc sort didn't run correctly because there was a misspelling in one of the fields.

MikeZdeb
Rhodochrosite | Level 12

Hi ... you can use Art's data and try PROC SQL ...

proc sql;

create table want as

select * from have

group by participantID

having Treatment_Start_Date eq min(Treatment_Start_Date);

quit;

Ksharp
Super User

How about:

data have;
  informat Treatment_Start_Date Assessment_Date date9.;
  format Treatment_Start_Date Assessment_Date date9.;
  input participantID Treatment_Start_Date Assessment_Date Scores;
  cards;
1 13JAN2001 13JAN2001 5
1 13JAN2001 24MAR2001 6
1 13JAN2001 07MAY2001 8
1 15DEC2001 15DEC2001 9
2 01FEB2008 01FEB2008 5
2 01FEB2008 15MAY2008 2
2 01FEB2008 06JAN2009 1
2 15DEC2009 15DEC2009 3
2 15DEC2009 15JAN2010 5
2 26MAY2010 26MAY2010 4
;
run;
data want(where=(count eq 1));
 set have;
 by participantID Treatment_Start_Date notsorted;
 if first.participantID then count=0;
 if first.Treatment_Start_Date then count+1;
run;

Ksharp

PsycResearcher
Calcite | Level 5

You guys are great! I used to use SPSS a lot and now found that SAS is more powerful in dealing with an extremely large and complicated dataset.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1899 views
  • 6 likes
  • 4 in conversation