Solved: Re: how to select subjects have all 3 visits

jing2000yr · Posted 11-25-2012 09:08 AM

Hello,everyone,

I am a greenhand and would appreciate your help! I have a sample from a cohort study,each subject is supposed to have 3 visits.But actually,some of them only have the first and the third,or the first and the second.Now I am going to select those who have all three visits(like the ID 6238).I tried but failed.Thank you very much for helping me this out.

Thanks a lot!

part of the data set:

Obs ID visit

                          1      2448       1
                          2      2448       3
                          3      6238       1
                          4      6238       2
                          5      6238       3
                          6      9428       1
                          7      9428       2
                          8     10552       1
                          9     10552       2
                         10     11252       1
                         11     11252       2
                         12     11252       3
                         13     11263       1
                         14     11263       2
                         15     11263       3
                         16     12629       1
                         17     12629       2
                         18     12806       1
                         19     12806       2
                         20     12806       3

art297 · Posted 11-25-2012 09:40 AM

If you want to insure that each case not only has three records, but that they are for visits 1, 2 and 3, then I would use a double DOW. e.g.:

data have;

input ID visit;

cards;

2448 1

2448 3

6238 1

6238 2

6238 3

9428 1

9428 2

10552 1

10552 2

11252 1

11252 2

11252 3

11263 1

11263 2

11263 3

12629 1

12629 2

12806 1

12806 2

12806 3

;

proc sort data=have;

by id;

run;

data want (drop=check:);

do until (last.id);

set have;

by id;

array check(3);

check(visit)=1;

end;

do until (last.id);

set have;

by id;

if sum(of check(*)) eq 3 then output;

end;

run;

View solution in original post

art297 · Posted 11-25-2012 09:40 AM

If you want to insure that each case not only has three records, but that they are for visits 1, 2 and 3, then I would use a double DOW. e.g.:

data have;

input ID visit;

cards;

2448 1

2448 3

6238 1

6238 2

6238 3

9428 1

9428 2

10552 1

10552 2

11252 1

11252 2

11252 3

11263 1

11263 2

11263 3

12629 1

12629 2

12806 1

12806 2

12806 3

;

proc sort data=have;

by id;

run;

data want (drop=check:);

do until (last.id);

set have;

by id;

array check(3);

check(visit)=1;

end;

do until (last.id);

set have;

by id;

if sum(of check(*)) eq 3 then output;

end;

run;

jing2000yr · Posted 11-25-2012 10:12 AM

Dear Dr.Arthur,

I tried with small sample,it works.Thanks!

But My whole sample is like more than 4000 subjects and more than 10000 observations,it takes very long time.

Jing

jing2000yr · Posted 11-25-2012 10:16 AM

And I try to assign a new variable 'f' to indicate if all three visits then f=1,else f=0,then I want to output f=1.But keep failing...

Jing

Linlin · Posted 11-25-2012 11:03 AM

HI Jing,

Is the code below helpful?

data have;

input ID visit;

cards;

2448 1

2448 3

6238 1

6238 2

6238 3

9428 1

9428 2

10552 1

10552 2

11252 1

11252 2

11252 3

11263 1

11263 2

11263 3

12629 1

12629 2

12806 1

12806 2

12806 3

;

proc sql;

create table temp as

select id,count(id) as t

from have

group by id

having t>2;

create table want as

select id,visit, 1 as f

from have where id in (select id from temp);

quit;

proc print data=want;run;

Good luck!

Linlin

jing2000yr · Posted 11-25-2012 11:12 AM

Dear Linlin,

I have not learned the procedure SQL.Do you have other idea?

Thanks.

Jing

Linlin · Posted 11-25-2012 11:44 AM

Hi Jing,

The code Art provided is very efficient. Maybe my code is easier for you to understand.

data have;

input ID visit;

cards;

2448 1

2448 3

6238 1

6238 2

6238 3

9428 1

9428 2

10552 1

10552 2

11252 1

11252 2

11252 3

11263 1

11263 2

11263 3

12629 1

12629 2

12806 1

12806 2

12806 3

;

/* output a dataset with IDs having three visits */

data temp;

set have;

by id notsorted;

if first.id then count=0;

count+1;

if count=3;

data want;/*(assume your dataset already sorted)*/

merge have(in=a) temp(in=b keep=id);

by id;

if a and b;

proc print;run;

Linlin

jing2000yr · Posted 11-25-2012 01:33 PM

Dear Lin,

I need to learn Dr.Arthur's coding.The try yours as another method.Then I can learn both.

Thanks a lot.

Jing

jing2000yr · Posted 11-25-2012 12:50 PM

Dear Dr.Arthur,

This is my coding.randid=id,period=visit,I try your coding in this small sample,it shows what I want.But when I ran in whole sample with many other variables,it takes long time.Would you please help me checking if there are problems after I revise your coding in my pgm.If it is correct,maybe I just need to wait...

Thanks.

Jing

proc sort data=fr;
by randid period;
run;
data fr1;
set fr;
by randid period;
do until (last.randid);
array check(3);
check(period)=1;
end;
do until (last.randid);
set fr;
by randid;
if sum(of check(*)) eq 3 then output;
end;
run;

proc print data=fr1(ods=20);
run;

art297 · Posted 11-25-2012 11:32 AM

Jing,

I have to think that you entered something incorrectly. In the following code, I expanded your example data to represent 20,000 observations. On my computer it took 0.04 seconds to create the file, 0.04 seconds to sort it, and 0.04 seconds to run through the data step with the double DOW loop.

It would help if you posted the code that you actually ran.

TomKari · Posted 11-25-2012 12:13 PM

I agree with Art; given the characteristics that you provide, there should absolutely be no performance problems.

Please post your code, and the error will probably be apparent.

Tom

art297 · Posted 11-25-2012 12:51 PM

Your last post confuses me. As I understood your original post, you wanted one file that only contained the records that met the condition.

The following creates two files, the original one, plus a version of the raw data with the extra field added:

data want (drop=f check:) rawdatawithextrafield (drop=check:);

do until (last.id);

set have;

by id;

array check(3);

check(visit)=1;

end;

do until (last.id);

set have;

by id;

if sum(of check(*)) eq 3 then do;

f=1;

output want;

end;

else f=0;

output rawdatawithextrafield;

end;

run;

jing2000yr · Posted 11-25-2012 01:32 PM

Dear Dr,Arthur,

this works.

I need to think your coding.

Thank you very much!

Jing

art297 · Posted 11-25-2012 02:45 PM

It is a somewhat difficult concept to pick up early on in your learning curve. However, that said, the code is taking control away from the way SAS normally works.

The two do until loops read each record within a by group. Thus, the first loop reads the records:

2448 1

2448 3

and creates the array, check, with the values:

check1 1

check2 .

check3 1

Then, the second do loop again reads those same two records, but this time knows the values in the array check. Since the sum of check will only equal 3 if there were at least three records for id 2448 and, between those records, contained values 1, 2 and 3 for visit.

Then the first loop reads the records for the next id, etc., etc.

Registration is open

SAS Training: Just a Click Away