Hello,everyone,
I am a greenhand and would appreciate your help! I have a sample from a cohort study,each subject is supposed to have 3 visits.But actually,some of them only have the first and the third,or the first and the second.Now I am going to select those who have all three visits(like the ID 6238).I tried but failed.Thank you very much for helping me this out.
Thanks a lot!
part of the data set:
Obs ID visit
1 2448 1
2 2448 3
3 6238 1
4 6238 2
5 6238 3
6 9428 1
7 9428 2
8 10552 1
9 10552 2
10 11252 1
11 11252 2
12 11252 3
13 11263 1
14 11263 2
15 11263 3
16 12629 1
17 12629 2
18 12806 1
19 12806 2
20 12806 3
If you want to insure that each case not only has three records, but that they are for visits 1, 2 and 3, then I would use a double DOW. e.g.:
data have;
input ID visit;
cards;
2448 1
2448 3
6238 1
6238 2
6238 3
9428 1
9428 2
10552 1
10552 2
11252 1
11252 2
11252 3
11263 1
11263 2
11263 3
12629 1
12629 2
12806 1
12806 2
12806 3
;
proc sort data=have;
by id;
run;
data want (drop=check:);
do until (last.id);
set have;
by id;
array check(3);
check(visit)=1;
end;
do until (last.id);
set have;
by id;
if sum(of check(*)) eq 3 then output;
end;
run;
If you want to insure that each case not only has three records, but that they are for visits 1, 2 and 3, then I would use a double DOW. e.g.:
data have;
input ID visit;
cards;
2448 1
2448 3
6238 1
6238 2
6238 3
9428 1
9428 2
10552 1
10552 2
11252 1
11252 2
11252 3
11263 1
11263 2
11263 3
12629 1
12629 2
12806 1
12806 2
12806 3
;
proc sort data=have;
by id;
run;
data want (drop=check:);
do until (last.id);
set have;
by id;
array check(3);
check(visit)=1;
end;
do until (last.id);
set have;
by id;
if sum(of check(*)) eq 3 then output;
end;
run;
Dear Dr.Arthur,
I tried with small sample,it works.Thanks!
But My whole sample is like more than 4000 subjects and more than 10000 observations,it takes very long time.
Jing
And I try to assign a new variable 'f' to indicate if all three visits then f=1,else f=0,then I want to output f=1.But keep failing...
Jing
HI Jing,
Is the code below helpful?
data have;
input ID visit;
cards;
2448 1
2448 3
6238 1
6238 2
6238 3
9428 1
9428 2
10552 1
10552 2
11252 1
11252 2
11252 3
11263 1
11263 2
11263 3
12629 1
12629 2
12806 1
12806 2
12806 3
;
proc sql;
create table temp as
select id,count(id) as t
from have
group by id
having t>2;
create table want as
select id,visit, 1 as f
from have where id in (select id from temp);
quit;
proc print data=want;run;
Good luck!
Linlin
Dear Linlin,
I have not learned the procedure SQL.Do you have other idea?
Thanks.
Jing
Hi Jing,
The code Art provided is very efficient. Maybe my code is easier for you to understand.
data have;
input ID visit;
cards;
2448 1
2448 3
6238 1
6238 2
6238 3
9428 1
9428 2
10552 1
10552 2
11252 1
11252 2
11252 3
11263 1
11263 2
11263 3
12629 1
12629 2
12806 1
12806 2
12806 3
;
/* output a dataset with IDs having three visits */
data temp;
set have;
by id notsorted;
if first.id then count=0;
count+1;
if count=3;
data want;/*(assume your dataset already sorted)*/
merge have(in=a) temp(in=b keep=id);
by id;
if a and b;
proc print;run;
Linlin
Dear Lin,
I need to learn Dr.Arthur's coding.The try yours as another method.Then I can learn both.
Thanks a lot.
Jing
Dear Dr.Arthur,
This is my coding.randid=id,period=visit,I try your coding in this small sample,it shows what I want.But when I ran in whole sample with many other variables,it takes long time.Would you please help me checking if there are problems after I revise your coding in my pgm.If it is correct,maybe I just need to wait...
Thanks.
Jing
proc sort data=fr;
by randid period;
run;
data fr1;
set fr;
by randid period;
do until (last.randid);
array check(3);
check(period)=1;
end;
do until (last.randid);
set fr;
by randid;
if sum(of check(*)) eq 3 then output;
end;
run;
proc print data=fr1(ods=20);
run;
Jing,
I have to think that you entered something incorrectly. In the following code, I expanded your example data to represent 20,000 observations. On my computer it took 0.04 seconds to create the file, 0.04 seconds to sort it, and 0.04 seconds to run through the data step with the double DOW loop.
It would help if you posted the code that you actually ran.
I agree with Art; given the characteristics that you provide, there should absolutely be no performance problems.
Please post your code, and the error will probably be apparent.
Tom
Your last post confuses me. As I understood your original post, you wanted one file that only contained the records that met the condition.
The following creates two files, the original one, plus a version of the raw data with the extra field added:
data want (drop=f check:) rawdatawithextrafield (drop=check:);
do until (last.id);
set have;
by id;
array check(3);
check(visit)=1;
end;
do until (last.id);
set have;
by id;
if sum(of check(*)) eq 3 then do;
f=1;
output want;
end;
else f=0;
output rawdatawithextrafield;
end;
run;
Dear Dr,Arthur,
this works.
I need to think your coding.
Thank you very much!
Jing
It is a somewhat difficult concept to pick up early on in your learning curve. However, that said, the code is taking control away from the way SAS normally works.
The two do until loops read each record within a by group. Thus, the first loop reads the records:
2448 1
2448 3
and creates the array, check, with the values:
check1 1
check2 .
check3 1
Then, the second do loop again reads those same two records, but this time knows the values in the array check. Since the sum of check will only equal 3 if there were at least three records for id 2448 and, between those records, contained values 1, 2 and 3 for visit.
Then the first loop reads the records for the next id, etc., etc.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.