BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jing2000yr
Calcite | Level 5

Hello,everyone,

I am a greenhand and would appreciate your help! I have a sample from a cohort study,each subject is supposed to have 3 visits.But actually,some of them only have the first and the third,or the first and the second.Now I am going to select those who have all three visits(like the ID 6238).I tried but failed.Thank you very much for helping me this out.

Thanks a lot!

part of the data set:

                    Obs        ID        visit

                          1      2448       1
                          2      2448       3
                          3      6238       1
                          4      6238       2
                          5      6238       3
                          6      9428       1
                          7      9428       2
                          8     10552       1
                          9     10552       2
                         10     11252       1
                         11     11252       2
                         12     11252       3
                         13     11263       1
                         14     11263       2
                         15     11263       3
                         16     12629       1
                         17     12629       2
                         18     12806       1
                         19     12806       2
                         20     12806       3

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

If you want to insure that each case not only has three records, but that they are for visits 1, 2 and 3, then I would use a double DOW.  e.g.:

data have;

  input ID visit;

  cards;

2448       1

2448       3

6238       1

6238       2

6238       3

9428       1

9428       2

10552       1

10552       2

11252       1

11252       2

11252       3

11263       1

11263       2

11263       3

12629       1

12629       2

12806       1

12806       2

12806       3

;

proc sort data=have;

  by id;

run;

data want (drop=check:);

  do until (last.id);

    set have;

    by id;

    array check(3);

    check(visit)=1;

  end;

  do until (last.id);

    set have;

    by id;

    if sum(of check(*)) eq 3 then output;

  end;

run;

View solution in original post

13 REPLIES 13
art297
Opal | Level 21

If you want to insure that each case not only has three records, but that they are for visits 1, 2 and 3, then I would use a double DOW.  e.g.:

data have;

  input ID visit;

  cards;

2448       1

2448       3

6238       1

6238       2

6238       3

9428       1

9428       2

10552       1

10552       2

11252       1

11252       2

11252       3

11263       1

11263       2

11263       3

12629       1

12629       2

12806       1

12806       2

12806       3

;

proc sort data=have;

  by id;

run;

data want (drop=check:);

  do until (last.id);

    set have;

    by id;

    array check(3);

    check(visit)=1;

  end;

  do until (last.id);

    set have;

    by id;

    if sum(of check(*)) eq 3 then output;

  end;

run;

jing2000yr
Calcite | Level 5

Dear Dr.Arthur,

I tried with small sample,it works.Thanks!

But My whole sample is like more than 4000 subjects and more than 10000 observations,it takes very long time.

Jing

jing2000yr
Calcite | Level 5

And I try to assign a new variable 'f' to indicate if all three visits then f=1,else f=0,then I want to output f=1.But keep failing...

Jing

Linlin
Lapis Lazuli | Level 10

HI Jing,

Is the code below helpful?

data have;

  input ID visit;

  cards;

2448       1

2448       3

6238       1

6238       2

6238       3

9428       1

9428       2

10552       1

10552       2

11252       1

11252       2

11252       3

11263       1

11263       2

11263       3

12629       1

12629       2

12806       1

12806       2

12806       3

;

proc sql;

  create table temp as

    select id,count(id) as t

  from have

  group by id

   having t>2;

  create table want as

    select id,visit, 1 as f

   from have where id in (select id from temp);

  quit;

  proc print data=want;run;

Good luck!

Linlin

jing2000yr
Calcite | Level 5

Dear Linlin,

I have not learned the procedure SQL.Do you have other idea?

Thanks.

Jing

Linlin
Lapis Lazuli | Level 10

Hi Jing,

The code Art provided is very efficient. Maybe my code is easier for you to understand.

data have;

  input ID visit;

  cards;

2448       1

2448       3

6238       1

6238       2

6238       3

9428       1

9428       2

10552       1

10552       2

11252       1

11252       2

11252       3

11263       1

11263       2

11263       3

12629       1

12629       2

12806       1

12806       2

12806       3

;

/* output a dataset with IDs having three visits */

data temp;

  set have;

  by id notsorted;

  if first.id then count=0;

  count+1;

  if count=3;

data want;/*(assume your dataset already sorted)*/

merge have(in=a) temp(in=b keep=id);

by id;

if a and b;

proc print;run;

Linlin

jing2000yr
Calcite | Level 5

Dear Lin,

I need to learn Dr.Arthur's coding.The try yours as another method.Then I can learn both.

Thanks a lot.

Jing

jing2000yr
Calcite | Level 5

Dear Dr.Arthur,

This is my coding.randid=id,period=visit,I try your coding in this small sample,it shows what I want.But when I ran in whole sample with many other variables,it takes long time.Would you please help me checking if there are problems after I revise your coding in my pgm.If it is correct,maybe I just need to wait...

Thanks.

Jing

proc sort data=fr;
by randid period;
run;
data fr1;
set fr;
by randid period;
do until (last.randid);
array check(3);
check(period)=1;
end;
do until (last.randid);
set fr;
by randid;
if sum(of check(*)) eq 3 then output;
end;
run;

proc print data=fr1(ods=20);
run;

art297
Opal | Level 21

Jing,

I have to think that you entered something incorrectly.  In the following code, I expanded your example data to represent 20,000 observations.  On my computer it took 0.04 seconds to create the file, 0.04 seconds to sort it, and 0.04 seconds to run through the data step with the double DOW loop.

It would help if you posted the code that you actually ran.

TomKari
Onyx | Level 15

I agree with Art; given the characteristics that you provide, there should absolutely be no performance problems.

Please post your code, and the error will probably be apparent.

Tom

art297
Opal | Level 21

Your last post confuses me.  As I understood your original post, you wanted one file that only contained the records that met the condition.

The following creates two files, the original one, plus a version of the raw data with the extra field added:

data want (drop=f check:) rawdatawithextrafield (drop=check:);

  do until (last.id);

    set have;

    by id;

    array check(3);

    check(visit)=1;

  end;

  do until (last.id);

    set have;

    by id;

    if sum(of check(*)) eq 3 then do;

      f=1;

      output want;

    end;

    else f=0;

    output rawdatawithextrafield;

  end;

run;

jing2000yr
Calcite | Level 5


Dear Dr,Arthur,

this works.

I need to think your coding.

Thank you very much!

Jing

art297
Opal | Level 21

It is a somewhat difficult concept to pick up early on in your learning curve.  However, that said, the code is taking control away from the way SAS normally works.

The two do until loops read each record within a by group.  Thus, the first loop reads the records:

2448       1

2448       3

and creates the array, check, with the values:

check1 1

check2 .

check3 1

Then, the second do loop again reads those same two records, but this time knows the values in the array check.  Since the sum of check will only equal 3 if there were at least three records for id 2448 and, between those records, contained values 1, 2 and 3 for visit.

Then the first loop reads the records for the next id, etc., etc.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 13 replies
  • 3507 views
  • 6 likes
  • 4 in conversation