BookmarkSubscribeRSS Feed
confooseddesi89
Quartz | Level 8

Hello,

 

I'm working on a dataset where participants wore a device for up to 7 consecutive days that measured their sleep and activity. Not all participants wore the device when they should have, as often occurs in human research, leading to some days being "invalidated" and not included in the dataset. Whether the participant's data will be included in analyses will depend on how many days of consecutive valid recording they produced. I want to determine: (1) the largest number of consecutive days of sleep/activity recording, per participant, and (2) whether a given row (day) is part of a consecutive set of 3 days within a given participant. Below is a sample of my data for the first two participants, 1000007 and 1000011:

 

idnum rest_enddate
100000710/26/2014
100000710/27/2014
100000710/28/2014
100000710/30/2014
100000710/31/2014
100000711/01/2014
100001109/01/2014
100001109/03/2014
100001109/04/2014
100001109/07/2014
100001109/08/2014

 

Using "rest_enddate" as a marker for each day of sleep recording, id number 1000007 above would have a value of 3 for "max_consec_days" as there are 2 sets of 3 consecutive days for this participant: 10/26/2014--10/27/2014--10/28/2014, and 10/30/2014--10/31/2014--11/1/2014 (the number of sets is not important for this variable). Meanwhile, each of these six days/rows would have a value of "yes" or "1" for "part_of_3_consec_days", as each of these days is part of a set of 3 consecutive days. ID number 1000011, however, would have a value of 2 for "max_consec_days" as there are 2 sets of 2 consecutive days for this participant: 9/3/2014--9/4/2014 and 9/7/2014--9/8/2014. The value of "part_of_3_consec_days" would be "no" or "0" for each day for this participant, as none of these days is part of a set of 3 consecutive days. (For the above participant examples, all days for each participant have the same value for "part_of_3_consec_days," but this will likely not be the case for many days for other participants.) Is this possible with SAS? Any help is appreciated.

 

Thanks,

5 REPLIES 5
Reeza
Super User

 

data want;
set have;
by idnum;

retain max_consecutive count;

prev_date = lag(rest_enddate);
if first.idnum then do;
max_consecutive = 0;
count=0;
call missing(prev_date);
end;
if rest_enddate - prev_date = 1 then count+1;
else do;
if count > max_consecutive then max_consecutive = count; 
count=0;
end;

if last.idnum then output;
run;


 

This should get you started.

 

 

Oligolas
Barite | Level 11

Hi,

 

yes it's possible. I also added a Group variable to differentiate the sets of consecutive days (if needed)

data have;
input idnum rest_enddate :mmddyy10.;
format rest_enddate is8601da.;
datalines;
1000007 10/26/2014 
1000007 10/27/2014 
1000007 10/28/2014 
1000007 10/30/2014 
1000007 10/31/2014 
1000007 11/01/2014 
1000011 09/01/2014 
1000011 09/03/2014 
1000011 09/04/2014 
1000011 09/07/2014 
1000011 09/08/2014 
;
run;


proc sort data=have out=_tmp0;by idnum rest_enddate;run;
data _tmp0;
   set _tmp0;
   by idnum rest_enddate;
   retain consday . md5group;
   lag1_endt=lag1(rest_enddate);
   if first.idnum then do;
      lag1_endt=.;
      consday=1;
   end;
   if not missing(lag1_endt) then do; 
      if rest_enddate eq lag1_endt+1 then consday=consday+1; 
      else do;
         consday=1;
         sleepDays=rest_enddate-lag1_endt-1;
      end;
   end;
   format lag1_endt is8601da.;

   lag1_consday=lag1(consday);
   if consday le lag1_consday then new=1;
   if _N_ eq 1 or new eq 1 then md5group = md5(put(idnum,best.)||put(rest_enddate,is8601da.)); 
run;

proc sort data=_tmp0 out=_tmp1;by idnum md5group descending rest_enddate;run;
data _tmp1;
   set _tmp1;
   by idnum md5group descending rest_enddate;
   retain part_of_3_consec_days .;
   if first.md5group then part_of_3_consec_days=consday;
run;

proc sort data=_tmp1 out=want;by idnum rest_enddate;run;
data want;
   set want;
   by idnum rest_enddate;
   part_of_3_consec_days=ifn(part_of_3_consec_days eq 3,1,0);
   drop consday lag1_: new;
run;

PROC DATASETS lib=work nolist; delete _tmp:; RUN;QUIT;
________________________

- Cheers -

RichardDeVen
Barite | Level 11

You can use a DOW loop to identify runs (runid) and sequence place within into an intermediate data set.   A subsequent nested SQL query with automatic remerge is used to flag conditions within id and runid.

 

Example:

data have;
input id date mmddyy10.;
format date mmddyy10.;
datalines;
1000007	10/26/2014
1000007	10/27/2014
1000007	10/28/2014
1000007	10/30/2014
1000007	10/31/2014
1000007	11/01/2014
1000011	09/01/2014
1000011	09/03/2014
1000011	09/04/2014
1000011	09/07/2014
1000011	09/08/2014
;

data stage(drop=pdate);
  do until (last.id);
    set have;
    by id;

    if missing(pdate) or date-pdate > 1 then do;
      runid + 1;
    end;

    pdate = date;

    if runid ne lag(runid) or first.id then 
      runseq=1;
    else
      runseq+1;

    output;
  end;
run;

proc sql;
  create table want as
  select 
    id_grouping.id
  , id_grouping.date
  , id_grouping.longest_run_of_id
  , max(runseq) >= 3 as date_is_in_a_3plus_run
/*  , id_grouping.runid*/
/*  , id_grouping.runseq*/
  from 
  ( select 
      id
    , date
    , runid
    , runseq
    , max(runseq) as longest_run_of_id
    from stage
    group by id
  ) id_grouping
  group by
    runid
  order by
    runid, runseq
  ;

Output:

RichardADeVenezia_0-1598891786011.png

 

PeterClemmensen
Tourmaline | Level 20

My 2 cents

 

data have;
input id date mmddyy10.;
format date mmddyy10.;
datalines;
1000007 10/26/2014
1000007 10/27/2014
1000007 10/28/2014
1000007 10/30/2014
1000007 10/31/2014
1000007 11/01/2014
1000011 09/01/2014
1000011 09/03/2014
1000011 09/04/2014
1000011 09/07/2014
1000011 09/08/2014
;

data t / view=t;
   set have;
   i = 0;
run;

data want(drop=c d rc);
   if _N_ = 1 then do;
       declare hash h(dataset:'t');
       h.definekey('id', 'date');
       h.definedata('i');
       h.definedone();
   end;

   do _N_ = 1 by 1 until (last.id);
      set t;
      by id;
      c = 0;
    
      do d = -2 to 2;
         rc = h.check(key : id, key : date + d);
         if rc then c = 0;
         c = sum(c, rc = 0);
         if c = 3 then h.replace(key : id, key : date, data : 1);
         longest_run = max(longest_run, c);
      end;
   end;

   do _N_ = 1 to _N_;
      set t;
      rc = h.find();
      output;
   end;
 
run;

Result:

 

id       date        i  longest_run 
1000007  10/26/2014  1  3 
1000007  10/27/2014  1  3 
1000007  10/28/2014  1  3 
1000007  10/30/2014  1  3 
1000007  10/31/2014  1  3 
1000007  11/01/2014  1  3 
1000011  09/01/2014  0  2 
1000011  09/03/2014  0  2 
1000011  09/04/2014  0  2 
1000011  09/07/2014  0  2 
1000011  09/08/2014  0  2 
PGStats
Opal | Level 21

A method using an array:

 

data have;
input id date :mmddyy10.;
format date yymmdd10.;
datalines;
1000007   10/26/2014
1000007   10/27/2014
1000007   10/28/2014
1000007   10/30/2014
1000007   10/31/2014
1000007   11/01/2014
1000011   09/01/2014
1000011   09/03/2014
1000011   09/04/2014
1000011   09/07/2014
1000011   09/08/2014
;

data want;
array _d{99};
do until(last.id);
    set have; by id;
    if first.id then refDate = date-1;
    _d{date - refDate} = 1;
    end;
c = 0;
do i = 1 to dim(_d);
    if _d{i} then do;
        c = c + 1;
        do j = i to i - c + 1 by -1;
            _d{j} = c;
            end;
        maxConsec = max(maxConsec, c);
        end;
    else c = 0;
    end;
do i = 1 to dim(_d);
    if _d{i} then do;
        in_3_seq = _d{i} >= 3;
        date = refDate + i;
        output;
        end;
    end;
drop _d: refDate c i j;
run;

proc print data=want noobs; run;
id 	date 	maxConsec 	in_3_seq
1000007 	2014-10-26 	3 	1
1000007 	2014-10-27 	3 	1
1000007 	2014-10-28 	3 	1
1000007 	2014-10-30 	3 	1
1000007 	2014-10-31 	3 	1
1000007 	2014-11-01 	3 	1
1000011 	2014-09-01 	2 	0
1000011 	2014-09-03 	2 	0
1000011 	2014-09-04 	2 	0
1000011 	2014-09-07 	2 	0
1000011 	2014-09-08 	2 	0
PG

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 719 views
  • 2 likes
  • 6 in conversation