Solved: Re: Counting observations within groups of observations that fulfill s...

USCSS_Nostromo · Posted 10-29-2019 09:39 AM

Dear SAS communities,

I am interested in counting observations within groups of observations in sets of human sleep data that fulfill specific criteria.

In the course of a night's sleep, an individual goes through various stages of sleep, which can be categorized as rapid eye movement (REM) sleep or non-rapid eye movement sleep (NREM) or occasionally wakefulness (WAKE). These stages typically occur in groups. I wish to count the number of instances (observations) of NREM sleep and WAKE which occur during groups of REM sleep only. If I could get a column with this information then that would be fine.

One exception: Should there be more than 15 observations of either NREM or WAKE within a group of REM sleep, then this would not be counted.

Can you help me?

The data set is attached.

I have two columns in the data set: ID and Type. The data (REM, NREM and WAKE) are in the Type column.

Please feel free to ask me anything.

I am running SAS 9.4

Sincerely,

Ian

mkeintz · Posted 10-29-2019 12:16 PM

What is your definition of "groups of REM sleep"? Does the sequence of

REM/WAKE/WAKE/WAKE/WAKE/….(20 WAKEs)/REM

constitute a group of REM sleep?

If yes, then does REM/WAKE/WAKE/REM/WAKE/WAKE/WAKE/WAKE/WAKE/REM constitute one REM sleep group or two?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

USCSS_Nostromo · Posted 10-30-2019 08:18 AM

Hello mkeintz,

Thank you for your reply.

The definition of a REM sleep group is a sequence containing 15 or more instances of REM. There can be intervening NREM or WAKE without ending the REM sleep group as long as the contiguous intervening NREM or WAKE together are < 15 in number.

There is one exception to the above rules, however: the first group of REM sleep in the data can be any length of REM instances. It does not have to be > 15 in number. Here are some examples.

Each data set that I have contains 4-5 REM sleep groups. The overarching aim is to see how fragmented a given individual's REM sleep is per night. The instances of WAKE and NREM within each REM sleep group count as disruptions of the continuity of REM sleep. The greater the total number of these disruptions, the more fragmented an individual's REM sleep is.

Examples:

This example counts as a REM sleep group because there are > 15 REM instances with < 15 instances of NREM or WAKE intervening:

REM/REM/REM/REM/REM/REM/REM/REM/REM/REM/REM/REM/REM/WAKE/WAKE/NREM/REM/REM/REM/REM/REM/REM

This example does not count as a REM sleep group because there are <15 REM instances:

REM/REM/REM/REM/REM/REM/REM/REM/REM/WAKE/WAKE/NREM/REM/REM/REM/NREM/NREM

This example does not count as a REM sleep group because there are > 15 instances of either contiguous NREM or WAKE intervening. If there are > 15 instances of NREM or WAKE intervening but they are not contiguous, then they do not invalidate the REM sleep group:

REM/REM/REM/REM/REM/REM/REM/REM/REM/REM/REM/REM/WAKE/WAKE/WAKE/WAKE/WAKE/WAKE/WAKE/WAKE/WAKE/WAKE/WAKE/NREM/NREM/NREM/NREM/REM/REM/REM/REM/REM/REM

This example counts as a REM sleep group EVEN THOUGH it is < 15 instances of REM in number because it is the first REM sleep encountered in the night:

REM/REM/REM/WAKE/WAKE/NREM/REM/REM/

I hope that this helps. Please let me know if you have any other questions.

Sincerely,

Ian

mkeintz · Posted 10-30-2019 10:35 AM

So a REM group must:

Begin and end with a REM record.
Contain at least 15 REM records (except the first REM group, which could have as few as two REM records).
Contain no gap of non-REM records of size 15 or larger.

The basic strategy is that you are using gaps of 15 or more to separate REM groups. But some stray REM records will be part of no REM group.
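For anyone without the attached file, a small hand-made input can help check the rules above. The sketch below is hypothetical: the data set name HAVE_TEST, the ID value S001, and the run lengths are invented for illustration and are not taken from the attached data. Each datalines row gives a TYPE and a run length, which the loop expands into one record per observation.

```sas
/* Hypothetical test input: HAVE_TEST, S001 and the run lengths are made up */
data have_test (keep=ID TYPE);
  length ID $6 TYPE $4;
  input ID $ TYPE $ n;      /* n = number of consecutive records of this TYPE */
  do i = 1 to n;            /* expand each run into n one-row records         */
    output;
  end;
datalines;
S001 REM 3
S001 WAKE 2
S001 NREM 1
S001 REM 2
S001 NREM 20
S001 REM 16
S001 WAKE 4
S001 REM 5
S001 NREM 3
;
run;
```

By the rules as summarized, this subject should produce two REM groups: the first with 5 REM records and 3 intervening records (2 WAKE, 1 NREM), separated by a 20-record NREM gap from a second group with 21 REM records and 4 intervening WAKE records. The 3 NREM records at the end trail the last REM record and so belong to no group.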

So what do you want the output to look like? It could be the same 1200 records you presented with a REM group ID variable added. Or it could be a data set (or report) that just identifies, for each group, the starting record, ending record, and the numbers of REM, NREM, and WAKE records.

Or perhaps you have something else in mind.

regards,

Mark

USCSS_Nostromo · Posted 10-30-2019 10:57 AM

Hi Mark,

Thank you for your reply. Yes, you're correct in your summary.

If possible, I would like a data set similar to what you describe. For each individual subject, it would include the start and end record of each REM group and the number of NREM and WAKE records that intervene. (I have 60 more records such as the one I sent as an example, with a first column, ID, that identifies the individual subjects.) I ultimately need to be able to quantify the number of intervening NREM and WAKE records per subject for later statistical analysis. The more information I could get from the data the better, e.g., which REM groups tend to have more intervening NREM and WAKE records: those at the beginning of the data set or those at the end. If I could have a way of answering that question with a new data set, that would be excellent. If that complicates things, then forget it. It's no problem. Please let me know if you have any questions if you wish to continue assisting me.

Sincerely,

Ian

mkeintz · Posted 10-30-2019 05:23 PM

I think the logic of this program is to read the data in batches by ID and TYPE, computing the size of each run of each TYPE (call it RUNSIZE). Keep generating sequential RUNSIZE values (updating the total REM, NREM, and WAKE record counts at the same time) until the trailing run sizes for NREM and WAKE total more than 15. A "trailing" count is just the number of records of a given type since the end of the last run of REM records. The program:

data have;
  infile 'c:\temp\t.txt' truncover dlm='09'x;  /* tab-delimited download of your data */
  recid=_n_;
  input ID :$6.  TYPE :$4. ;
run;


data want (keep=id group_id rem_beg rem_end rem_beg_r rem_end_r N_rem n_nrem n_wake);

  label GROUP_ID   = 'Group number (within ID) for this REM group'
        REM_BEG    = 'Record ID for first REM record in this group'
        REM_END    = 'Record ID for last REM record in this group'
        REM_BEG_R  = 'Relative record id within an ID for REM_BEG'
        REM_END_R  = 'Relative record id within an ID for REM_END'
        N_REM      = 'Number of REM records in this group'
        N_NREM     = 'Number of NREM records in this group'
        N_WAKE     = 'Number of WAKE records in this group'    ;

  retain END_OF_PRIOR_ID 0;                  /* Track the ending RID (see below) for preceding ID */
  /* Iterate below until end of ID or a gap (TRAILING_NREM + TRAILING_WAKE) is large enough */
  do until (sum(trailing_nrem,trailing_wake,0)>15 or last.id=1); 
    do runsize=1 by 1 until (last.type);     /* Find runsize for the current TYPE */
      set have;
      by id  type notsorted;
      rid+1;                                 /* Track Record identifier */
    end;

    if type='REM' then do;                           /* If this was a run of REM records ...   */
      rem_end=rid;
      if rem_beg=. then do;                          /* If REM group just starting ...         */
        rem_beg= rid - runsize + 1;                  /*   ... initial REM_BEG ...              */
        n_nrem=0;                                    /*   ... and N_NREM ...                   */
        n_wake=0;                                    /*   ... and N_WAKE ...                   */
      end;

      trailing_nrem=0;                               /* Reset trailing NREM run size           */
      trailing_wake=0;
      N_rem+runsize;                                 /* Update total count of REM records      */
    end;
    else if type='NREM' then do;
      trailing_nrem+runsize;                         /* Size of all trailing NREM runs */
      N_NREM+runsize;
    end;
    else if type='WAKE' then do;
      trailing_wake+runsize;                         /* Size of all trailing WAKE runs */
      N_WAKE+runsize;
    end;
  end; 

  /* Now that a large gap or end-of-id has been encountered ... */
  if N_rem>= ifn(group_id>=1,15,2) then do;  /* If this is a qualifying REM group ...          */
    N_nrem = N_nrem - trailing_nrem;         /* ... Subtract count of trailing NREM records    */
    N_wake = N_wake - trailing_wake;         /* ... Same with any trailing WAKE records        */
    REM_BEG_R= rem_beg-end_of_prior_id;      /* ... Get the relative REM_BEG ...               */
    REM_END_R= rem_end-end_of_prior_id;      /* ... Get the relative REM_END ...               */
    group_id+1;                              /* ... Update the group identifier                */
    output;
  end;
  call missing(of rem_:, of N_:, of trailing_:);
  if last.id then do;
     call missing(group_id);
     end_of_prior_id=rid;
  end;

run;

So the outer loop is a DO UNTIL that ends when the trailing gap size (NREM plus WAKE records) exceeds 15 or the end of an ID group is reached.

The inside loop reads records in batches, by type.
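The batch-reading idea can be tried in isolation. The following is a minimal sketch, assuming a data set HAVE with ID and TYPE in time order; the output data set name RUNS is hypothetical. It keeps only the inner loop and writes one row per run of consecutive identical TYPE values.

```sas
/* Sketch: one output row per run of consecutive identical TYPE values. */
/* Assumes HAVE holds ID and TYPE in time order; RUNS is a made-up name. */
data runs (keep=ID TYPE runsize);
  do runsize = 1 by 1 until (last.type);   /* count records in this run   */
    set have;
    by ID TYPE notsorted;   /* NOTSORTED: group by adjacency, not sort order */
  end;
  /* At this point TYPE holds the run's value and RUNSIZE its length;     */
  /* the implicit OUTPUT at the end of the step writes one row per run.   */
run;
```

For an input sequence REM/WAKE/WAKE/NREM/REM within one ID, this should give four rows with RUNSIZE values 1, 2, 1, 1. The NOTSORTED option matters because the same TYPE recurs many times within an ID and the data must stay in time order.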

Once the gap or end-of-id is encountered, just inspect the N_REM (>=15 except for the first group), and subtract the number of trailing nrem and wake records.
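Once WANT holds one row per qualifying REM group, the question of whether early or late groups carry more intervening records could be explored with a summary by GROUP_ID. This is only a sketch: the data set name WANT_FRAG and the variable N_DISRUPT are hypothetical names introduced here, and PROC MEANS is just one of several ways to summarize.

```sas
/* Assumes WANT (one row per qualifying REM group) already exists. */
data want_frag;
  set want;
  n_disrupt = sum(n_nrem, n_wake);   /* hypothetical: total intervening records */
run;

/* Summarize disruptions by position of the group within the night */
proc means data=want_frag n mean min max;
  class group_id;
  var n_nrem n_wake n_disrupt;
run;
```

Because GROUP_ID restarts at 1 for every ID, each class level pools the first, second, third, ... REM group across all subjects.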

The expression

ifn(group_id>=1,15,2)

returns 15 when group_id>=1 and returns 2 otherwise. This is a handy way to set different count requirements for the first REM group vs. all the remaining REM groups: since group_id is incremented only AFTER qualification is established, the condition group_id>=1 is still false when the first group is tested. Note that at the end of each ID, the group_id variable is reset to missing, so that the next ID is properly initialized.
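A quick way to convince yourself of this behavior is to evaluate the expression for a few values of group_id, including missing. The step below is a standalone sketch; the variable name MIN_REM is hypothetical and is not part of the program above.

```sas
/* Standalone check of the threshold expression; MIN_REM is a made-up name */
data _null_;
  do group_id = ., 0, 1, 2;
    min_rem = ifn(group_id >= 1, 15, 2);   /* a missing value compares as less than 1 */
    put group_id= min_rem=;
  end;
run;
```

The log should show min_rem=2 for the missing and 0 cases and min_rem=15 for 1 and 2, because a comparison involving a missing value simply evaluates to false rather than to missing.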

USCSS_Nostromo · Posted 11-04-2019 04:04 AM

Dear Mark,

Thanks again for your help with this. Your code works perfectly, even when I include additional data sets from other samples.

Sincerely,

Ian
