USCSS_Nostromo
Calcite | Level 5

Dear SAS communities,

 

I am interested in counting observations within groups of observations in sets of human sleep data that fulfill specific criteria.

 

In the course of a night's sleep, an individual goes through various stages of sleep, which can be categorized as rapid eye movement (REM) sleep or non-rapid eye movement sleep (NREM) or occasionally wakefulness (WAKE). These stages typically occur in groups. I wish to count the number of instances (observations) of NREM sleep and WAKE which occur during groups of REM sleep only. If I could get a column with this information then that would be fine.

 

One exception: Should there be more than 15 observations of either NREM or WAKE within a group of REM sleep, then this would not be counted.

 

Can you help me?

 

The data set is attached.

 

I have two columns in the data set: ID and Type. The data (REM, NREM and WAKE) are in the Type column.

 

Please feel free to ask me anything.

 

I am running SAS 9.4

 

Sincerely,

 

Ian

 

 

 

 

mkeintz
PROC Star

@USCSS_Nostromo wrote:

Dear SAS communities,

 

I am interested in counting observations within groups of observations in sets of human sleep data that fulfill specific criteria.

 

In the course of a night's sleep, an individual goes through various stages of sleep, which can be categorized as rapid eye movement (REM) sleep or non-rapid eye movement sleep (NREM) or occasionally wakefulness (WAKE). These stages typically occur in groups. I wish to count the number of instances (observations) of NREM sleep and WAKE which occur during groups of REM sleep only. If I could get a column with this information then that would be fine.

 

One exception: Should there be more than 15 observations of either NREM or WAKE within a group of REM sleep, then this would not be counted.

 

 

I have two columns in the data set: ID and Type. The data (REM, NREM and WAKE) are in the Type column.

 

What is your definition of "groups of REM sleep"?    Does the sequence of

     REM/WAKE/WAKE/WAKE/WAKE/….(20 WAKEs)/REM

constitute a group of REM sleep?

 

If yes, then does REM/WAKE/WAKE/REM/WAKE/WAKE/WAKE/WAKE/WAKE/REM constitute one REM sleep group or two?

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
USCSS_Nostromo
Calcite | Level 5

Hello mkeintz,

 

Thank you for your reply.

 

The definition of a REM sleep group is a sequence containing 15 or more instances of REM. There can be intervening NREM or WAKE without ending the REM sleep group, as long as the contiguous intervening NREM or WAKE together are < 15 in number.

 

There is one exception to the above rules, however: the first group of REM sleep in the data can be any length of REM instances. It does not have to be > 15 in number. Here are some examples.

 

Each data set that I have contains 4-5 REM sleep groups. The overarching aim is to see how fragmented a given individual's REM sleep is per night. The instances of WAKE and NREM within each REM sleep group count as disruptions of the continuity of REM sleep. The greater the total number of these disruptions, the more fragmented an individual's REM sleep is.

 

Examples:

 

This example counts as a REM sleep group because there are > 15 REM instances with < 15 instances of NREM or WAKE intervening:

 

REM/REM/REM/REM/REM/REM/REM/REM/REM/REM/REM/REM/REM/WAKE/WAKE/NREM/REM/REM/REM/REM/REM/REM

 

This example does not count as a REM sleep group because there are <15 REM instances:

 

REM/REM/REM/REM/REM/REM/REM/REM/REM/WAKE/WAKE/NREM/REM/REM/REM/NREM/NREM

 

This example does not count as a REM sleep group because there are > 15 instances of either contiguous NREM or WAKE intervening. If there are > 15 instances of NREM or WAKE intervening but they are not contiguous, then they do not invalidate the REM sleep group:

 

REM/REM/REM/REM/REM/REM/REM/REM/REM/REM/REM/REM/WAKE/WAKE/WAKE/WAKE/WAKE/WAKE/WAKE/WAKE/WAKE/WAKE/WAKE/NREM/NREM/NREM/NREM/REM/REM/REM/REM/REM/REM

 

This example counts as a REM sleep group EVEN THOUGH it is < 15 instances of REM in number because it is the first REM sleep encountered in the night:

 

REM/REM/REM/WAKE/WAKE/NREM/REM/REM/

 

I hope that this helps. Please let me know if you have any other questions.

 

Sincerely,

 

Ian

mkeintz
PROC Star

So a REM group must:

  1.  Begin and end with a REM record.
  2.  Contain at least 15 REM records (except the first REM group, which could have as few as two REM records).
  3.  Contain no gap of non-REM records of size 15 or larger.

 

The basic strategy is that you are using gaps of 15 or more to separate REM groups. But some stray REM records will be part of no REM group.

 

So what do you want the output to look like?  It could be the same 1200 records you presented, with a REM group ID variable added. Or it could be a data set (or report) that just identifies, for each group, the starting record, ending record, number of REM, number of NREM, and number of WAKE records.

 

Or perhaps you have something else in mind.

 

regards,

Mark

USCSS_Nostromo
Calcite | Level 5

Hi Mark,

 

Thank you for your reply. Yes, you're correct in your summary.

 

If possible, I would like a data set similar to what you describe, which includes, for each individual subject (I have 60 more records such as the one I sent as an example, with a first column ID which identifies the individual subjects), the start and end record of each REM group and the number of NREM and WAKE records that intervene. I ultimately need to be able to quantify the number of intervening NREM and WAKE records per subject for later statistical analysis. The more information I could get from the data the better, e.g., which REM groups tend to have more intervening NREM and WAKE records: those at the beginning of the data set or those at the end. If I could have a way of answering that question with a new data set, then that would be excellent. If that complicates things then forget it. It's no problem. Please let me know if you have any questions, should you wish to continue assisting me.

 

Sincerely,

 

Ian

mkeintz
PROC Star

I think the logic of this program is to read the data in batches by ID and TYPE, generating the size of each run of each TYPE; call it RUNSIZE.  Keep generating sequential RUNSIZE values (updating the total REM, NREM, and WAKE record counts at the same time) until the trailing NREM and WAKE run sizes together total more than 15, or the ID ends.  (A "trailing" count is just the number of records of that type since the end of the last run of REM records.)

 

data have;
  infile 'c:\temp\t.txt' truncover dlm='09'x;  /* tab-delimited download of your data; '09'x is the tab character */
  recid=_n_;
  input ID :$6.  TYPE :$4. ;
run;


data want (keep=id group_id rem_beg rem_end rem_beg_r rem_end_r N_rem n_nrem n_wake);

  label GROUP_ID   = 'Group number (within ID) for this REM group'
        REM_BEG    = 'Record ID for first REM record in this group'
        REM_END    = 'Record ID for last REM record in this group'
        REM_BEG_R  = 'Relative record id within an ID for REM_BEG'
        REM_END_R  = 'Relative record id within an ID for REM_END'
        N_REM      = 'Number of REM records in this group'
        N_NREM     = 'Number of NREM records in this group'
        N_WAKE     = 'Number of WAKE records in this group'    ;

  retain END_OF_PRIOR_ID 0;                  /* Track the ending RID (see below) for preceding ID */
  /* Iterate below until end of ID or a gap (TRAILING_NREM + TRAILING_WAKE) is large enough */
  do until (sum(trailing_nrem,trailing_wake,0)>15 or last.id=1); 
    do runsize=1 by 1 until (last.type);     /* Find runsize for the current TYPE */
      set have;
      by id  type notsorted;
      rid+1;                                 /* Track Record identifier */
    end;

    if type='REM' then do;                           /* If this was a run of REM records ...   */
      rem_end=rid;
      if rem_beg=. then do;                          /* If REM group just starting ...         */
        rem_beg= rid - runsize + 1;                  /*   ... initial REM_BEG ...              */
        n_nrem=0;                                    /*   ... and N_NREM ...                   */
        n_wake=0;                                    /*   ... and N_WAKE ...                   */
      end;

      trailing_nrem=0;                               /* Reset trailing NREM run size           */
      trailing_wake=0;
      N_rem+runsize;                                 /* Update total count of REM records      */
    end;
    else if type='NREM' then do;
      trailing_nrem+runsize;                         /* Size of all trailing NREM runs */
      N_NREM+runsize;
    end;
    else if type='WAKE' then do;
      trailing_wake+runsize;                         /* Size of all trailing WAKE runs */
      N_WAKE+runsize;
    end;
  end; 

  /* Now that a large gap or end-of-id has been encountered ... */
  if N_rem>= ifn(group_id>=1,15,2) then do;  /* If this is a qualifying REM group ...          */
    N_nrem = N_nrem - trailing_nrem;         /* ... Subtract count of trailing NREM records    */
    N_wake = N_wake - trailing_wake;         /* ... Same with any trailing WAKE records        */
    REM_BEG_R= rem_beg-end_of_prior_id;      /* ... Get the relative REM_BEG ...               */
    REM_END_R= rem_end-end_of_prior_id;      /* ... Get the relative REM_END ...               */
    group_id+1;                              /* ... Update the group identifier                */
    output;
  end;
  call missing(of rem_:, of N_:, of trailing_:);
  if last.id then do;
     call missing(group_id);
     end_of_prior_id=rid;
  end;

run;

So the outer loop is a DO UNTIL that ends when the trailing gap (NREM plus WAKE records since the last REM run) exceeds 15, or when the end of an ID is reached.

The inside loop reads records in batches, by type.
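
The inner loop can be tried on its own to see the run sizes it produces. This is only a sketch on made-up data: the data set names DEMO and RUNS and the sample records are assumptions for illustration, not part of the attached file.

```sas
/* Made-up sample: two subjects, stages listed in time order */
data demo;
  input id :$6. type :$4. @@;
  datalines;
A REM A REM A WAKE A NREM A NREM A REM
B NREM B REM B REM B REM
;
run;

data runs (keep=id type runsize);
  /* Each pass of the DATA step reads one whole run of identical TYPE */
  do runsize=1 by 1 until (last.type);
    set demo;
    by id type notsorted;   /* NOTSORTED: groups follow file order */
  end;
  /* implicit OUTPUT here writes one record per run */
run;

proc print data=runs noobs;
run;
```

With this sample, RUNS should hold one record per run: for subject A, REM 2, WAKE 1, NREM 2, REM 1; for subject B, NREM 1, REM 3. RUNSIZE ends up equal to the run length because the UNTIL condition is checked before the index is incremented.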

 

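Once the gap or end-of-ID is encountered, check N_REM (it must be >=15, or >=2 for the first group of an ID). If the group qualifies, subtract the trailing NREM and WAKE records from the counts, since they fall after the last REM record and are not part of the group.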
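
Each record written to WANT describes one qualifying REM group, so the per-subject totals asked about earlier in the thread can be derived from it. The following is a rough sketch, assuming WANT was created by the step above; the name FRAG_BY_ID and the N_GROUPS and TOTAL_ variables are made up for illustration.

```sas
/* Per-subject fragmentation totals (output names are hypothetical) */
proc sql;
  create table frag_by_id as
  select id,
         count(*)                  as n_groups,
         sum(n_nrem)               as total_nrem,
         sum(n_wake)               as total_wake,
         sum(n_nrem) + sum(n_wake) as total_disruptions
  from want
  group by id;
quit;

/* Do early or late REM groups carry more disruptions?
   Compare the counts by group position across subjects. */
proc means data=want mean median maxdec=1;
  class group_id;
  var n_nrem n_wake;
run;
```

Whether group position is comparable across subjects depends on how many groups each night has, so the second step is best read as a first look rather than an analysis.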

 

The expression 

ifn(group_id>=1,15,2)

It returns 15 when group_id>=1 and 2 otherwise.  Since group_id is incremented only AFTER a group qualifies, this is a handy way to set a different REM count requirement for the first REM group than for all the remaining REM groups.  Note that at the end of each ID, the group_id variable is reset to missing, so that the next ID is properly initialized.
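
A quick way to confirm that behavior, including the missing-value case, is a throwaway step like the one below. The variable name MIN_REM is made up for this sketch.

```sas
data _null_;
  /* Missing means no group has been accepted yet for this ID */
  do group_id = ., 1, 2;
    min_rem = ifn(group_id>=1, 15, 2);   /* .>=1 evaluates to 0 (false) */
    put group_id= min_rem=;
  end;
run;
```

The log should show min_rem=2 for the missing group_id and min_rem=15 for group_id 1 and 2, because a missing numeric value compares lower than any number.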

 

 

USCSS_Nostromo
Calcite | Level 5

Dear Mark,

 

Thanks again for your help with this. Your code works perfectly, even when I include additional data sets from other samples.

 

Sincerely,

 

Ian
