Solved: Is there a code that will allow me to select first, second, and last d...

Kels123 · Posted 11-28-2016 01:32 AM

BACKGROUND INFO:

I have a database of time periods (marked by StartDate and EndDate) during which study subjects were in a given environment (SCHOOL vs. COMMUNITY). During these periods a lab test was done sporadically (sometimes multiple times, sometimes not at all), and I have the dates (Lab_Date) and results (Lab_Result). Because the same subject can move back and forth between school and community multiple times, the variable “count” numbers each subject's periods in order: in the sample below, the first period (COMMUNITY) has count=1, the second (SCHOOL) has count=2, the third (COMMUNITY) has count=3, and count restarts at 1 for the next subject.

Sample Data:

environment  StartDate   EndDate     SubjectNo  MvmtDate    count  daysbwn  short_inc  Lab_date    Lab_Result
COMMUNIT     2007-01-01  2007-01-30  00001      01/31/2007  1      29       1          .           6720
SCHOOL       2007-01-31  2008-04-30  00001      05/01/2008  2      455      0          .           6720
SCHOOL       2007-01-31  2008-04-30  00001      05/01/2008  2      455      0          2007-02-22  400
SCHOOL       2007-01-31  2008-04-30  00001      05/01/2008  2      455      0          2007-05-22  48
SCHOOL       2007-01-31  2008-04-30  00001      05/01/2008  2      455      0          2007-08-14
SCHOOL       2007-01-31  2008-04-30  00001      05/01/2008  2      455      0          2007-12-18  48
SCHOOL       2007-01-31  2008-04-30  00001      05/01/2008  2      455      0          2008-03-18  47
COMMUNIT     2008-05-01  2015-01-01  00001      05/01/2008  3      2436     0          .           6720
COMMUNIT     2008-05-01  2015-01-01  00001      05/01/2008  3      2436     0          2008-06-11  3545
COMMUNIT     2008-05-01  2015-01-01  00001      05/01/2008  3      2436     0          2008-07-21  159
COMMUNIT     2008-05-01  2015-01-01  00001      05/01/2008  3      2436     0          2009-10-09  68400
COMMUNIT     2008-05-01  2015-01-01  00001      05/01/2008  3      2436     0          2009-11-16  1650
COMMUNIT     2008-05-01  2015-01-01  00001      05/01/2008  3      2436     0          2009-12-29  2530
COMMUNIT     2008-05-01  2015-01-01  00001      05/01/2008  3      2436     0          2010-02-05  7700
COMMUNIT     2008-05-01  2015-01-01  00001      05/01/2008  3      2436     0          2010-02-11  21154
COMMUNIT     2008-05-01  2015-01-01  00001      05/01/2008  3      2436     0          2010-04-22  47
COMMUNIT     2008-05-01  2015-01-01  00001      05/01/2008  3      2436     0          2010-04-23  54
COMMUNIT     2008-05-01  2015-01-01  00001      05/01/2008  3      2436     0          2010-08-11  47
COMMUNIT     2008-05-01  2015-01-01  00001      05/01/2008  3      2436     0          2010-11-15  47
COMMUNIT     2008-05-01  2015-01-01  00001      05/01/2008  3      2436     0          2011-02-16  20
COMMUNIT     2008-05-01  2015-01-01  00001      05/01/2008  3      2436     0          2011-05-17  65
COMMUNIT     2008-05-01  2015-01-01  00001      05/01/2008  3      2436     0          2011-10-04  32
COMMUNIT     2008-05-01  2015-01-01  00001      05/01/2008  3      2436     0          2012-01-19  34
COMMUNIT     2008-05-01  2015-01-01  00001      05/01/2008  3      2436     0          2012-03-31  72
COMMUNIT     2007-01-01  2012-05-31  00002      06/01/2012  1      1977     0          .           .
COMMUNIT     2007-01-01  2012-05-31  00002      06/01/2012  1      1977     0          2007-03-08  .
COMMUNIT     2007-01-01  2012-05-31  00002      06/01/2012  1      1977     0          2007-07-05  466
COMMUNIT     2007-01-01  2012-05-31  00002      06/01/2012  1      1977     0          2007-08-17  410

MY QUESTION:

I want to simplify this database to the first and last Lab_Dates for each time period (dropping all others). The problem is that some periods have missing Lab_Dates, and I only want to drop these missing values in CERTAIN CIRCUMSTANCES. For example, for count=1 for the first subject, I want to keep that first row, because the missing Lab_Date means that a lab was never drawn during this time period (COMMUNITY) and I don't want to drop that whole time period (the lack of data is meaningful for analysis later on). However, for the second time period (SCHOOL; same subject, count=2), rows #2 and #3 show that while the Lab_Date is missing in row #2, this subject had multiple labs drawn during that period, captured by row #3 and beyond. So I want to keep the Lab_Date from row #3 and drop row #2 because it is unnecessary.
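To make the rule above concrete, here is a rough sketch of one way it might be expressed in a DATA step. It is only an illustration under assumptions: Lab_date is taken to be a numeric SAS date (so a missing value is `.` and sorts before any real date), any row with a Lab_date is treated as a lab draw even if Lab_Result is missing, and the names work.allvl_sorted, work.firstlast_dated, n_dated and seen are made up for the example.

```sas
/* Sketch only, not a definitive solution.
   Assumes Lab_date is a numeric SAS date (missing = .). */
proc sort data=work.allvl out=work.allvl_sorted;   /* hypothetical output name */
    by SubjectNo StartDate Lab_date;               /* missing Lab_date sorts first */
run;

data work.firstlast_dated;                         /* hypothetical output name */
    /* Pass 1 over one period: count the rows that have a lab date */
    n_dated = 0;
    do until (last.StartDate);
        set work.allvl_sorted;
        by SubjectNo StartDate;
        if not missing(Lab_date) then n_dated + 1;
    end;

    /* Pass 2 over the same period: decide which rows to keep */
    seen = 0;
    do until (last.StartDate);
        set work.allvl_sorted;
        by SubjectNo StartDate;
        if n_dated = 0 then do;
            /* no lab drawn in this period: keep one placeholder row */
            if first.StartDate then output;
        end;
        else if not missing(Lab_date) then do;
            seen + 1;
            /* keep only the first and the last dated rows */
            if seen = 1 or seen = n_dated then output;
        end;
    end;

    drop n_dated seen;
run;
```

The two loops read each period twice: the first pass finds out whether the period has any dated labs at all, and the second pass uses that count to choose between keeping the placeholder row and keeping the first and last dated rows. A period with a single dated lab yields one row, since that row is both first and last.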

I know how to use first.StartDate to select the first row in each period:

data clean.firstlastvl;
    set work.allvl;
    by SubjectNo startdate;   /* work.allvl must be sorted by these variables */
    if first.startdate;       /* keep only the first row of each period */
run;
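A quick way to check what a step like this keeps is to copy the first. and last. flags into ordinary variables, since the automatic flags themselves are not written to the output dataset. The sketch below assumes work.allvl is already sorted by SubjectNo and StartDate; work.flag_check, is_first and is_last are hypothetical names used only for illustration.

```sas
/* Sketch only: assumes work.allvl is sorted by SubjectNo StartDate.
   work.flag_check, is_first and is_last are hypothetical names. */
data work.flag_check;
    set work.allvl;
    by SubjectNo StartDate;
    is_first = first.StartDate;   /* 1 on the first row of a period, else 0 */
    is_last  = last.StartDate;    /* 1 on the last row of a period, else 0 */
run;

proc print data=work.flag_check;
    var SubjectNo StartDate Lab_date Lab_Result is_first is_last;
run;
```

Printing the flags next to Lab_date shows whether the row flagged as first in a period is a missing-date placeholder or an actual lab draw.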

And I think I was able to modify the code to select the first AND last rows for each period:

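data clean.firstlastvl;
    set work.allvl;
    by SubjectNo startdate;                  /* work.allvl must be sorted by these variables */
    if first.startdate or last.startdate;    /* keep the first and last row of each period */
run;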

However, is there a way to select first, second, and last dates that fall into a certain time period? For example, I would like to make the sample dataset look like this (by dropping the rows marked in red):