turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- Base SAS Programming
- /
- consecutive records filter

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

04-11-2016 02:59 AM

Dear Experts,

I need to extract records with two consecutive <20 followed by >=20 res; in this case we get

ptid vis res

2 2 10

2 3 10

2 4 50

2 5 60

4 1 10

4 2 10

4 3 20

4 4 40

4 5 40

data is as below

```
data have;
input ptid vis res;
cards;
1 1 10
1 2 20
1 3 20
1 4 50
2 1 20
2 2 10
2 3 10
2 4 50
2 5 60
3 1 10
3 2 20
3 3 20
3 4 50
4 1 10
4 2 10
4 3 20
4 4 40
4 5 40
;
```

could you please provide your inputs.

Thanks,

Jag

Jag

Accepted Solutions

Solution

04-11-2016
10:52 AM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Jagadishkatam

04-11-2016 07:40 AM - edited 04-11-2016 08:20 AM

Good. Then please try this:

```
/* Select max. number of observations per patient */
proc sql noprint;
select max(c) into :maxobs
from (select count(*) as c
from have
group by ptid);
quit;
/* Select observations matching the pattern */
data want;
length resseq $&maxobs;
retain patternID;
if _n_=1 then patternID=prxparse('/1{2}?2{2,}/');
do until(last.ptid);
set have;
by ptid;
resseq=cats(resseq,n(res)*(2-(res<20)));
end;
call prxsubstr(patternID, resseq, start, len);
do until(last.ptid);
set have;
by ptid;
ctr=sum(ctr,1);
if start<=ctr<start+len then output;
end;
drop patternID resseq start len ctr;
run;
```

In the first DOW loop a pattern of 1s and 2s and 0s representing observations with nonmissing RES<20, RES>=20 and missing RES, respectively, is written to character variable RESSEQ ("result sequence"). The Perl regular expression matches "exactly two 1s followed by two or more 2s" in RESSEQ. In the second DOW loop, observations are written to WANT if and only if their observation number within the BY group (CTR) is between the START and end (=START+LEN-1) position of the matched pattern (if any).

If there is a risk of two or more matched patterns for a patient, the code needs to be amended: Using the CALL PRXNEXT routine (instead of PRXSUBSTR) one could loop through the multiple matches.

Edit: Here is the enhanced version of the data step which allows for multiple matches for the same patient.

```
data want;
length resseq $&maxobs;
if _n_=1 then patternID+prxparse('/1{2}?2{2,}/');
do until(last.ptid);
set have;
by ptid;
resseq=cats(resseq,n(res)*(2-(res<20)));
end;
start=1;
call prxnext(patternID, start, -1, resseq, pos, len);
do until(last.ptid);
set have;
by ptid;
ctr=sum(ctr,1);
if pos<=ctr<pos+len then output;
if ~last.ptid & ctr=pos+len-1 then call prxnext(patternID, start, -1, resseq, pos, len);
end;
drop patternID resseq start pos len ctr;
run;
```

All Replies

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Jagadishkatam

04-11-2016 03:25 AM

Sort the dataset in reverse order and then use the lagX() functions for obtaining the data from previous observations. Then re-sort back into the original order.

---------------------------------------------------------------------------------------------

Maxims of Maximally Efficient SAS Programmers

Maxims of Maximally Efficient SAS Programmers

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Jagadishkatam

04-11-2016 04:38 AM

Hi Jag,

If I had to program it, I would need more complete specifications:

- What should be the first selected observation per PTID? The first of the two with RES<20 or the first of a (possibly longer) series of observations with RES<20?
- What should be the last selected observation per PTID? The last of a series of observations with RES>=20 (following the two with RES<20) or the last observation of the patient?
- What if (in cases with >=6 obs. for one PTID, if any) two blocks of observations satisfy the selection criterion (e.g. 10, 10, 20, 10, 10, 20)?
- How would missing values of RES or skipped VIS numbers be handled?
- What if repeated measurements (i.e. consecutive records with the same VIS) or unscheduled visits contributed to the target pattern?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to FreelanceReinhard

04-11-2016 05:22 AM

@FreelanceReinhard,

Agree with your questions, here are my responses. Hope this helps.

What should be the first selected observation per PTID? The first of the two with RES<20 or the first of a (possibly longer) series of observations with RES<20?**the firstshould be res < 20 and the first of the two with res<20**

What should be the last selected observation per PTID? The last of a series of observations with RES>=20 (following the two with RES<20) or the last observation of the patient?**the last selected observation should be res >=20. The last select observations should be res >=20.**

What if (in cases with >=6 obs. for one PTID, if any) two blocks of observations satisfy the selection criterion (e.g. 10, 10, 20, 10, 10, 20)?**no, this is not the selection criterion. We should pull only those recorsd which have first two consecutive res<20 followed by two consecutive res >=20**

How would missing values of RES or skipped VIS numbers be handled?**Missing records and vis will remain missing**.

What if repeated measurements (i.e. consecutive records with the same VIS) or unscheduled visits contributed to the target pattern?**we only require consecutive res<20 followed consecutive res>=20 and it could include records with same vis or even unscheduled visits or even missing vis.**

Thanks,

Jag

Jag

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Jagadishkatam

04-11-2016 06:14 AM

Thanks for the clarification.

Jagadishkatam wrote:

(...)

What if (in cases with >=6 obs. for one PTID, if any) two blocks of observations satisfy the selection criterion (e.g. 10, 10, 20, 10, 10, 20)?

no, this is not the selection criterion. We should pull only those recorsd which have first two consecutive res<20 followed by two consecutive res >=20

Ok, then I guess there is little potential for two series within the same patient's data satisfying the criterion (like 10, 10, 20 ,20, 10, 10, 20, 20).

Jagadishkatam wrote:

How would missing values of RES or skipped VIS numbers be handled?Missing records and vis will remain missing.

That is, the target pattern must not be interrupted by an observation with missing RES? Example: "10, 10, 41, **.**, 43, 44" would not be selected?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to FreelanceReinhard

04-11-2016 06:51 AM

Yes, i agree with you.

Thanks,

Jag

Jag

Solution

04-11-2016
10:52 AM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Jagadishkatam

04-11-2016 07:40 AM - edited 04-11-2016 08:20 AM

Good. Then please try this:

```
/* Select max. number of observations per patient */
proc sql noprint;
select max(c) into :maxobs
from (select count(*) as c
from have
group by ptid);
quit;
/* Select observations matching the pattern */
data want;
length resseq $&maxobs;
retain patternID;
if _n_=1 then patternID=prxparse('/1{2}?2{2,}/');
do until(last.ptid);
set have;
by ptid;
resseq=cats(resseq,n(res)*(2-(res<20)));
end;
call prxsubstr(patternID, resseq, start, len);
do until(last.ptid);
set have;
by ptid;
ctr=sum(ctr,1);
if start<=ctr<start+len then output;
end;
drop patternID resseq start len ctr;
run;
```

In the first DOW loop a pattern of 1s and 2s and 0s representing observations with nonmissing RES<20, RES>=20 and missing RES, respectively, is written to character variable RESSEQ ("result sequence"). The Perl regular expression matches "exactly two 1s followed by two or more 2s" in RESSEQ. In the second DOW loop, observations are written to WANT if and only if their observation number within the BY group (CTR) is between the START and end (=START+LEN-1) position of the matched pattern (if any).

If there is a risk of two or more matched patterns for a patient, the code needs to be amended: Using the CALL PRXNEXT routine (instead of PRXSUBSTR) one could loop through the multiple matches.

Edit: Here is the enhanced version of the data step which allows for multiple matches for the same patient.

```
data want;
length resseq $&maxobs;
if _n_=1 then patternID+prxparse('/1{2}?2{2,}/');
do until(last.ptid);
set have;
by ptid;
resseq=cats(resseq,n(res)*(2-(res<20)));
end;
start=1;
call prxnext(patternID, start, -1, resseq, pos, len);
do until(last.ptid);
set have;
by ptid;
ctr=sum(ctr,1);
if pos<=ctr<pos+len then output;
if ~last.ptid & ctr=pos+len-1 then call prxnext(patternID, start, -1, resseq, pos, len);
end;
drop patternID resseq start pos len ctr;
run;
```

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to FreelanceReinhard

04-11-2016 10:53 AM

@FreelanceReinhard, indeed your solution is very good and helpful.

Thanks,

Jag

Jag