BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Jagadishkatam
Amethyst | Level 16

Dear Experts,

I need to extract records with two consecutive <20 followed by >=20 res; in this case we get

ptid vis res
2 2 10
2 3 10
2 4 50
2 5 60
4 1 10
4 2 10
4 3 20
4 4 40
4 5 40

 

data is as below

 

data have;
input ptid vis res;
cards;
1 1 10 
1 2 20
1 3 20
1 4 50
2 1 20
2 2 10
2 3 10
2 4 50
2 5 60
3 1 10
3 2 20
3 3 20
3 4 50
4 1 10
4 2 10
4 3 20
4 4 40
4 5 40
;

could you please provide your inputs.

 

 

Thanks,
Jag
1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Good. Then please try this:

/* Select max. number of observations per patient */

proc sql noprint;
select max(c) into :maxobs
from (select count(*) as c
      from have
      group by ptid);
quit;

/* Select observations matching the pattern */

data want;
length resseq $&maxobs;
retain patternID;
if _n_=1 then patternID=prxparse('/1{2}?2{2,}/');
do until(last.ptid);
  set have;
  by ptid;
  resseq=cats(resseq,n(res)*(2-(res<20)));
end;
call prxsubstr(patternID, resseq, start, len);
do until(last.ptid);
  set have;
  by ptid;
  ctr=sum(ctr,1);
  if start<=ctr<start+len then output;
end;
drop patternID resseq start len ctr;
run;

In the first DOW loop a pattern of 1s and 2s and 0s representing observations with nonmissing RES<20, RES>=20 and missing RES, respectively, is written to character variable RESSEQ ("result sequence"). The Perl regular expression matches "exactly two 1s followed by two or more 2s" in RESSEQ. In the second DOW loop, observations are written to WANT if and only if their observation number within the BY group (CTR) is between the START and end (=START+LEN-1) position of the matched pattern (if any).

 

If there is a risk of two or more matched patterns for a patient, the code needs to be amended: Using the CALL PRXNEXT routine (instead of PRXSUBSTR) one could loop through the multiple matches.

 

Edit: Here is the enhanced version of the data step which allows for multiple matches for the same patient.

data want;
length resseq $&maxobs;
if _n_=1 then patternID+prxparse('/1{2}?2{2,}/');
do until(last.ptid);
  set have;
  by ptid;
  resseq=cats(resseq,n(res)*(2-(res<20)));
end;
start=1;
call prxnext(patternID, start, -1, resseq, pos, len);
do until(last.ptid);
  set have;
  by ptid;
  ctr=sum(ctr,1);
  if pos<=ctr<pos+len then output;
  if ~last.ptid & ctr=pos+len-1 then call prxnext(patternID, start, -1, resseq, pos, len);
end;
drop patternID resseq start pos len ctr;
run;

View solution in original post

7 REPLIES 7
Kurt_Bremser
Super User

Sort the dataset in reverse order and then use the lagX() functions for obtaining the data from previous observations. Then re-sort back into the original order.

FreelanceReinh
Jade | Level 19

Hi Jag,

 

If I had to program it, I would need more complete specifications:

  • What should be the first selected observation per PTID? The first of the two with RES<20 or the first of a (possibly longer) series of observations with RES<20?
  • What should be the last selected observation per PTID? The last of a series of observations with RES>=20 (following the two with RES<20) or the last observation of the patient?
  • What if (in cases with >=6 obs. for one PTID, if any) two blocks of observations satisfy the selection criterion (e.g. 10, 10, 20, 10, 10, 20)?
  • How would missing values of RES or skipped VIS numbers be handled?
  • What if repeated measurements (i.e. consecutive records with the same VIS) or unscheduled visits contributed to the target pattern?

 

Jagadishkatam
Amethyst | Level 16

@FreelanceReinhard,

 

Agree with your questions, here are my responses. Hope this helps.


    What should be the first selected observation per PTID? The first of the two with RES<20 or the first of a (possibly longer) series of observations with RES<20?
the firstshould be res < 20 and the first of the two with res<20

    What should be the last selected observation per PTID? The last of a series of observations with RES>=20 (following the two with RES<20) or the last observation of the patient?
the last selected observation should be res >=20. The last select observations should be res >=20.
    What if (in cases with >=6 obs. for one PTID, if any) two blocks of observations satisfy the selection criterion (e.g. 10, 10, 20, 10, 10, 20)?
no, this is not the selection criterion. We should pull only those recorsd which have first two consecutive res<20 followed by two consecutive res >=20
    How would missing values of RES or skipped VIS numbers be handled?
Missing records and vis will remain missing.
    What if repeated measurements (i.e. consecutive records with the same VIS) or unscheduled visits contributed to the target pattern?
we only require consecutive res<20 followed consecutive res>=20 and it could include records with same vis or even unscheduled visits or even missing vis.

Thanks,
Jag
FreelanceReinh
Jade | Level 19

Thanks for the clarification.

@Jagadishkatam wrote:

(...)

    What if (in cases with >=6 obs. for one PTID, if any) two blocks of observations satisfy the selection criterion (e.g. 10, 10, 20, 10, 10, 20)?
no, this is not the selection criterion. We should pull only those recorsd which have first two consecutive res<20 followed by two consecutive res >=20


Ok, then I guess there is little potential for two series within the same patient's data satisfying the criterion (like 10, 10, 20 ,20, 10, 10, 20, 20).

 


@Jagadishkatam wrote:

    How would missing values of RES or skipped VIS numbers be handled?
Missing records and vis will remain missing.

That is, the target pattern must not be interrupted by an observation with missing RES? Example: "10, 10, 41, ., 43, 44" would not be selected? 

Jagadishkatam
Amethyst | Level 16
Yes, i agree with you.
Thanks,
Jag
FreelanceReinh
Jade | Level 19

Good. Then please try this:

/* Select max. number of observations per patient */

proc sql noprint;
select max(c) into :maxobs
from (select count(*) as c
      from have
      group by ptid);
quit;

/* Select observations matching the pattern */

data want;
length resseq $&maxobs;
retain patternID;
if _n_=1 then patternID=prxparse('/1{2}?2{2,}/');
do until(last.ptid);
  set have;
  by ptid;
  resseq=cats(resseq,n(res)*(2-(res<20)));
end;
call prxsubstr(patternID, resseq, start, len);
do until(last.ptid);
  set have;
  by ptid;
  ctr=sum(ctr,1);
  if start<=ctr<start+len then output;
end;
drop patternID resseq start len ctr;
run;

In the first DOW loop a pattern of 1s and 2s and 0s representing observations with nonmissing RES<20, RES>=20 and missing RES, respectively, is written to character variable RESSEQ ("result sequence"). The Perl regular expression matches "exactly two 1s followed by two or more 2s" in RESSEQ. In the second DOW loop, observations are written to WANT if and only if their observation number within the BY group (CTR) is between the START and end (=START+LEN-1) position of the matched pattern (if any).

 

If there is a risk of two or more matched patterns for a patient, the code needs to be amended: Using the CALL PRXNEXT routine (instead of PRXSUBSTR) one could loop through the multiple matches.

 

Edit: Here is the enhanced version of the data step which allows for multiple matches for the same patient.

data want;
length resseq $&maxobs;
if _n_=1 then patternID+prxparse('/1{2}?2{2,}/');
do until(last.ptid);
  set have;
  by ptid;
  resseq=cats(resseq,n(res)*(2-(res<20)));
end;
start=1;
call prxnext(patternID, start, -1, resseq, pos, len);
do until(last.ptid);
  set have;
  by ptid;
  ctr=sum(ctr,1);
  if pos<=ctr<pos+len then output;
  if ~last.ptid & ctr=pos+len-1 then call prxnext(patternID, start, -1, resseq, pos, len);
end;
drop patternID resseq start pos len ctr;
run;
Jagadishkatam
Amethyst | Level 16
@FreelanceReinhard, indeed your solution is very good and helpful.
Thanks,
Jag

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1821 views
  • 0 likes
  • 3 in conversation