Data Manipulation

Accepted Solution Solved
Reply
New Contributor
Posts: 2
Accepted Solution

Data Manipulation

mamnipulatoinHello,

Could you please help me select the persons who had at least one visit in each 6 month period of the 24 month measure period, with

a munimum of 60 days between the first visit in the prior 6 month period and the last visit in the subsequent 6 month period? (The

2 years period is 3/1/2011-2/28/2013)

Here is the data:

    id                            Dates_of_visit
128-Mar-11
122-Apr-11
25-Apr-11
231-May-11
23-Jun-11
23-Jun-11
29-Jun-11
225-Aug-11
212-Jan-12
221-Feb-12
218-May-12
220-Jul-12
218-Sep-12
228-Nov-12
322-Mar-11
326-Apr-11
312-Jul-11
327-Oct-11
315-May-12
329-May-12
319-Jun-12
416-Mar-11
48-Apr-11
415-Apr-11
46-Jul-11
428-Sep-11
411-Nov-11
427-Jan-12
423-Apr-12
423-Jul-12
419-Oct-12
423-Oct-12
525-May-11
512-Jan-12
512-Apr-12
523-May-12
53-Jul-12
56-Aug-12
525-Oct-12
512-Nov-12
516-Nov-12
616-Mar-11
611-May-11
617-Aug-11
731-Mar-11
73-May-11
724-Jun-11
725-Aug-11
725-Aug-11
722-Sep-11
710-Nov-11
78-Dec-11
79-Jan-12
716-Feb-12
730-Mar-12
710-May-12
77-Jun-12
712-Jul-12
79-Aug-12
727-Sep-12
715-Nov-12
76-Dec-12
76-Dec-12
83-Mar-11
824-Mar-11
824-Mar-11
824-May-11
83-Jun-11
815-Jun-11
815-Jun-11
828-Jun-11
86-Jul-11
815-Jul-11
829-Jul-11
84-Aug-11
818-Oct-11
83-Nov-11
819-Dec-11
83-Jan-12
827-Jan-12
88-Mar-11
824-Mar-11
824-Mar-11
828-Apr-11
828-Apr-11
826-Jul-11
89-Aug-11
811-May-11
821-Mar-12
816-Jul-12
810-Sep-12
813-Dec-12
910-Jun-11
95-Oct-11
1023-Mar-11
106-Apr-11
1013-Apr-11
101-Jun-11
1024-Jan-12
108-Mar-12
105-Apr-12
1013-Sep-12
1019-Nov-12
1028-Nov-12
1114-Mar-11
1225-Apr-11
1223-May-11
125-Aug-11
129-Dec-11
1213-Jan-12
122-Mar-12
126-Mar-12
1330-Mar-11
1314-Apr-11
1315-Jun-11
1325-Aug-11
1327-Oct-11
1323-Dec-11
1327-Mar-12
137-Aug-12
1317-Sep-12
1326-Nov-12
1430-Mar-11
1416-May-11
1424-Feb-12
1425-May-12
145-Jul-12
1428-Aug-12
143-Oct-12
1418-Dec-12
1517-Jan-12
1515-Mar-12
1530-Aug-12
1530-Aug-12
1528-Jan-13
1621-Mar-11
1623-May-11
1629-Aug-11
1628-Nov-11
1629-Jun-12
1622-Oct-12
1612-Nov-12
1711-May-11
1725-May-11
1714-Jun-11
176-Jul-11
179-Aug-11
1720-Jun-12
1720-Jun-12
176-Aug-12
1721-Aug-12
1718-Sep-12
1723-Nov-12
1730-Nov-12
187-Apr-11
1818-Apr-11
1818-Apr-11
183-May-11
186-May-11
182-Jun-11
1812-Dec-11
1914-Mar-11
1927-Jun-11
1911-Jul-11
1930-Jan-12
1921-Feb-12
1921-May-12
1911-Jun-12
1919-Jul-12
1911-Oct-12
1925-Oct-12
196-Dec-12
196-Dec-12
2013-Sep-11
2025-Oct-11
2017-Nov-11
2131-Jul-12
2124-Aug-12
2126-Dec-12
2130-Jan-13
2125-Sep-12
228-Mar-11
225-Apr-11
223-May-11
2231-May-11
227-Jun-11
2228-Jun-11
2228-Jun-11
2226-Jul-11
2226-Jul-11
2223-Aug-11
2223-Aug-11
2220-Sep-11
2218-Oct-11
2215-Nov-11
2213-Dec-11
2210-Jan-12
227-Feb-12
226-Mar-12
2217-Apr-12
2215-May-12
2212-Jun-12
2210-Jul-12
227-Aug-12
224-Sep-12
222-Oct-12
2230-Oct-12
2230-Oct-12
2227-Nov-12
2227-Nov-12
2227-Dec-12
2331-May-11
2328-Jun-11
231-Nov-11
2315-Nov-11
2315-Nov-11
2314-Feb-12
2315-May-12
2312-Jun-12
2331-Jul-12
2328-Aug-12
2325-Sep-12
2312-Feb-13
2416-Mar-11
2416-Mar-11
249-Jun-11
248-Jul-11
2424-Aug-11
247-Sep-11
249-Sep-11
2419-Oct-11
2421-Nov-11
2421-Dec-11
242-Feb-12
242-Apr-12
242-May-12
242-Jul-12
242-Aug-12
242-Oct-12
241-Nov-12
2429-Jan-13
252-Jun-11
252-Jun-11
2512-Jul-11
256-Oct-11
2511-Oct-11
268-Apr-11
2611-Jul-11
2613-Feb-12
269-Jul-12
267-Sep-12
2716-Mar-11
2813-Sep-11
2825-Oct-11
2817-Nov-11
2914-Jul-11
293-Nov-11
2913-Mar-12
2912-Jun-12
294-Oct-12
2927-Dec-12
3029-Mar-11
3030-Aug-11
304-Oct-11
308-Nov-11
306-Dec-11
306-Dec-11
303-Jan-12
303-Jan-12
3031-Jan-12
308-Mar-12
3017-Apr-12
3015-May-12
3017-Jul-12
3031-Jul-12
3025-Sep-12
309-Oct-12
308-Jan-13
305-Feb-13

Thank you so much!


Accepted Solutions
Solution
‎05-22-2014 11:19 AM
Super Contributor
Posts: 644

Re: Data Manipulation

OK - misread the requirement.  Struggling to understand the context ( you want treatments that span at least 2 months but the treatment cycle must cross the (arbitrary) 3 month boundaries??)  Well, if you insist

/* create indicator for 6 monthly periods  */

Data periods ;

  Set have ;

  By Id Dates_of_Visit ;

  Period = intck(‘semiyear.3’, ‘01mar2011’d, dates_of_visit) + 1 ;

Run ;

/* Screen for inter-period durations */

/* init is the first visit in the current period, start is the first visit in the previous period */

Data screen ;

  Set Periods ;

  Retain init start ;

  By Id Period Dates_of_Visit ;

  If first.Period then

          init = Dates_of_Visit ;

If last.Period then

     do ;

          duration = Dates_of_Visit - start ;      /*     Values should be missing for the first period */

          if duration >= 60 then output ;

          start = init ;                                             /*     Set start ready for the next period */

     end ;

Run ;

/* Select qualifying Ids */

Proc SQL ;

  Create table want as

  Select Id

  From Screen

  Having (Count(*) = 3)

  Group By Id

  ;

Quit ;

[NB untested]

Richard

View solution in original post


All Replies
Regular Contributor
Posts: 217

Re: Data Manipulation

Do you wish to use a BASE SAS datastep; SQL; or some other style?

Essentially, you have to define the 6 month periods; define which rows fall into which six month period; then define the 60th day from the dates_of_visit for each row.  Once you know the grouping and 60 day look ahead date for each row; then you compare each row to every other row for that ID.  Ask the questions "does this ID for this row have another row within this 6 month period?  If so, is the 60 day look ahead date less than or equal to the dates_of_visit for the row in the 6 month period. 

Super Contributor
Posts: 644

Re: Data Manipulation

Try this

/* create indicator2 for 6 monthly periods and gaps between visits */

Data periods ;

  Set have ;

  By Id Dates_of_Visit ;

  Period = intck(‘semiyear.3’, ‘01mar2011’d, dates_of_visit) + 1 ;

  Gap = Diff (Dates_of_Visit) ;

  If First.id then call missing (Gap) ;

Run ;

/* Screen for inter-period gaps */

Data screen ;

  Set Periods ;

  By Id Period Dates_of_Visit ;

  If First.Id

  Or (First.Period and Gap >= 60)

  ;

Run ;

/* Select qualifying Ids */

Proc SQL ;

  Create table want as

  Select Id

  From Screen

  Having (Count(*) = 4)

  Group By Id

  ;

Quit ;

[NB untested]

If you require the relevant dates against the Id then merge back to the screen dataset

Richard

New Contributor
Posts: 2

Re: Data Manipulation

Thank you Richard!

The gap defination is different from mine.

The gap should be 60 days or more between the first visit in the prior 6 month period and the last visit in the subsequent 6 month period.

Solution
‎05-22-2014 11:19 AM
Super Contributor
Posts: 644

Re: Data Manipulation

OK - misread the requirement.  Struggling to understand the context ( you want treatments that span at least 2 months but the treatment cycle must cross the (arbitrary) 3 month boundaries??)  Well, if you insist

/* create indicator for 6 monthly periods  */

Data periods ;

  Set have ;

  By Id Dates_of_Visit ;

  Period = intck(‘semiyear.3’, ‘01mar2011’d, dates_of_visit) + 1 ;

Run ;

/* Screen for inter-period durations */

/* init is the first visit in the current period, start is the first visit in the previous period */

Data screen ;

  Set Periods ;

  Retain init start ;

  By Id Period Dates_of_Visit ;

  If first.Period then

          init = Dates_of_Visit ;

If last.Period then

     do ;

          duration = Dates_of_Visit - start ;      /*     Values should be missing for the first period */

          if duration >= 60 then output ;

          start = init ;                                             /*     Set start ready for the next period */

     end ;

Run ;

/* Select qualifying Ids */

Proc SQL ;

  Create table want as

  Select Id

  From Screen

  Having (Count(*) = 3)

  Group By Id

  ;

Quit ;

[NB untested]

Richard

Super User
Posts: 9,671

Re: Data Manipulation

data have;
 set have;
 select;
  when (Dates_of_visit lt intnx('month','01mar2011'd,6)) period=1;
  when (Dates_of_visit lt intnx('month','01mar2011'd,12)) period=2;
  when (Dates_of_visit lt intnx('month','01mar2011'd,18)) period=3;
  when (Dates_of_visit lt intnx('month','01mar2011'd,24)) period=4;
  otherwise;
 end;
run;
proc sql;
 create table temp as
  select * from have
   group by id
    having count(distinct period)=4 
      order by id,period,Dates_of_visit ;
quit;
data temp1;
 set temp;
 by  id period;
if first.period ;
period=period+1;
run;
data want;
 merge temp temp1(rename=(Dates_of_visit=_Dates_of_visit));
 by id period;
 if last.period then do;
  if (Dates_of_visit-_Dates_of_visit) ge 60 then output;
 end;
run;
proc sort data=want(keep=id) out=final nodupkey;by id;run;

Xia Keshan

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 299 views
  • 6 likes
  • 4 in conversation