Data Manipulation

Accepted Solution Solved
Reply
New Contributor
Posts: 2
Accepted Solution

Data Manipulation

mamnipulatoinHello,

Could you please help me select the persons who had at least one visit in each 6 month period of the 24 month measure period, with

a munimum of 60 days between the first visit in the prior 6 month period and the last visit in the subsequent 6 month period? (The

2 years period is 3/1/2011-2/28/2013)

Here is the data:

    id                            Dates_of_visit
128-Mar-11
122-Apr-11
25-Apr-11
231-May-11
23-Jun-11
23-Jun-11
29-Jun-11
225-Aug-11
212-Jan-12
221-Feb-12
218-May-12
220-Jul-12
218-Sep-12
228-Nov-12
322-Mar-11
326-Apr-11
312-Jul-11
327-Oct-11
315-May-12
329-May-12
319-Jun-12
416-Mar-11
48-Apr-11
415-Apr-11
46-Jul-11
428-Sep-11
411-Nov-11
427-Jan-12
423-Apr-12
423-Jul-12
419-Oct-12
423-Oct-12
525-May-11
512-Jan-12
512-Apr-12
523-May-12
53-Jul-12
56-Aug-12
525-Oct-12
512-Nov-12
516-Nov-12
616-Mar-11
611-May-11
617-Aug-11
731-Mar-11
73-May-11
724-Jun-11
725-Aug-11
725-Aug-11
722-Sep-11
710-Nov-11
78-Dec-11
79-Jan-12
716-Feb-12
730-Mar-12
710-May-12
77-Jun-12
712-Jul-12
79-Aug-12
727-Sep-12
715-Nov-12
76-Dec-12
76-Dec-12
83-Mar-11
824-Mar-11
824-Mar-11
824-May-11
83-Jun-11
815-Jun-11
815-Jun-11
828-Jun-11
86-Jul-11
815-Jul-11
829-Jul-11
84-Aug-11
818-Oct-11
83-Nov-11
819-Dec-11
83-Jan-12
827-Jan-12
88-Mar-11
824-Mar-11
824-Mar-11
828-Apr-11
828-Apr-11
826-Jul-11
89-Aug-11
811-May-11
821-Mar-12
816-Jul-12
810-Sep-12
813-Dec-12
910-Jun-11
95-Oct-11
1023-Mar-11
106-Apr-11
1013-Apr-11
101-Jun-11
1024-Jan-12
108-Mar-12
105-Apr-12
1013-Sep-12
1019-Nov-12
1028-Nov-12
1114-Mar-11
1225-Apr-11
1223-May-11
125-Aug-11
129-Dec-11
1213-Jan-12
122-Mar-12
126-Mar-12
1330-Mar-11
1314-Apr-11
1315-Jun-11
1325-Aug-11
1327-Oct-11
1323-Dec-11
1327-Mar-12
137-Aug-12
1317-Sep-12
1326-Nov-12
1430-Mar-11
1416-May-11
1424-Feb-12
1425-May-12
145-Jul-12
1428-Aug-12
143-Oct-12
1418-Dec-12
1517-Jan-12
1515-Mar-12
1530-Aug-12
1530-Aug-12
1528-Jan-13
1621-Mar-11
1623-May-11
1629-Aug-11
1628-Nov-11
1629-Jun-12
1622-Oct-12
1612-Nov-12
1711-May-11
1725-May-11
1714-Jun-11
176-Jul-11
179-Aug-11
1720-Jun-12
1720-Jun-12
176-Aug-12
1721-Aug-12
1718-Sep-12
1723-Nov-12
1730-Nov-12
187-Apr-11
1818-Apr-11
1818-Apr-11
183-May-11
186-May-11
182-Jun-11
1812-Dec-11
1914-Mar-11
1927-Jun-11
1911-Jul-11
1930-Jan-12
1921-Feb-12
1921-May-12
1911-Jun-12
1919-Jul-12
1911-Oct-12
1925-Oct-12
196-Dec-12
196-Dec-12
2013-Sep-11
2025-Oct-11
2017-Nov-11
2131-Jul-12
2124-Aug-12
2126-Dec-12
2130-Jan-13
2125-Sep-12
228-Mar-11
225-Apr-11
223-May-11
2231-May-11
227-Jun-11
2228-Jun-11
2228-Jun-11
2226-Jul-11
2226-Jul-11
2223-Aug-11
2223-Aug-11
2220-Sep-11
2218-Oct-11
2215-Nov-11
2213-Dec-11
2210-Jan-12
227-Feb-12
226-Mar-12
2217-Apr-12
2215-May-12
2212-Jun-12
2210-Jul-12
227-Aug-12
224-Sep-12
222-Oct-12
2230-Oct-12
2230-Oct-12
2227-Nov-12
2227-Nov-12
2227-Dec-12
2331-May-11
2328-Jun-11
231-Nov-11
2315-Nov-11
2315-Nov-11
2314-Feb-12
2315-May-12
2312-Jun-12
2331-Jul-12
2328-Aug-12
2325-Sep-12
2312-Feb-13
2416-Mar-11
2416-Mar-11
249-Jun-11
248-Jul-11
2424-Aug-11
247-Sep-11
249-Sep-11
2419-Oct-11
2421-Nov-11
2421-Dec-11
242-Feb-12
242-Apr-12
242-May-12
242-Jul-12
242-Aug-12
242-Oct-12
241-Nov-12
2429-Jan-13
252-Jun-11
252-Jun-11
2512-Jul-11
256-Oct-11
2511-Oct-11
268-Apr-11
2611-Jul-11
2613-Feb-12
269-Jul-12
267-Sep-12
2716-Mar-11
2813-Sep-11
2825-Oct-11
2817-Nov-11
2914-Jul-11
293-Nov-11
2913-Mar-12
2912-Jun-12
294-Oct-12
2927-Dec-12
3029-Mar-11
3030-Aug-11
304-Oct-11
308-Nov-11
306-Dec-11
306-Dec-11
303-Jan-12
303-Jan-12
3031-Jan-12
308-Mar-12
3017-Apr-12
3015-May-12
3017-Jul-12
3031-Jul-12
3025-Sep-12
309-Oct-12
308-Jan-13
305-Feb-13

Thank you so much!


Accepted Solutions
Solution
‎05-22-2014 11:19 AM
Super Contributor
Posts: 644

Re: Data Manipulation

OK - misread the requirement.  Struggling to understand the context ( you want treatments that span at least 2 months but the treatment cycle must cross the (arbitrary) 3 month boundaries??)  Well, if you insist

/* create indicator for 6 monthly periods  */

Data periods ;

  Set have ;

  By Id Dates_of_Visit ;

  Period = intck(‘semiyear.3’, ‘01mar2011’d, dates_of_visit) + 1 ;

Run ;

/* Screen for inter-period durations */

/* init is the first visit in the current period, start is the first visit in the previous period */

Data screen ;

  Set Periods ;

  Retain init start ;

  By Id Period Dates_of_Visit ;

  If first.Period then

          init = Dates_of_Visit ;

If last.Period then

     do ;

          duration = Dates_of_Visit - start ;      /*     Values should be missing for the first period */

          if duration >= 60 then output ;

          start = init ;                                             /*     Set start ready for the next period */

     end ;

Run ;

/* Select qualifying Ids */

Proc SQL ;

  Create table want as

  Select Id

  From Screen

  Having (Count(*) = 3)

  Group By Id

  ;

Quit ;

[NB untested]

Richard

View solution in original post


All Replies
Regular Contributor
Posts: 217

Re: Data Manipulation

Do you wish to use a BASE SAS datastep; SQL; or some other style?

Essentially, you have to define the 6 month periods; define which rows fall into which six month period; then define the 60th day from the dates_of_visit for each row.  Once you know the grouping and 60 day look ahead date for each row; then you compare each row to every other row for that ID.  Ask the questions "does this ID for this row have another row within this 6 month period?  If so, is the 60 day look ahead date less than or equal to the dates_of_visit for the row in the 6 month period. 

Super Contributor
Posts: 644

Re: Data Manipulation

Try this

/* create indicator2 for 6 monthly periods and gaps between visits */

Data periods ;

  Set have ;

  By Id Dates_of_Visit ;

  Period = intck(‘semiyear.3’, ‘01mar2011’d, dates_of_visit) + 1 ;

  Gap = Diff (Dates_of_Visit) ;

  If First.id then call missing (Gap) ;

Run ;

/* Screen for inter-period gaps */

Data screen ;

  Set Periods ;

  By Id Period Dates_of_Visit ;

  If First.Id

  Or (First.Period and Gap >= 60)

  ;

Run ;

/* Select qualifying Ids */

Proc SQL ;

  Create table want as

  Select Id

  From Screen

  Having (Count(*) = 4)

  Group By Id

  ;

Quit ;

[NB untested]

If you require the relevant dates against the Id then merge back to the screen dataset

Richard

New Contributor
Posts: 2

Re: Data Manipulation

Posted in reply to RichardinOz

Thank you Richard!

The gap defination is different from mine.

The gap should be 60 days or more between the first visit in the prior 6 month period and the last visit in the subsequent 6 month period.

Solution
‎05-22-2014 11:19 AM
Super Contributor
Posts: 644

Re: Data Manipulation

OK - misread the requirement.  Struggling to understand the context ( you want treatments that span at least 2 months but the treatment cycle must cross the (arbitrary) 3 month boundaries??)  Well, if you insist

/* create indicator for 6 monthly periods  */

Data periods ;

  Set have ;

  By Id Dates_of_Visit ;

  Period = intck(‘semiyear.3’, ‘01mar2011’d, dates_of_visit) + 1 ;

Run ;

/* Screen for inter-period durations */

/* init is the first visit in the current period, start is the first visit in the previous period */

Data screen ;

  Set Periods ;

  Retain init start ;

  By Id Period Dates_of_Visit ;

  If first.Period then

          init = Dates_of_Visit ;

If last.Period then

     do ;

          duration = Dates_of_Visit - start ;      /*     Values should be missing for the first period */

          if duration >= 60 then output ;

          start = init ;                                             /*     Set start ready for the next period */

     end ;

Run ;

/* Select qualifying Ids */

Proc SQL ;

  Create table want as

  Select Id

  From Screen

  Having (Count(*) = 3)

  Group By Id

  ;

Quit ;

[NB untested]

Richard

Super User
Posts: 10,028

Re: Data Manipulation

data have;
 set have;
 select;
  when (Dates_of_visit lt intnx('month','01mar2011'd,6)) period=1;
  when (Dates_of_visit lt intnx('month','01mar2011'd,12)) period=2;
  when (Dates_of_visit lt intnx('month','01mar2011'd,18)) period=3;
  when (Dates_of_visit lt intnx('month','01mar2011'd,24)) period=4;
  otherwise;
 end;
run;
proc sql;
 create table temp as
  select * from have
   group by id
    having count(distinct period)=4 
      order by id,period,Dates_of_visit ;
quit;
data temp1;
 set temp;
 by  id period;
if first.period ;
period=period+1;
run;
data want;
 merge temp temp1(rename=(Dates_of_visit=_Dates_of_visit));
 by id period;
 if last.period then do;
  if (Dates_of_visit-_Dates_of_visit) ge 60 then output;
 end;
run;
proc sort data=want(keep=id) out=final nodupkey;by id;run;

Xia Keshan

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 303 views
  • 6 likes
  • 4 in conversation