Re: checking the range

Sandhya · Posted 12-07-2010 02:17 PM

Hi,

I have a dataset which has an ID, ADATE and SERVICE.

ID ADATE SERVICE

1 12/7/2009 I
2 1/12/2009 I
2 1/13/2009 OP
3 6/5/2009 I
4 7/29/2009 LA
5 3/23/2009 OP
...

Now I need to check the Service that are not 'I'. Once I find out which are the records that are not equal to 'I', I need to find out if there are any records for the same id which has the same date or 1 day earlier to that day. If yes, then I need to convert the SERVICE from OP to 'I', else do not change the SERVICE value.

Since I have a fairly huge dataset, I do not have the luxury of performing multiple iterations. So I'm looking at doing this in minimal number of iterations.

Please help.

Thanks in advance,
Sandhya.

ArtC · Posted 12-07-2010 02:25 PM

If you sort
[pre] by id descending adate;[/pre]
you can check a the service if not 1 then the next obs is the one to check for your date range. For this DATA step approach you will need to use either a LAG function or a RETAIN.

SPR · Posted 12-07-2010 02:56 PM

Hello AtrC,

The OP mentioned that besides 1 day earlier date the same date is also possible. In thsi case the algorithm should be more complicated.

Sincerely,
SPR

SPR · Posted 12-07-2010 05:20 PM

Hello Sandhya,

If I understood you correctly then this is solution (not simple one):
[pre]
data i;
input ID ADATE MMDDYY10. SERVICE $;
format adate date7.;
date=adate;
datalines;
1 12/7/2009 I
2 1/12/2009 I
2 1/13/2009 OP
2 1/13/2009 OP
3 6/4/2009 OP
3 6/5/2009 OP
4 7/29/2009 LA
5 3/23/2009 OP
;
run;
proc sort data=i out=s;
by ID adate;
run;
proc SQL noprint;
select COUNT(distinct ID) into :n from i;
%let n=%TRIM(&n);
select distinct ID into :id1-:id&n from i;
quit;
%macro a;
%local i j k k1;
%do i=1 %to &n;
proc sql noprint;
select count(*) into :nd from s where ID=&&ID&i;
%let nd=%TRIM(&nd);
select date into :d1-:d&n from s where ID=&&ID&i;
select service into :s1-:s&n from s where ID=&&ID&i;
quit;
data dr&i;
length service $2;
%do j=1 %to &nd;
%if &&s&j NE I %then %do;
%do k=&j %to 1 %by -1;
%if &k GT 1 %then %do;
if &&d&j = &&d&k OR %EVAL(&&d&j - &&d&k) = 1
then %let s&j=I;
%end;
%end;
%end;
id=&&ID&i;
adate=&&d&j;
service="&&s&j";
format adate date7.;
output;
%end;
run;
%if &i = 1 %then %do; data r; retain id adate service; set dr&i; run; %end;
%else %do; data r; set r dr&i; run; %end;
%end;
%mend a;
%a;
[/pre]
Sincerely,
SPR

polingjw · Posted 12-08-2010 09:28 AM

If the data set is large, I doubt that a macro solution will be efficient. Does this solution, which still assumes the data set is sorted by ID and ADATE, work?

[pre]
data have;
input ID ADATE MMDDYY10. SERVICE $;
format adate date7.;
date=adate;
datalines;
1 12/7/2009 I
2 1/12/2009 I
2 1/13/2009 OP
2 1/13/2009 OP
3 6/4/2009 OP
3 6/5/2009 OP
4 7/29/2009 LA
5 3/23/2009 OP
;
run;
proc sort data=have;
by ID adate;
run;

data want(keep=id adate service);
set have;
by id;
lastdate = lag(adate);
if not last then
set have(firstobs=2 keep=adate rename=(adate=nextdate)) end=last;
if service ne 'I' and ((not last.id and nextdate=adate) or (not first.id and adate-lastdate in (0 1))) then service = 'I';
run;

[pre]

polingjw · Posted 12-08-2010 10:20 AM

If you have SAS 9.2, here is a multidata hash object solution that does not require sorting. However, the code is a little messy.

[pre]
data have;
input ID ADATE MMDDYY10. SERVICE $;
format adate date7.;
datalines;
1 12/7/2009 I
2 1/12/2009 I
2 1/13/2009 OP
2 1/13/2009 OP
3 6/4/2009 OP
3 6/5/2009 OP
4 7/29/2009 LA
5 3/23/2009 OP
;
run;

data want(keep=id adate service);
if _n_=1 then do;
declare hash h(dataset:'have(keep=id adate rename=(adate=date))', multidata: 'y');
h.defineKey('id');
h.defineData('id', 'date');
h.definedone();
end;
set have;
foundself = 0;
if service ne 'I' then do;
h.find();
if adate = date and foundself = 0 then foundself = 1;
else if (adate = date and foundself = 1) or date = adate-1 then service = 'I';
h.has_next(result: r);
do while(service ne 'I' and r ne 0);
h.find_next();
if adate = date and foundself = 0 then foundself = 1;
else if (adate = date and foundself = 1) or date = adate-1 then service = 'I';
h.has_next(result: r);
end;
end;
run;
[/pre] Message was edited by: polingjw

Ksharp · Posted 12-09-2010 12:28 AM

Hi.
For the sake of your complex logic , I am afraid you need more iterations.

[pre]
data i;
input ID $ ADATE : MMDDYY10. SERVICE $;
format adate mmddyy10.;
datalines;
1 12/7/2009 I
2 1/12/2009 I
2 1/13/2009 OP
3 6/5/2009 I
4 7/29/2009 LA
5 3/23/2009 OP
;

proc sort data=i nodupkey;
by id adate service;
run;

data index;
set i;
where service not eq 'I';
run;

proc sql feedback; *createing table contained observations which is needed to change into I;
create table flag as
select a.*
from index as a left join i as b on a.id eq b.id
where ( b.adate eq a.adate ) or ( b.adate eq (a.adate - 1))
group by a.id,a.service
having count(*) ge 2;
quit;

proc sort data=flag nodupkey;
by id adate service;
run;

data flag;
set flag;
service='I';
run;

proc sort data=i;
by id adate;
run;

data result;
merge i flag;
by id adate;
run;

[/pre]

Ksharp Message was edited by: Ksharp

Sandhya · Posted 12-10-2010 10:48 AM

Thanks guys, it was very helpful.

Sandhya.

LewisC_sas · Posted 12-10-2010 06:04 PM

Hello Sandhya,

From reading, and rereading, the messages in this thread, I think that I understand your problem well enough now for me to contribute the following code that may be of some use. Of course, there may have been some subtlety that I missed.

During testing, I used the original dataset that you provided and also the modified one that SPR provided. In both cases, the mainly PROC SQL code below generated the results that I expected. However, of course, that is no guarantee that there are no bugs lurking somewhere in it.

The code below is just something additional that you may want to look at, or consider. It complements the nice responses that others in this thread have contributed.

==========================================================
options ls=max;

data work.ds;
input id $ adate:mmddyy10. service $;
format adate mmddyy10.;
datalines;
1 12/7/2009 I
2 1/12/2009 I
2 1/13/2009 OP
2 1/13/2009 OP
3 6/4/2009 OP
3 6/5/2009 OP
4 7/29/2009 LA
5 3/23/2009 OP
;

/*--
data work.ds;
input id $ adate:mmddyy10. service $;
format adate mmddyy10.;
datalines;
1 12/7/2009 I
2 1/12/2009 I
2 1/13/2009 OP
3 6/5/2009 I
4 7/29/2009 LA
5 3/23/2009 OP
;
--*/

proc sql;

create table work.sI as
select *
from work.ds
where service = 'I';

create table work.sNI as
select *
from work.ds
where service not = 'I';

/*---------------------------------------------------------*/
/*-- Identify those records in work.sNI that have the --*/
/*-- same id as one or more records in work.sI and whose --*/
/*-- adates are either the same as one of these records --*/
/*-- or corresponds to a record in work.sI that has a --*/
/*-- date one day earlier than the corresponding work.sNI--*/
/*-- record. The records in the result set for this --*/
/*-- query are those that need to have their service --*/
/*-- changed from 'OP' to 'I'. --*/
/*---------------------------------------------------------*/
create table work.needToChange as
select *
from work.sNI
where id in (select id
from work.sI
where ((sNI.adate - sI.adate) in (0,1)))
and service = 'OP';

/*-------------------------------------------------------*/
/*-- Identify the work.sNI records that do NOT need to --*/
/*-- have their service values change from 'OP' to 'I'.--*/
/*-------------------------------------------------------*/
create table work.dontNeedToChange as
select * from work.sNI
except all
select * from work.needToChange;

/*------------------------------------------------------*/
/*-- Create the new dataset, sorted if need be. --*/
/*------------------------------------------------------*/
create table work.result as
select id, adate, service from work.sI
union all
select id, adate, 'I' from work.needToChange
union all
select id, adate, service from work.dontNeedToChange
order by id, adate, service; /* if ordering is desired */

quit;

proc print; run;
==========================================================

Sincerely,
qseries@sas

polingjw · Posted 12-14-2010 08:41 AM

If you want to use SQL, here is a really quick solution to this problem. I don't know if it would be efficient for a large data set due to the correlated sub-queries that are used.

[pre]
data work.ds;
input id $ adate:mmddyy10. service $;
format adate mmddyy10.;
datalines;
1 12/7/2009 I
2 1/12/2009 I
2 1/13/2009 OP
2 1/13/2009 OP
3 6/4/2009 OP
3 6/5/2009 OP
4 7/29/2009 LA
5 3/23/2009 OP
;

proc sql;
create table want as
select id, adate, case
when service = 'I' then 'I'
when (select count(*) from ds as ds1 where ds1.id=ds3.id and ds1.adate = ds3.adate) ge 2 then 'I'
when (select count(*) from ds as ds2 where ds2.id=ds3.id and ds2.adate = ds3.adate - 1) ge 1 then 'I'
else service
end as service
from ds as ds3;
quit;
[/pre]

Ksharp · Posted 12-26-2010 10:12 PM

Sorry.I found some problem in my code;
The following is corrected code.

[pre]
data i;
input ID $ ADATE : MMDDYY10. SERVICE $;
format adate mmddyy10.;
datalines;
1 12/7/2009 I
2 1/12/2009 I
2 1/13/2009 OP
3 6/5/2009 I
4 7/29/2009 LA
5 3/23/2009 OP
;

proc sort data=i nodupkey;
by id adate service;
run;

data index;
set i;
where service not eq 'I';
run;

proc sql feedback; *createing table contained observations which is needed to change into I;
create table flag as
select a.*
from index as a left join i as b on a.id eq b.id
where ( b.adate eq a.adate ) or ( b.adate eq (a.adate - 1))
group by a.id,a.service
having count(*) ge 2;
quit;

proc sort data=flag nodupkey;
by id adate service;
run;

data flag;
set flag;
_service='I';
drop service;
run;

proc sort data=i;
by id adate;
run;

data result;
merge i flag;
by id adate;
run;
data result;
set result;
if _service eq 'I' then service=_service;
drop _service;
run;

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away