BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
andyec
Fluorite | Level 6

I'm working with a dataset of lab values that occurred within 7 days of each other. Essentially, I'm trying to figure out what values are "duplicates" that couldn't be removed with a proc sort because the lab value and date are close but not the same (unfortunately a result of some reporting issues of date and value). I'm having issues because there can be multiple occurrences by id. For example, id 1 in the simulated dataset below has 3 labs within 7 days of 5/30/19 and also 2 labs within 7 days of 7/7/19.

 

obsiddatevaluepsource
115/30/19201
215/31/19200
316/1/19190
417/7/193501
517/10/192001
621/3/191000
721/4/19991
823/6/19191
923/8/19200
10312/8/1915001
11312/10/1915001
12312/11/1915000
1331/15/20191
1431/16/20200

 

As a result, I would like to create a count column that looks like the one below.

 

obsiddatevaluepsourcecount
115/30/192011
215/31/192002
316/1/191903
417/7/1935011
517/10/1920012
621/3/1910001
721/4/199912
823/6/191911
923/8/192002
10312/8/19150011
11312/10/19150012
12312/11/19150003
1331/15/201911
1431/16/202002

 

Lastly, I would like to apply some rules like if the labs occurred within 7 days of each other for a given id and are close in value (within 11), then extract only the first one that comes from the preferred source (psource=1). For example, keep row 10 and get rid of rows 11 and 12. Also, if the labs occurred within 7 days of each other and aren't close in value then create a dataset of those so that those can undergo further review. For example, output rows 4 and 5.

 

Thank you in advance for any help on this.

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

Hi @andyec  I was using remove method as an additional check to remove the hash contents for failed records of Notwithin11 logic. You wouldn't need that. However that error shouldn't have occured unless your sample data is different to the real that you are testing against.

 

I have modified the code slightly below



data have;
input obs	id	date :mmddyy10.	value	psource;
format date mmddyy10.;
cards;
1	1	5/30/19	20	1
2	1	5/31/19	20	0
3	1	6/1/19	19	0
4	1	7/7/19	350	1
5	1	7/10/19	200	1
6	2	1/3/19	100	0
7	2	1/4/19	99	1
8	2	3/6/19	19	1
9	2	3/8/19	20	0
10	3	12/8/19	1500	1
11	3	12/10/19	1500	1
12	3	12/11/19	1500	0
13	3	1/15/20	19	1
14	3	1/16/20	20	0
;


data temp;
 do until(last.id);
  set have;
  by id;
  if first.id then do;
   d=date;
   count=1;
   _cg=1;
  end;
  else if intck('days',d,date)>7 then do;
   d=date;
   count=1;
   _cg=sum(_cg,1);
  end;
  else count=sum(count,1);
  output;
 end;
 drop d;
run;

data want;
 if _n_=1 then do;
   dcl hash H () ;
   h.definekey  ("id","_cg") ;
   h.definedata("_d");
   h.definedone();
 end;
 do until(last._cg);
  set temp;
  by id _cg;
  if psource then do;
    _d=date;
    h.ref();
  end;
  if first._cg then _c=1;
  else if abs(_v-value)<=11 then _c=sum(_c,1);
  _v=value;
 end;  
 _f=_c=count;
 do until(last._cg);
  set temp;
  by id _cg;
  if _f then do;
    if  first._cg  then rc=h.find();
    if rc=0 then if _d=date then output;
  end;
  else do;
    Notwithin11=1;
    output;
  end;
 end;
 h.clear();
 drop _: rc;
run;

View solution in original post

12 REPLIES 12
novinosrin
Tourmaline | Level 20

Hi @andyec  for the count column. I'm afraid you need to explain the "close" thing logic more clearly. Thank you!


data have;
input obs	id	date :mmddyy10.	value	psource;
format date mmddyy10.;
cards;
1	1	5/30/19	20	1
2	1	5/31/19	20	0
3	1	6/1/19	19	0
4	1	7/7/19	350	1
5	1	7/10/19	200	1
6	2	1/3/19	100	0
7	2	1/4/19	99	1
8	2	3/6/19	19	1
9	2	3/8/19	20	0
10	3	12/8/19	1500	1
11	3	12/10/19	1500	1
12	3	12/11/19	1500	0
13	3	1/15/20	19	1
14	3	1/16/20	20	0
;


data want;
 do until(last.id);
  set have;
  by id;
  if first.id then do;
   d=date;
   count=1;
  end;
  else if intck('days',d,date)>7 then do;
   d=date;
   count=1;
  end;
  else count=sum(count,1);
  output;
 end;
 drop d;
run;

proc print noobs;run;

obs id date value psource count
1 1 05/30/2019 20 1 1
2 1 05/31/2019 20 0 2
3 1 06/01/2019 19 0 3
4 1 07/07/2019 350 1 1
5 1 07/10/2019 200 1 2
6 2 01/03/2019 100 0 1
7 2 01/04/2019 99 1 2
8 2 03/06/2019 19 1 1
9 2 03/08/2019 20 0 2
10 3 12/08/2019 1500 1 1
11 3 12/10/2019 1500 1 2
12 3 12/11/2019 1500 0 3
13 3 01/15/2020 19 1 1
14 3 01/16/2020 20 0 2

 

andyec
Fluorite | Level 6

Hi @novinosrin thank you for the count column code.

 

For the other part, I want values that occurred within 7 days and are within 11 points of each other to ultimately become one value (with a preference for the one where psource=1 and if multiple have psource=1, then pick the first one). For example, I want the table below to become the one below that.

 

obsiddatevaluepsource
621/3/191000
721/4/19991
823/6/19191
923/8/19200

 

obsiddatevaluepsource
721/4/19991
823/6/19191

 

If the values occurred within 7 days but aren't within 11 points of each other, I want them to be output like the table below.

 

obsiddatevaluepsource
417/7/193501
517/10/192001

 

Is that explained better?

novinosrin
Tourmaline | Level 20

HI @andyec  Hmm I think I am getting it. One more clarification plz. What do you mean as "Points".

 

Is it the values column? To wit, taking the difference between the values -->absolute value(19-20)<=11 is something what you mean as points?

andyec
Fluorite | Level 6

@novinosrin yes I mean the values column and abs(19-20) <= 11 between values that occurred within 7 days of each other.

novinosrin
Tourmaline | Level 20

Hi @andyec  Sorry I got busy for my boss's requirement at my work place. Okay here is what I understand.

Logic:

1. Within an ID, <=7 interval days series of counts for one set giving rise to the variable COUNT. There could be N number of sets  

2. Within each set(count1-countN) in a given ID, if the absolute difference between the values of each value that is nth value-previous nth-1 aka previous) value <=11  then a record will qualify

3. However the number of record that qualifies should be equal to the COUNT_ N i.e. if for example,count=7 and the number of records with abs(diff)<=11 should also be 7. 

4. I added a NotWithin11 flag to make it convenient your review

If the above makes sense, the following should be close, albeit let me know where the discrepancy is, we shall fix it


data have;
input obs	id	date :mmddyy10.	value	psource;
format date mmddyy10.;
cards;
1	1	5/30/19	20	1
2	1	5/31/19	20	0
3	1	6/1/19	19	0
4	1	7/7/19	350	1
5	1	7/10/19	200	1
6	2	1/3/19	100	0
7	2	1/4/19	99	1
8	2	3/6/19	19	1
9	2	3/8/19	20	0
10	3	12/8/19	1500	1
11	3	12/10/19	1500	1
12	3	12/11/19	1500	0
13	3	1/15/20	19	1
14	3	1/16/20	20	0
;


data temp;
 do until(last.id);
  set have;
  by id;
  if first.id then do;
   d=date;
   count=1;
   _cg=1;
  end;
  else if intck('days',d,date)>7 then do;
   d=date;
   count=1;
   _cg=sum(_cg,1);
  end;
  else count=sum(count,1);
  output;
 end;
 drop d;
run;

data want;
 if _n_=1 then do;
   dcl hash H () ;
   h.definekey  ("id","_cg") ;
   h.definedata("_d");
   h.definedone();
 end;
 do until(last._cg);
  set temp;
  by id _cg;
  if h.check() ne 0 and psource then do;
    _d=date;
    h.add();
  end;
  if first._cg then do; _v=value;_c=1;end;
  else if abs(_v-value)<=11 then _c=sum(_c,1);
 end;
 _f=_c=count;
 if not _f then h.remove();
 do until(last._cg);
  set temp;
  by id _cg;
  if _f then do;
    if  first._cg  then rc=h.find();
    if rc=0 then if _d=date then output;
  end;
  else do;
    Notwithin11=1;
    output;
  end;
 end;
 h.clear();
 drop _: rc;
run;
obs id date value psource count Notwithin11
1 1 05/30/2019 20 1 1 .
4 1 07/07/2019 350 1 1 1
5 1 07/10/2019 200 1 2 1
7 2 01/04/2019 99 1 2 .
8 2 03/06/2019 19 1 1 .
10 3 12/08/2019 1500 1 1 .
13 3 01/15/2020 19 1 1 .

 

 

andyec
Fluorite | Level 6

@novinosrin thank you very much for this. I am getting the following error: An error has occurred during instance method OM_Remove(179) of "DATASTEP.HASH." Any idea of what this may be? I will have to try and troubleshoot in a little bit.

novinosrin
Tourmaline | Level 20

Hi @andyec  I was using remove method as an additional check to remove the hash contents for failed records of Notwithin11 logic. You wouldn't need that. However that error shouldn't have occured unless your sample data is different to the real that you are testing against.

 

I have modified the code slightly below



data have;
input obs	id	date :mmddyy10.	value	psource;
format date mmddyy10.;
cards;
1	1	5/30/19	20	1
2	1	5/31/19	20	0
3	1	6/1/19	19	0
4	1	7/7/19	350	1
5	1	7/10/19	200	1
6	2	1/3/19	100	0
7	2	1/4/19	99	1
8	2	3/6/19	19	1
9	2	3/8/19	20	0
10	3	12/8/19	1500	1
11	3	12/10/19	1500	1
12	3	12/11/19	1500	0
13	3	1/15/20	19	1
14	3	1/16/20	20	0
;


data temp;
 do until(last.id);
  set have;
  by id;
  if first.id then do;
   d=date;
   count=1;
   _cg=1;
  end;
  else if intck('days',d,date)>7 then do;
   d=date;
   count=1;
   _cg=sum(_cg,1);
  end;
  else count=sum(count,1);
  output;
 end;
 drop d;
run;

data want;
 if _n_=1 then do;
   dcl hash H () ;
   h.definekey  ("id","_cg") ;
   h.definedata("_d");
   h.definedone();
 end;
 do until(last._cg);
  set temp;
  by id _cg;
  if psource then do;
    _d=date;
    h.ref();
  end;
  if first._cg then _c=1;
  else if abs(_v-value)<=11 then _c=sum(_c,1);
  _v=value;
 end;  
 _f=_c=count;
 do until(last._cg);
  set temp;
  by id _cg;
  if _f then do;
    if  first._cg  then rc=h.find();
    if rc=0 then if _d=date then output;
  end;
  else do;
    Notwithin11=1;
    output;
  end;
 end;
 h.clear();
 drop _: rc;
run;

andyec
Fluorite | Level 6

@novinosrin that appeared to work very well - thank you!

 
novinosrin
Tourmaline | Level 20

Hi @andyec  Do feel free to come back to us in this thread or a new one should you need help. The hardest part in a tech forum in my humble opinion is the communication. The apparent best approach to get quick and precise answers is by offering the following:

1. A "representative sample" data of what you HAVE

2. An expected output sample(WANT) corresponding to the input sample(HAVE)

3. Briefly explaning the convert/transformation or reporting logic/purpose

and finally additional notes/comments if any.

 

Once this is communicated properly, the likelihood is that you will get the much needed solution at the soonest. Thank you!

andyec
Fluorite | Level 6

Hi @novinosrin sorry one more thing. How would this change if multiple have psource=1, then pick the last one (instead of the first one)?

novinosrin
Tourmaline | Level 20

Try changing the the current

  if psource then do;
    _d=date;
    h.ref();
  end;

to

  if psource then do;
    _d=date;
    h.replace();
  end;

 and let me know 🙂  😉

mkeintz
PROC Star

I presume by "within 7 days of each other" you are referring to the number of days between consecutive records  (so if you have 7 records spaced 2 days apart, you could have a single range of 14 days).

 

data have;
  input bs	id	date  :mmddyy10.	value	psource;
  format date date9.;
datalines;
1	1	5/30/2019	20	1
2	1	5/31/2019	20	0
3	1	6/1/2019	19	0
4	1	7/7/2019	350	1
5	1	7/10/2019	200	1
6	2	1/3/2019	100	0
7	2	1/4/2019	99	1
8	2	3/6/2019	19	1
9	2	3/8/2019	20	0
10	3	12/8/2019	1500	1
11	3	12/10/2019	1500	1
12	3	12/11/2019	1500	0
13	3	1/15/2020	19	1
14	3	1/16/2020	20	0
run;

data want;
  set have;
  by id;
  if first.id=1 or dif(date)>7 then count=1;
  else count+1;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 2158 views
  • 6 likes
  • 3 in conversation