BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mkeintz
PROC Star

How to count combinations:

 

 

Assuming you've removed the "do until ..." do group, then after

    "retain newcombo.;"

put

   "retain combo_counter;'

 

And after the

   "newcomdo=..."

statement, put;

   combo_counter=ifn(first.id, 1, combo_counter+ (new_combo^=lag(new_combo)));

 

regards,

Mark

By the way I have a paper "Leads and Lags: Static and Dynamic Queues in the SAS Data step."  The structure of your problem fits the "irregular lags" section of the paper.

 

  

 

  

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
CP2
Pyrite | Level 9 CP2
Pyrite | Level 9

thanks so much. I ended up leaving the 'do until' there so that the first entry with the same date gets the same newcombo as the next color with the same date. I just added another do until loop at the end to output all the rows:

 

do until (last.date) ;
set have ;
by id date ;
output ;
end;

 

 

I am having one problem though if you duplicate some of the datalines and change the id, the first row of the new id has a blank newcombo. Any idea why this might be happening? I think it's carrying over the newcombo which is blank but it shouldn't do that. do you think newcombo should equal color before the first do until loop?

colors.PNG

 

CP2
Pyrite | Level 9 CP2
Pyrite | Level 9

Ok, so it looks like I needed to add  'or first.id' to the statement that looks for a new color . 

 

/* If it's a new color, add it to beginning of colors array, which will be blank */
if findw(trim(newcombo),trim(color),'-')=0 or first.id then colors{1}=color;

 

 

here is the whole code which seems to work with my drug data as well. Thank you so much!!!!

 

data have ;
informat date mmddyy10.;
input id $ date color $ expectedCombo $;
format date mmddyy10.;
datalines;
1321 07/10/2015 blue blue
1321 07/11/2015 blue blue
1321 07/12/2015 blue blue
1321 07/13/2015 blue blue
1321 08/18/2015 red blue-red
1321 08/19/2015 blue blue-red
1321 08/20/2015 blue blue-red
1321 08/21/2015 blue blue-red
1321 08/22/2015 blue blue-red
1321 09/15/2015 red blue-red
1321 09/16/2015 blue blue-red
1321 09/17/2015 blue blue-red
1321 09/18/2015 blue blue-red
1321 01/12/2016 red blue-red
1321 01/12/2016 blue blue-red
1321 01/13/2016 blue blue-red
1321 01/14/2016 blue blue-red
1321 01/15/2016 blue blue-red
1321 01/16/2016 blue blue-red
1321 01/17/2016 blue blue-red
1321 01/18/2016 blue blue-red
1321 01/19/2016 green blue-red-green
1321 01/19/2016 blue blue-red-green
1321 01/20/2016 blue blue-red-green
1321 01/21/2016 blue blue-red-green
1321 01/22/2016 blue blue-red-green
1321 01/23/2016 blue blue-red-green
1321 01/24/2016 blue blue-red-green
1321 01/25/2016 blue blue-red-green
1325 01/16/2016 blue blue-red
1325 01/17/2016 blue blue-red
1325 01/18/2016 blue blue-red
1325 01/19/2016 green blue-red-green
1325 01/19/2016 blue blue-red-green
1325 01/20/2016 blue blue-red-green
1325 01/21/2016 blue blue-red-green
1325 01/22/2016 blue blue-red-green
1325 01/23/2016 blue blue-red-green
1325 01/24/2016 blue blue-red-green
1325 01/25/2016 blue blue-red-green
;


data want;

if _n_=1 then do;
if 0 then set have;
attrib start format=mmddyy10. length=8;

declare hash history(ordered:'a');
history.definekey('color');
history.definedata('color','start');
history.definedone();
end;

array colors{20} $20 _temporary_;
attrib newcombo length=$400;
retain newcombo;

do until (last.date);
set have;
by id date;

/* Update this color item in the history table */
rc=history.find();
start=date;
rc=history.replace();

/* If it's a new color, add it to beginning of colors array, which will be blank */
if findw(trim(newcombo),trim(color),'-')=0 or first.id then colors{1}=color;

/* Use hash table to see if any colors are over 30 days and should be removed*/
/* Start at right-hand end of array because it is sorted (therefore left end */
/* is blank except for the new color) */
current_color=color;
do d=dim(colors) to 1 by -1 while(colors{d}^=' ');
rc=history.find(key:colors{d});
if date-start>30 then colors{d}=' ';
end;
color=current_color;
call sortc(of colors{*});
end;

newcombo=catx('-',of colors{*});

do until (last.date) ;
set have ;
by id date ;
output ;
end;


if last.id then do;
rc=history.clear();
call missing(of colors{*});
end;
run;
proc print ; run;

 

CP2
Pyrite | Level 9 CP2
Pyrite | Level 9

@mkeintz Hi Mark, I have been asked to revise the program so that color red drops if the prior entrance into the dataset is more than 45 days while all other colors drop after 30 days. I tried revising the data but red drops when it is the same date as another color (I am grouping by id and date so that the newcombo is the same for each drug entered on the same day). 

 

if index(color,"red") > 0 then do ;
                                                  if date-start>45 then colors{d}=' ' ;
                                               end;
else if date-start > 30 then colors{d}=' ';

 

Any thoughts on why this wouldn't work ?

 

So, I think I realized the problem. I had color=current_color; commented out which seemed to have caused the problem. the above condition seems to work as well.  I guess that the hash table overwrites color and it needs to assign it back . Not sure how/why but will try to step through the pdv to see how the hash is working.

mkeintz
PROC Star

I suggest you read about the hash find method and other hash mehods to clarify what  using these methods does to the PDV.  That will explain the use of the current_color assignment

 

As to the new problem you've introduced - color-specific expiration periods. It it's only one color that has an atypicla cutoff as per your example then the logic you suggest is just fine, although I would write it as:

 

    cutoffdays=ifn(color='red',45,30);

    if date-start>cutoffdays then colors{d}=' ';

 

Regards,

Mark

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
CP2
Pyrite | Level 9 CP2
Pyrite | Level 9

thank you. I definitely need work on Hash objects and I've been reading up on hash and thought I understood them as lookup tables but haven't used it like this as placeholders for data of the current dataset. 

 

I am closer to what I need. In a couple of cases colors don't seem to be able to re-enter, especially when it is on the same day as one or  more colors. I have large dataset and for the most part the code works even though it's not getting them all right. At this point I think I could have data entered all of them!! I'm going to have to cut my losses soon but I've learned alot and appreciate your help. 

 

Thanks again.

Patrick
Opal | Level 21

@CP2

The following code should deal with both multiple colors (drugs) on the same day as well as color dependent intervals.


proc format;
  invalue color_interval (just upcase)
    'RED' = 45
    other = 30
  ;
quit;

data want(drop=_:);
  set have;
  by id date;
  length Combo $ 400;

  if _n_=1 then
    do;

      if 0 then _color=color;
      format _date date9.; call missing(_date);
      dcl hash h1_colors(multidata:'n', ordered:'y', hashexp:4);
      dcl hiter hh1_colors('h1_colors');
      h1_colors.defineKey('id','color');
      h1_colors.defineData('_color','_date');
      h1_colors.defineDone();

      dcl hash h1_rowsSameDate(multidata:'y', dataset:'have(obs=0)', hashexp:2);
      h1_rowsSameDate.defineKey('id','date');
      h1_rowsSameDate.defineData(all:'y');
      h1_rowsSameDate.defineDone();

    end;

  /* new id: clear hash h1_colors() */
  if first.id then
    h1_colors.clear();

  /* new date: clear hash h1_rowsSameDate() */
  if first.date then
    h1_rowsSameDate.clear();

  /* keep hashes up to date if required */
  h1_colors.replace(key:id, key:color, data:color, data:date);
  if not first.date or not last.date then h1_rowsSameDate.add();

  /* populate combo */
  if last.date then
    do;
      _rc = hh1_colors.first();
      do while (_rc = 0);
        if intck('day',_date,date)<=input(color,color_interval.) then combo=catx('-',combo,_color);
        else h1_colors.delete();
        _rc = hh1_colors.next();
      end;
    end;

  if first.date and last.date then
    /* only 1 color for the date */
    do;
      output;
    end;
  else if last.date then
    do;
      /* multiple colors for the date */
      h1_rowsSameDate.reset_dup();
      do while(h1_rowsSameDate.do_over() eq 0);
        output;
      end;
    end;

run;
CP2
Pyrite | Level 9 CP2
Pyrite | Level 9

Thanks Patrick. It doesn't look like my post went through. I tried your code but I am getting an error message. It doesn't seem to like the next() function. Can you see any other reason it might be getting that message?

 

 


2602 proc format;
2603 invalue color_interval (just upcase)
2604 'RED' = 45
2605 other = 30
2606 ;
NOTE: Informat COLOR_INTERVAL is already on the library WORK.FORMATS.
NOTE: Informat COLOR_INTERVAL has been output.
2607 quit;

NOTE: PROCEDURE FORMAT used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds


2608
2609 data want(drop=_:);
2610 set carc.colors;
2611 by id date;
2612 length Combo $ 400;
2613
2614 if _n_=1 then
2615 do;
2616
2617 if 0 then _color=color;
2618 format _date date9.; call missing(_date);
2619 dcl hash h1_colors(multidata:'n', ordered:'y', hashexp:4);
2620 dcl hiter hh1_colors('h1_colors');
2621 h1_colors.defineKey('id','color');
2622 h1_colors.defineData('_color','_date');
2623 h1_colors.defineDone();
2624
2625 dcl hash h1_rowsSameDate(multidata:'y', dataset:"carc.colors(obs=1)", hashexp:2);
2626 h1_rowsSameDate.defineKey('id','date');
2627 h1_rowsSameDate.defineData(all:'y');
2628 h1_rowsSameDate.defineDone();
2629
2630 end;
2631
2632 /* new id: clear hash h1_colors() */
2633 if first.id then
2634 h1_colors.clear();
2635
2636 /* new date: clear hash h1_rowsSameDate() */
2637 if first.date then
2638 h1_rowsSameDate.clear();
2639
2640 /* keep hashes up to date if required */
2641 h1_colors.replace(key:id, key:color, data:color, data:date);
2642 if not first.date or not last.date then h1_rowsSameDate.add();
2643
2644 /* populate combo */
2645 if last.date then
2646 do;
2647 _rc = hh1_colors.first();
2648 do while (_rc = 0);
2649 if intck('day',_date,date)<=input(color,color_interval.) then
2649! combo=catx('-',combo,_color);
2650 else h1_colors.delete();
2651 _rc = hh1_colors.next();
2652 end;
2653 end;
2654
2655 if first.date and last.date then
2656 /* only 1 color for the date */
2657 do;
2658 output;
2659 end;
2660 else if last.date then
2661 do;
2662 /* multiple colors for the date */
2663 h1_rowsSameDate.reset_dup();
2664 do while(h1_rowsSameDate.do_over() eq 0);
2665 output;
2666 end;
2667 end;
2668
2669 run;

NOTE: There were 1 observations read from the data set CARC.COLORS.
ERROR: Invalid hash object for iterator operation at line 2651 column 15.
ERROR: DATA STEP Component Object failure. Aborted during the EXECUTION phase.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 226 observations read from the data set CARC.COLORS.
WARNING: The data set WORK.WANT may be incomplete. When this step was stopped there were 224
observations and 4 variables.
WARNING: Data set WORK.WANT was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.24 seconds
cpu time 0.07 seconds

 

 

 

Patrick
Opal | Level 21

What SAS version are you on?

CP2
Pyrite | Level 9 CP2
Pyrite | Level 9

I am on version 9.4 .

CP2
Pyrite | Level 9 CP2
Pyrite | Level 9

also, I don't want to drop the color/drug altogether if it shows up later on or with another combination. I just want to drop the color/drug from the combination at day 30 (or 45) if it hasn't been added. If it gets added at day 32 (or 47) then it starts either its own combination or merges with another combination if other colors/drugs are entered on the same day. 

 

Thanks for you help. 

Patrick
Opal | Level 21

Hmm... The code works for me. I was thinking that the do_over()  method eventually was the issue for you as it has only been added in SAS9.4 - may be not on the first maintenance realease though.

 

I've added now a code variation which doesn't use do_over(). Give it a go.

 

also, I don't want to drop the color/drug altogether

The hash is only used to create the combination so we don't need colors in there which shouldn't become part of the combination. If there is a new color or color appearing again then it gets always added to the hash. Keeping the entries in the hash to the required minimum redudes looping over the hash when populating variable combo.

 

Hope you can now execute the code. Let me know if the results are not as required.

 

Comment: Wrong code version replaced with corrected version. Same code as in next post.

 


proc format;
  invalue color_interval (just upcase)
    'RED' = 45
    other = 30
  ;
quit;

data want(drop=_:);
  set have;
  by id date;
  length Combo $ 400;

  if _n_=1 then
    do;

      if 0 then _color=color;
      format _date date9.; call missing(_date);
      dcl hash h1_colors(multidata:'n', ordered:'y', hashexp:4);
      dcl hiter hh1_colors('h1_colors');
      h1_colors.defineKey('id','color');
      h1_colors.defineData('_color','_date');
      h1_colors.defineDone();

      dcl hash h1_rowsSameDate(multidata:'y', dataset:'have(obs=0)', hashexp:2);
      dcl hiter hh1_rowsSameDate('h1_rowsSameDate');
      h1_rowsSameDate.defineKey('id','date');
      h1_rowsSameDate.defineData(all:'y');
      h1_rowsSameDate.defineDone();

    end;

  /* new id: clear hash h1_colors() */
  if first.id then
    h1_colors.clear();

  /* new date: clear hash h1_rowsSameDate() */
  if first.date then
    h1_rowsSameDate.clear();

  /* keep hashes up to date if required */
  h1_colors.replace(key:id, key:color, data:color, data:date);
  if not first.date or not last.date then h1_rowsSameDate.add();

  /* populate combo */
  if last.date then
    do;
      _rc = hh1_colors.first();
      do while (_rc = 0);
        if intck('day',_date,date)<=input(color,color_interval.) then combo=catx('-',combo,_color);
        else h1_colors.delete();
        _rc = hh1_colors.next();
      end;
    end;

  if first.date and last.date then
    /* only 1 color for the date */
    do;
      output;
    end;
  else if last.date then
    do;
      /* multiple colors for the date */
      _rc = hh1_rowsSameDate.first();
      do while (_rc = 0);
        output;
        _rc = hh1_rowsSameDate.next();
      end;
    end;

run;

 

CP2
Pyrite | Level 9 CP2
Pyrite | Level 9

Thanks for trying! I can run the code without errors this time but when I use my sample dataset with 48 rows, the 'want' table only ends up with 13 records. So it's not giving me the right combinations for each row.  Not sure it's even doable but thanks for trying. 

Patrick
Opal | Level 21

Ooops! Too much copy/paste. Try again. You should now get all the rows.

proc format;
  invalue color_interval (just upcase)
    'RED' = 45
    other = 30
  ;
quit;

data want(drop=_:);
  set have;
  by id date;
  length Combo $ 400;

  if _n_=1 then
    do;

      if 0 then _color=color;
      format _date date9.; call missing(_date);
      dcl hash h1_colors(multidata:'n', ordered:'y', hashexp:4);
      dcl hiter hh1_colors('h1_colors');
      h1_colors.defineKey('id','color');
      h1_colors.defineData('_color','_date');
      h1_colors.defineDone();

      dcl hash h1_rowsSameDate(multidata:'y', dataset:'have(obs=0)', hashexp:2);
      dcl hiter hh1_rowsSameDate('h1_rowsSameDate');
      h1_rowsSameDate.defineKey('id','date');
      h1_rowsSameDate.defineData(all:'y');
      h1_rowsSameDate.defineDone();

    end;

  /* new id: clear hash h1_colors() */
  if first.id then
    h1_colors.clear();

  /* new date: clear hash h1_rowsSameDate() */
  if first.date then
    h1_rowsSameDate.clear();

  /* keep hashes up to date if required */
  h1_colors.replace(key:id, key:color, data:color, data:date);
  if not first.date or not last.date then h1_rowsSameDate.add();

  /* populate combo */
  if last.date then
    do;
      _rc = hh1_colors.first();
      do while (_rc = 0);
        if intck('day',_date,date)<=input(color,color_interval.) then combo=catx('-',combo,_color);
        else h1_colors.delete();
        _rc = hh1_colors.next();
      end;
    end;

  if first.date and last.date then
    /* only 1 color for the date */
    do;
      output;
    end;
  else if last.date then
    do;
      /* multiple colors for the date */
      _rc = hh1_rowsSameDate.first();
      do while (_rc = 0);
        output;
        _rc = hh1_rowsSameDate.next();
      end;
    end;

run;
CP2
Pyrite | Level 9 CP2
Pyrite | Level 9

Thanks! My system doesn't seem to like the next() iterator and produces an error. I have a question to sas support about version issues for this so I'll wait and see what they say. My 'have' dataset is 450 records but only 224 were read from it before the error. I checked row 224 where it got cut off and I don't see any odd data. So, I really appreciate your help and will look again at it after I hear back from support. 

 

307 /* populate combo */
308 if last.date then
309 do;
310 _rc = hh1_colors.first();
311 do while (_rc = 0);
312 if intck('day',_date,date)<=input(color,color_interval.) then
312! combo=catx('-',combo,_color);
313 else h1_colors.delete();
314 _rc = hh1_colors.next();
315 end;
316 end;
317
318 if first.date and last.date then
319 /* only 1 color for the date */
320 do;
321 output;
322 end;
323 else if last.date then
324 do;
325 /* multiple colors for the date */
326 _rc = hh1_rowsSameDate.first();
327 do while (_rc = 0);
328 output;
329 _rc = hh1_rowsSameDate.next();
330 end;
331 end;
332
333 run;

NOTE: There were 0 observations read from the data set WORK.HAVE.
ERROR: Invalid hash object for iterator operation at line 314 column 15.
ERROR: DATA STEP Component Object failure. Aborted during the EXECUTION phase.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 226 observations read from the data set WORK.HAVE.
WARNING: The data set WORK.WANT may be incomplete. When this step was stopped there were 224
observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.07 seconds
cpu time 0.07 seconds

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 37 replies
  • 1882 views
  • 5 likes
  • 4 in conversation