BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sree2
Calcite | Level 5

Hi All,

I am facing a bit of dead end with the logic. Thought to post in on the forum to seek help.

I have the following datasets.

 

data have1;
   input case action_in_date action_out_date priority;
   cards;
100 28-MAY-2015 29SEP2015 02
100 04-SEP-2015 27NOV2015 03
100 27-NOV-2015 17SEP2016 04
;
run;

data have2;
input case post_date amount;
   cards;
100 18-SEP-2015 250
100 19-SEP-2015 100
100 30-NOV-2015 300
;
run;

 

Now, the post date in first 2 rows (18-SEP-2015 and 19-SEP-2015)  in have2 dataset have a post date which falls between two date ranges in have1 dataset. i e. between  28-MAY-2015 and 29SEP2015 and 04-SEP-2015 and  27NOV2015. But i want the money to be allocated to highest priority. ie between 04-SEP-2015 and  27NOV2015 which has a priority 03. The output i am looking for is below

/*output;*/
100 04-SEP-2015 27NOV2015 03 350
100 27-NOV-2015 17SEP2016 04 300

 

Any help is highly appreciated.

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

I presume you have different date ranges for difference CASE values.  If so, I think there are 3 steps.

 

  1. Read all the HAVE1 records for a CASE and build an array from the mindate to the maxdate indicating the highest priority, and corresponding date-group (action_in_date:action_out_date).  That is the arrays are a map from each date to the desired date group.
  2. Read all the HAVE2 records for the same case.  Use the map above to accumulate identify which date groups do with each post_date.
  3. The aggregate the amount's for each date group:

 


%let lobound=%sysfunc(inputn(01may2015,date9.));
%let upbound=%sysfunc(inputn(30sep2016,date9.));

/** step 1 **/
data date_groups (keep=case action_in_date action_out_date post_date) / veiw=date_groups;
  set have1;
  by case;

  array _aidate{&lobound:&upbound} _temporary_;
  array _aodate{&lobound:&upbound} _temporary_;
  array _priorities{&lobound:&upbound} _temporary_;

  if first.case then call missing(of _aidate{*},of _aodate{*}, of _priorities{*}, min_date, max_date);

  retain min_date max_date;
  min_date=min(min_date,action_in_date);
  max_date=max(max_date,action_out_date);

  do post_date=action_in_date to action_out_date;
    if priority>_priorities{post_date} then do;
	  _priorities{post_date}=priority;
	  _aidate{post_date}=action_in_date;
	  _aodate{post_date}=action_out_date;
	end;
  end;

  if last.case then do post_date=min_date to max_date;
    action_in_date=_aidate{post_date};
    action_out_date=_aidate{post_date};
	output;
	format post_date date9.;
  end;
run;

/** step 2 **/
data need / view=need;
  merge date_groups have2 (in=inh);
  by case post_date;
  if inh;
run;

/** step 3 **/

proc summary data=need noprint nway ;
  class case action_in_date action_out_date ;
  var amount;
  output out=want sum(amount)=total_amount;
run;

 

Note I've set LOBOUND and UPBOUND to the earliest and latest expected action dates over the entire set of CASEs.

 

All this could be done in a single DATA step, but I don't see how to do so without introducing hash objects.  This is relatively simpler to understand.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

3 REPLIES 3
mkeintz
PROC Star

I presume you have different date ranges for difference CASE values.  If so, I think there are 3 steps.

 

  1. Read all the HAVE1 records for a CASE and build an array from the mindate to the maxdate indicating the highest priority, and corresponding date-group (action_in_date:action_out_date).  That is the arrays are a map from each date to the desired date group.
  2. Read all the HAVE2 records for the same case.  Use the map above to accumulate identify which date groups do with each post_date.
  3. The aggregate the amount's for each date group:

 


%let lobound=%sysfunc(inputn(01may2015,date9.));
%let upbound=%sysfunc(inputn(30sep2016,date9.));

/** step 1 **/
data date_groups (keep=case action_in_date action_out_date post_date) / veiw=date_groups;
  set have1;
  by case;

  array _aidate{&lobound:&upbound} _temporary_;
  array _aodate{&lobound:&upbound} _temporary_;
  array _priorities{&lobound:&upbound} _temporary_;

  if first.case then call missing(of _aidate{*},of _aodate{*}, of _priorities{*}, min_date, max_date);

  retain min_date max_date;
  min_date=min(min_date,action_in_date);
  max_date=max(max_date,action_out_date);

  do post_date=action_in_date to action_out_date;
    if priority>_priorities{post_date} then do;
	  _priorities{post_date}=priority;
	  _aidate{post_date}=action_in_date;
	  _aodate{post_date}=action_out_date;
	end;
  end;

  if last.case then do post_date=min_date to max_date;
    action_in_date=_aidate{post_date};
    action_out_date=_aidate{post_date};
	output;
	format post_date date9.;
  end;
run;

/** step 2 **/
data need / view=need;
  merge date_groups have2 (in=inh);
  by case post_date;
  if inh;
run;

/** step 3 **/

proc summary data=need noprint nway ;
  class case action_in_date action_out_date ;
  var amount;
  output out=want sum(amount)=total_amount;
run;

 

Note I've set LOBOUND and UPBOUND to the earliest and latest expected action dates over the entire set of CASEs.

 

All this could be done in a single DATA step, but I don't see how to do so without introducing hash objects.  This is relatively simpler to understand.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ChrisNZ
Tourmaline | Level 20

Like this?

data HAVE1;
   input CASE ACTION_IN_DATE date9. ACTION_OUT_DATE : date9. PRIORITY;
   format ACTION_IN_DATE ACTION_OUT_DATE  date9.;
   cards;
100 28MAY2015 29SEP2015 02
100 04SEP2015 27NOV2015 03
100 27NOV2015 17SEP2016 04
run;

data HAVE2;
  input CASE  POST_DATE date9. AMOUNT;
  format POST_DATE   date9.;
  ID+1;
  cards;
100 18SEP2015 250
100 19SEP2015 100
100 30NOV2015 300
run;
 

proc sql;
  create table GET_CARTESIAN as
  select  have1.CASE 
        , ACTION_IN_DATE 
        , ACTION_OUT_DATE 
        , PRIORITY
        , ID
        , AMOUNT
  from HAVE1, HAVE2
  where have1.CASE=have2.CASE
    and ACTION_IN_DATE <= POST_DATE <= ACTION_OUT_DATE
  order by 1, ID, PRIORITY;
quit;

data GET_PRIORITY; 
  set GET_CARTESIAN;
  by CASE ID ;
  if last.ID;
run; 

proc sql;
  select  a.CASE 
        , ACTION_IN_DATE 
        , ACTION_OUT_DATE 
        , PRIORITY
        , sum(AMOUNT)
  from GET_PRIORITY a
  group by 1,2,3,4;
quit;
CASE ACTION_IN_DATE ACTION_OUT_DATE PRIORITY  
100 04SEP2015 27NOV2015 3 350
100 27NOV2015 17SEP2016 4 300

 

sree2
Calcite | Level 5

Thankyou both. Much appreciated

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 794 views
  • 1 like
  • 3 in conversation