BookmarkSubscribeRSS Feed
asinha
Calcite | Level 5

I need to write a code for estimate joint probabilities of events that have their own independent probabilities. As an example, I have attached an Excel spreadsheet which does the calculations. The actual calculations that I need to do in SAS, has numerous more events and thousands more possibilities. However, I think, if I can get some help with this example, I will be able to write a code for my actual calculation.

 

 

In the spreadsheet, the raw data is from row 2 to 101. Row one has the heading. The data consists of 5 events A, B, C, D and X. Each event has 4 possible outcomes, with their own unique probabilities, which are also available. The data is for 5 years. I need to estimate the joint probabilities of different possibilities of A, B, C, and D with those of X. For example, A1 with X1, X2, X3 and X4, and then with A2 with X1, X2, X3 and X4. Similarly, for A3, and A4, and then with B, C and D. As an example, I show the calculations in the spreadsheet from rows 107 to 430. The column titled Joint Possibilities show the possible combinations, and the Column Joint probabilities estimates the probabilities.

3 REPLIES 3
mkeintz
PROC Star

I think this is an excellent use case for INNER JOIN in PROC SQL

 

proc sql;
  create table want as
  select L.year
        ,catx('*',L.outcome,R.outcome) as joint_possibility
        ,L.prob*R.prob as joint_probability
  from have as L inner join have as R
  on L.year=R.year and L.event<R.event
  order by year,joint_possibility;
quit;

I didn't test this on your sample data - since I'm not going to go to the effort to transcribe your data to a sas DATA step.  So I simulated and tested the data below:

 

data have (drop=i _cumprob);
  do year=1 to 2;
    do event='A','B','C','D','X';
      length outcome $2;
      _cumprob=0;
      array t {4} _temporary_;
      do i=1 to 4;
        t{i}=ranuni(0981505);
      end;
      do i=1 to 4;
        outcome=cats(event,i);
        prob=t{i}/sum(of t{*});
        output;
      end;
    end;
  end;
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
asinha
Calcite | Level 5

Thank you. 

I have not used the proc sql before. I will familiarize myself with it, and try out your code. Appreciate your help, and if I need further clarification on this issue, I will reach out again.

 

mkeintz
PROC Star

If your data are sorted by year, and then data set is large, then this code is likely faster.  It depends on reading each year of data twice, the first to build a hash object (think "lookup table"), and the second pass to reread the same year, and compute the joint_possibility and joint_probability:  At the end of processing for that year, the hash object is emptied, and ready for the next year.

 

data want (keep=year joint_possibility joint_probability) ;
  set have (in=firstpass)  have (in=secondpass);
  by year;

  if _n_=1 then do;
    if 0 then set have (rename=(year=_year event=_event outcome=_outcome prob=_prob));
    declare hash evts(ordered:'a') ;
      evts.definekey('_event','_outcome');
      evts.definedata('_event','_outcome','_prob');
      evts.definedone();
    declare hiter e ('evts');
  end;

  if firstpass then evts.add(key:event,key:outcome,data:event,data:outcome,data:prob);

  if secondpass then do while (e.next()=0);
    if _event<=event then continue;
    length joint_possibility $20;
    joint_possibility = catx('*',outcome,_outcome);
    joint_probability = prob*_prob;
    output;
  end;
  if last.year then evts.clear();
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 565 views
  • 0 likes
  • 2 in conversation