BookmarkSubscribeRSS Feed
tobriain
Calcite | Level 5

I need some advice about how to represent the following data. The code is lengthy, but i've tested it and it works well to create random data for this sample.

I want to represent the difference in mean stake from Day 1 - Day 3 i.e. the difference in the mean of the variables day1TotalStake, day2TotalStake and day3TotalStakeAll. All I've done at the end of the code below is call a 'proc means' on those three variables and I also tried a 'proc tabulate' on one of those variables: day1TotalStake to see the distribution by stake quartile and return quartile for the first day only. I need some way of showing how the mean value in each quartile of that matrix changes over the next two days.

There is quite a lot of data and eyeballing it in table format is not working. A clever data visualization technique of some kind would nail the representation of the data much better than a very verbose table. I was thinking of some kind of heat map or a graph of some kind but , but have no idea how that would show changes over time i.e. what happens to the value in cell (0,0) on day1, day2 and day3.

I'd appreciate any pointers at all on this.

Thanks.

* This greats a series of rows at random;

data random(keep=date timestamp username stake price winlose winnings);

call streaminit(123); /* set random number seed */

format timestamp datetime.;

format date date9.;

NRows = 1000;MinStake = 0.01;MaxStake = 10;Betprice = 1.83;StartDate = MDY(1,1,2020);DateRange=100;OpeningHour=8;ClosingHour=15;MaxCustomers=1000;

  do i = 1 to NRows;

    u = rand("Uniform");    /* U[0,1] */

  date = StartDate+floor(ranuni(0)*DateRange);

    timestamp = DHMS(date,OpeningHour + floor((1+ClosingHour-0)*u),0 + floor((1+59-0)*u),0 + floor((1+59-0)*u));

  username = cat("player", 1 + floor((1+MaxCustomers-1)*u));

  stake = floor( (MinStake+MaxStake)*u ) * 10;

  price = betprice;

  winlose=rand('BERN',.50);

  winnings = winlose * Betprice * stake;  

    output;

  end;

run;

* This sort is necessary for the next data step;

proc sort data=random;

   by username timestamp;

run;

* This data step calculates a number of variables on the fly;

data random;

   set random; 

      by username date timestamp; 

   retain calendarTime eventTime;

   if first.username then calendarTime = 0;

   if first.date then calendarTime + 1;

   if first.username then eventTime = 0;

   if first.timestamp then eventTime + 1;

run;

* This creates an aggregate table to enable functions like 'proc freq' to be run;

proc sql;

  create table randomAggregate as

  select

  username,

  sum(case when calendarTime =1 then stake else . End) as day1TotalStake,

  sum(case when calendarTime =1 then winnings - stake else . End) as day1TotalWinnings,

  sum(case when calendarTime =1 then winnings - stake else . End)/sum(case when calendarTime =1 then stake else . End)as day1AverageReturn,

  sum(case when calendarTime =2 then stake else . End) as day2TotalStake,

  sum(case when calendarTime =2 then winnings - stake else . End) as day2TotalWinnings,

  sum(case when calendarTime =2 then winnings - stake else . End)/sum(case when calendarTime =2 then stake else . End)as day2AverageReturn,

  sum(case when calendarTime =3 then stake else . End) as day3TotalStake,

  sum(case when calendarTime =3 then winnings - stake else . End) as dasy3TotalWinnings,

  sum(case when calendarTime =3 then winnings - stake else . End)/sum(case when calendarTime =3 then stake else . End)as day3AverageReturn

  from random

  group by username;

quit;

* This adds the quartile flags;

Proc rank data=randomAggregate ties=mean out=randomRanked groups=4;

     var day1TotalStake day1AverageReturn day2TotalStake day2AverageReturn;

  ranks day1TotalStakeQuartile day1AverageReturnQuartile day2TotalStakeQuartile day2AverageReturnQuartile;

  LABEL day1TotalStakeQuartile = 'day1TotalStakeQ';

  LABEL day1AverageReturnQuartile = 'day1AverageReturnQ';

  LABEL day2TotalStakeQuartile = 'day2TotalStakeQ';

  LABEL day2AverageReturnQuartile = 'day2AverageReturnQ';

run;

*This prints out the count and stats for each cell in the matrix in list format for the first THREE days;

proc means data = randomRanked nmiss mean MAXDEC=2;

  var day1TotalStake day2TsotalStake day3TotalStake;

  class day1TotalStakeQuartile day1AverageReturnQuartile;

run;

* This shows the sum and mean at the end of Day 1 in MATRIX format;

* it only shows the TOTAL stake: not the mean;

proc tabulate data=randomRanked noseps missing;

  class day1AverageReturnQuartile day1TotalStakeQuartile;

  var day1TotalStake;

  table day1TotalStakeQuartile= "Stake Quartile", day1TotalStake*day1AverageReturnQuartile="AverageReturn Quartile"*sum;

run;

2 REPLIES 2
Jay54
Meteorite | Level 14

Hard to say, but maybe you can plot a 4x4 matrix using a GTL LAYOUT LATTICE to define a 4x4 matrix of SERIES plot with day on x axis, and mean values on y axis.

tobriain
Calcite | Level 5

Great: thanks for that. Any idea how I might be able to compare the three matrices themselves?

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 962 views
  • 3 likes
  • 2 in conversation