BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
tobriain
Calcite | Level 5

Hi,

I have the following data:

data have;

   input username $ amount betdate : datetime.;

   dateOnly = datepart(betdate) ;

   format betdate DATETIME.;

   format dateOnly ddmmyy8.;

   datalines;

player1 90 12NOV2008:12:04:01

player1 -100 04NOV2008:09:03:44

player2 120 07NOV2008:14:03:33

player1 -50 05NOV2008:09:00:00

player1 -30 05NOV2008:09:05:00

player1 20 05NOV2008:09:00:05

player2 10 09NOV2008:10:05:10

player2 -35 15NOV2008:15:05:33

run;

PROC PRINT data=have; RUN;

proc sort data=have;

   by username betdate;

run;

data want;

   set have;

   by username betdate dateOnly;  

   retain calendarTime eventTime dailyProfit;

   if first.username then calendarTime = 0;

   if first.dateOnly then calendarTime + 1;

   if first.username then eventTime = 0;

   if first.betdate then eventTime + 1;  

   if first.username then dailyProfit = 0;

    dailyProfit = dailyProfit + amount;

PROC PRINT data=want; RUN;

The value in dailyProfit is incorrect, as it is resetting every new betdate, not every new dateOnly and new username. How can I get the syntax right on this?

Best regards.

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Try removing BETDATE from the BY statement and use DATEONLY instead of BETDATE. Dateonly is "first" for each leverl of BETDATE.

View solution in original post

7 REPLIES 7
ballardw
Super User

Try removing BETDATE from the BY statement and use DATEONLY instead of BETDATE. Dateonly is "first" for each leverl of BETDATE.

Astounding
PROC Star

Your program is missing one statement:

retain dailyProfit;

You can add it anywhere within the DATA step.  The other variables (calendarTime and eventTime) don't need it because there are already being retained.  The syntax "variablename + 1" will automatically retain that variable.

tobriain
Calcite | Level 5

Great stuff. That is now giving me the cumulative dailyProfit in each row as I wanted, and resetting to zero each new date and new user.

But rather than a cumulative value, how can I sum from first.betdate to last.betdate over each username? This would give me the total dailyProfit for each username for each date (repearing across multiple rows just like a SUMIF() in Excel).

Thanks again.

Astounding
PROC Star

It's up to you when you reset DailyProfit to 0.  Right now you have coded:

if first.username than dailyProfit=0;

You can switch that to whatever conditions you would like, such as:

if first.betdate then dailyProfit=0;

Of course, to do that, BETDATE must remain part of your BY statement.

tobriain
Calcite | Level 5

I'm not sure if that resolves it. I really need to add a flag to each user flagging whether the customers total dailyProfit (for each parituclar day) was positive or not. Because the main query I want to run is this:

proc sql;

  select

  outer.calendarTime as calendarTime,

  [dailyProfitIndicator,]

  mean(outer.stake) as meanStake,

  sum(outer.stake) as totalStake,

  sum(outer.winnings - outer.stake) as profit,

  sum(outer.winnings - outer.stake) /sum(outer.stake) as margin

from work.want as outer

where calendarTime =1

[and dailyProfitIndicator = 1]

group by 1

having sum(outer.winnings - outer.stake) > 0;

quit;

What it's supposed to do is get the averageStake for customers on their first day if the first day was a 'winning' day i.e. if the total daily profit for the user on that day was greater than zero. And I want to do this over ALL customers, so I don't think the 'having sum(outer.winnings - outer.stake) > 0;' is aggregating over each username and each day, just over each row in this query (which is only 1).

I've changed the data step above to add another 'totalProfit' rather than the cumulative 'dailyProfit' variable above. But it still shows a cumulative rather than repeating the total in each row. Is this expected?

data want;

   set work.temp; 

      by username dateOnly betdate; 

   retain calendarTime eventTime dailyProfit totalProfit;

   if first.username then calendarTime = 0;

   if first.dateOnly then calendarTime + 1;

   if first.username then eventTime = 0;

   if first.betdate then eventTime + 1;

   if first.username then dailyProfit = 0;

   if first.dateOnly then dailyProfit = 0;

   if first.betdate then dailyProfit + sum(winnings - stake);

   if first.dateOnly then totalProfit = 0;

   if first.betdate then totalProfit + sum(winnings - stake);

run;

ballardw
Super User

If I wanted a summary of total by date for each user I would drop out of a datastep and go to proc means or summary;

proc means data=have sum;

     class username betdate;

     var amount;

run;

If a data set is needed then an output statement can be added.


tobriain
Calcite | Level 5

The question I need to answer is 'What is the average spent by profitable people on their first day'? I can add the 'first day' flag (as above), but can't control for getting the average spent by those that are profitable on the first day rather than the entire sample.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 2163 views
  • 6 likes
  • 3 in conversation