So I want to be able to mark 3-4+ observations as the first based on them being tied on a date.
Data looks like this:
| Player_ID | Date | Team | Number | 
|---|---|---|---|
| 1 | 4/3/12 | A | |
| 2 | 4/3/12 | A | |
| 3 | 4/3/12 | A | 
So I am trying to mark all of them as 0 in the number column.
If I do first.team, only the first would be 0, same with Date. Is there an easy solution to this? I am probably just missing it.
data need;
retain triger 0;
set original; by team date player_id;
if first.team = 1 then triger = 1;
if last.team = 1 then triger = 0;
run;
something like that?
if you could give bigger dataset example and the outcome you want to achieve it would be much more easy to understand and explane. im working on something similar but i dont get your problem..
Hi,
try the following code...
proc sort data = test;
by team;
run;
data test;
set test;
by team;
if first.team then do;
retain number 0;
end;
run;
Thanks,
How would that take into account the dates? For each team, there are about 100 observations, and 2-3 that I want to mark as the first, if they all have the same first date.
Try sort by date and team then first.team. Then the first occurence of each team within a given date gets selected.
Problem is I want to select all the observations for each time that are in the first date. Does that make sense? It seems first. can only select one observation for a particular group.
Yes it does, but assuming you're grouping by something else you should be able to do it.
What does the following give you?
proc sort data=have;
by team date player;
data want;
set have;
by team date;
retain number;
if first.team then number=0;
if first.date then number+1;
run;
I see where you are going with it, and I am going to try a few things, but I probably should explain the back story.
I got the number variable as a way to see how many days it has been since a team has played, using the lag function. However, this of course didnt work fully as teams with multiple players on the same day would have a zero instead of a one. So I wanted to change all the zeroes to ones except for the first date for each team.
data gamedates4;
set gamedates2;
by team_name gamedate;
days=game_date-lag(game_date);
run;
Im going to try what you said, but I need to figure out how to not mess up the rest of it.
Looks like you need to include player_id in the sort and the by statement: by team date player_id;
then First.date refers to the first player's record within the date.
Also your Days calculation is probably incorrect as you are getting some days, the first days value for each team (except the first team) reflects the days between the last game of the previous team and the first of the current team.
days=game_date-lag(game_date);
probably should have a:
if first.team then days=.; /* or what ever makes sense in your application*/
Yea, I got rid of those by putting all negative values as zero.
Anyways, wouldn't the first.date still only capture the one players, I need to capture all the players with the first date for each team. I am starting to think I need to rethink this.
It will capture the first date and the retain will keep it at value until it changes again (another first.date) if you have BY variables in the proper sequence.
Okay, last question. This worked, but one thing I realize is that the lag function will create zeroes on the teams that have multiple players on the same date (IE the dates are the same, so after the first, the lag is zero.) How can I tell it to retain days until the next gamedate? IE I put the days=game_date-lag(game_date), then I need to retain that value until the gamedate changes. MY thought was use if first.game_date=0 then days=lag(days). but it doesnt work.
why not retain days and only calculate it when the date changes and reset the lag at that point.
Hi, i consume you are a tipster? please contact me on stajer@usa.com for bussines.
data need;
retain triger 0;
set original; by team date player_id;
if first.team = 1 then triger = 1;
if last.team = 1 then triger = 0;
run;
something like that?
if you could give bigger dataset example and the outcome you want to achieve it would be much more easy to understand and explane. im working on something similar but i dont get your problem..
If you're truly trying to track team dates, then something like this might be better:
proc sql;
create table team_dates as
select distinct team, game_date from have
order by team, game_date;
quit;
data want;
set team_dates;
format game_lag 8.;
by team;
if first.team then game_lag=0;
else game_lag=game_date - lag(game_date);
run;
should produce something like
| Team | Game_Date | game_lag | 
|---|---|---|
| A | 4/3/12 | 0 | 
| A | 4/4/12 | 1 | 
| A | 4/6/12 | 2 | 
| A | 5/1/12 | 24 | 
| B | 5/1/12 | 0 | 
| B | 5/5/12 | 4 | 
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
