Hi, I have a dataset with 3 varibales, ID, Date, Cost in a long format. I want to generate a new variable called "Date_over_100" that shows the date when one's cumulative cost have exceeded 100. I would appreciate all the help there is. Thanks in advance!
For each ID, I think you want one observation, with variable ID and DATE, representing the first date in which the cumulative sum of costs exceeds 100.
If so, then:
data want;
set a;
by id;
if first.id then sum_cost=0;
sum_cost+cost;
if (lag(sum_cost)<=100 or first.id=1) and sum_cost>100;
run;
The "subsetting if" outputs an observation only when the current sum_cost>100 and the observation-in-hand is either the first observation for a given ID, or is immediately preceded by an observation with sum_cost<=100.
Help us help you.
Please provide a working DATA step with your sample data.
And please provide a well-defined table showing what you expect the output dataset to look like.
Hi, I have a dataset with 3 varibales, ID, Date, Cost in a long format. I want to generate a new variable called "Date_over_100" that shows the date when one's cumulative cost have exceeded 100. I would appreciate all the help there is. Thanks in advance! I have achieved this in R, but need help to do so in SAS.
R:
1. I created the dataset first. I have repeated the generated 10 IDs so there are 100 in total, and generated 10 repetitions of 10 dates for each ID, and generated values for cost by random sampling 100 numbers between 30 and 50. And the output would look like the attachment file.
set.seed(11)
ID <- c(rep(1:10,each=10))
Date <- rep(sample(seq(as.Date('2008/01/01'), as.Date('2008/01/11'), by="day",replace=T), 10),10)
Cost <- round(runif(100,30,50),digits = 2)
df <- data.frame(ID,Date,Cost)
df <- df%>%
arrange(Date)
2. I created a variable named "Date_Over_100" to show the date when one's cumulative cost has exceeded 100. Below are codes and output.
df_new <- df%>% group_by(ID) %>% summarize(Date_over_100 = Date[which.max(cumsum(Cost) > 100)])
3. I then transformed the original dataset into wide format.
df_wide <- df%>% arrange(Date)%>% pivot_wider(names_from = Date, values_from = Cost)
4. Lastly, I merged the wide format dataset with the dataset I got in 3.
df_final <- inner_join(df_new,df_wide,by="ID")
Now in SAS:
1. Creating the dataset.
data a;
call streaminit(123);
do ID = 1 to 10;
do Date = '01Jan2008'd to '10Jan2008'd;
Cost = rand("integer",30,50);
output;
end;
end;
format Date yymmddd10.;
run;
First 15 rows of the output:
And next I need to create the variable "Date_over_100" that shows the date when one's cumulative cost has exceeded 100.
I have tried to replicate what I have done in R into SAS but haven't succeeded. I would appreciate any help there is. Thanks in advance!
See this:
dara want;
set a;
by id;
retain flag;
if first.id
then do;
sum_cost = cost;
flag = 1;
end;
else sum_cost + cost; /* sum statement implies retain of sum_cost */
if sum_cost > 100
then do;
if flag then output;
flag = 0;
end;
drop flag;
run;
Untested, posted from my tablet.
PS I merged your questions, no need for a new thread.
For each ID, I think you want one observation, with variable ID and DATE, representing the first date in which the cumulative sum of costs exceeds 100.
If so, then:
data want;
set a;
by id;
if first.id then sum_cost=0;
sum_cost+cost;
if (lag(sum_cost)<=100 or first.id=1) and sum_cost>100;
run;
The "subsetting if" outputs an observation only when the current sum_cost>100 and the observation-in-hand is either the first observation for a given ID, or is immediately preceded by an observation with sum_cost<=100.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.