BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
stevenyan0127
Fluorite | Level 6

Hi, I have a dataset with 3 varibales, ID, Date, Cost in a long format. I want to generate a new variable called "Date_over_100" that shows the date when one's cumulative cost have exceeded 100. I would appreciate all the help there is. Thanks in advance!

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

For each ID, I think you want one observation, with variable ID and DATE, representing the first date in which the cumulative sum of costs exceeds 100.

 

If so, then:

 

data want;
  set a;
  by id;
  if first.id then sum_cost=0;
  sum_cost+cost;
  if (lag(sum_cost)<=100 or first.id=1) and sum_cost>100;
run;

The "subsetting if" outputs an observation only when the current sum_cost>100 and the observation-in-hand is either the first observation for a given ID, or is immediately preceded by an observation with sum_cost<=100.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

7 REPLIES 7
mkeintz
PROC Star

Help us help you.

 

Please provide a working DATA step with your sample data.

 

And please provide a well-defined table showing what you expect the output dataset to look like.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
stevenyan0127
Fluorite | Level 6
Hi, thanks for the reply. I have started a new thread with very detailed steps. https://communities.sas.com/t5/SAS-Programming/generating-new-variable-based-on-cumulative-sum/m-p/8... Would you mind taking a look? Thank you very much!
stevenyan0127
Fluorite | Level 6

Hi, I have a dataset with 3 varibales, ID, Date, Cost in a long format. I want to generate a new variable called "Date_over_100" that shows the date when one's cumulative cost have exceeded 100. I would appreciate all the help there is. Thanks in advance! I have achieved this in R, but need help to do so in SAS.

R: 

1. I created the dataset first. I have repeated the generated 10 IDs so there are 100 in total, and generated 10 repetitions of 10 dates for each ID, and generated values for cost by random sampling 100 numbers between 30 and 50. And the output would look like the attachment file.

 

set.seed(11)
ID <- c(rep(1:10,each=10))
Date <- rep(sample(seq(as.Date('2008/01/01'), as.Date('2008/01/11'), by="day",replace=T), 10),10)
Cost <- round(runif(100,30,50),digits = 2)
df <- data.frame(ID,Date,Cost)
df <- df%>%
arrange(Date)

2. I created a variable named "Date_Over_100" to show the date when one's cumulative cost has exceeded 100. Below are codes and output.

df_new <- df%>%
  group_by(ID) %>%
  summarize(Date_over_100 = Date[which.max(cumsum(Cost) > 100)])

Screen Shot 2022-04-09 at 12.06.10 AM.png

 

3. I then transformed the original dataset into wide format.

df_wide <- df%>%
  arrange(Date)%>%
  pivot_wider(names_from = Date, values_from = Cost)

Screen Shot 2022-04-09 at 12.07.03 AM.png

4. Lastly, I merged the wide format dataset with the dataset I got in 3.

df_final <- inner_join(df_new,df_wide,by="ID")

Screen Shot 2022-04-09 at 12.07.38 AM.png

 

Now in SAS:

1. Creating the dataset.

data a;
   call streaminit(123);
	do ID = 1 to 10;
	   do Date = '01Jan2008'd to '10Jan2008'd;
              Cost = rand("integer",30,50);
	      output;
	   end;
	end;
   format Date yymmddd10.;
run;

First 15 rows of the output:

Screen Shot 2022-04-08 at 11.54.37 PM.png

 

And next I need to create the variable "Date_over_100" that shows the date when one's cumulative cost has exceeded 100. 

I have tried to replicate what I have done in R into SAS but haven't succeeded. I would appreciate any help there is. Thanks in advance!

Kurt_Bremser
Super User

See this:

dara want;
set a;
by id;
retain flag;
if first.id
then do;
  sum_cost = cost;
  flag = 1;
end;
else sum_cost + cost; /* sum statement implies retain of sum_cost */
if sum_cost > 100
then do;
  if flag then output;
  flag = 0;
end;
drop flag;
run;

Untested, posted from my tablet.

 

PS I merged your questions, no need for a new thread.

stevenyan0127
Fluorite | Level 6
Thank you Kurt! It works perfect!
mkeintz
PROC Star

For each ID, I think you want one observation, with variable ID and DATE, representing the first date in which the cumulative sum of costs exceeds 100.

 

If so, then:

 

data want;
  set a;
  by id;
  if first.id then sum_cost=0;
  sum_cost+cost;
  if (lag(sum_cost)<=100 or first.id=1) and sum_cost>100;
run;

The "subsetting if" outputs an observation only when the current sum_cost>100 and the observation-in-hand is either the first observation for a given ID, or is immediately preceded by an observation with sum_cost<=100.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
stevenyan0127
Fluorite | Level 6
Thank you very much! It works perfect!

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 875 views
  • 0 likes
  • 3 in conversation