DATA Step, Macro, Functions and more

sum function

Reply
Frequent Contributor
Posts: 110

sum function

Hi SAS Users,

I have a question regarding how to set-up a sum function based on the dataset below. I think I need to use an array as well, but I am not certain. If you could offer some input, that'd be great.

My objective is to sum up the number of days the student is enrolled (aka "sum_enroll") using variables "enroll_start" to "enroll_end" to determine the start and end points for summing. I have included a description of the variables along with the desired dataset including the new variable "sum_enroll".

Thank you!

ID               = student

enroll_start  = day student was enrolled

enroll_end    = last day student was enrolled

day1-day10  = a binary variable (1, 0) indicated if they were in attendance on that day

sum_enroll   = sum of the number of days the student is enrolled

Note:  Variables enroll_start and enroll_end do not necessarily match up the with the first "1" and the last "1" value written for variables day1-day10.

data want;

input  ID   enroll_start enroll_end sum_enroll day1 day2 day3 day4 day5 day6 day7 day8 day9 day10;

         AA  day1          day10         10             1      1       1      1      1       1     1      1      1       1

         BB  day1          day 7            4             1      0       0      0      1       1     1      0      1       0

         CC  day4          day8             5             0      1       1      1      1      1      1       1      1      1

         DD   day5         day8             3             0      0       0      1      1       1     1       0      1      0

         EE   day3         day10            8            0      0       1      1      1       1     1      1       1      1

         FF   day1         day7              3             0      0       0      0      1       1      1      0       0      0

;;

Super Contributor
Posts: 578

Re: sum function

Posted in reply to sophia_SAS

Simplistic approach would be to subtract enroll_start from enroll_end and add 1 for the end date (presuming the values are actual dates)...

As an aside, I was confused by the value sum_enroll=4 for the second row.  Start day1 and end day7 means 7 days, right?

Frequent Contributor
Posts: 110

Re: sum function

Thanks for your response.  The objective isn't quite as simple as subtracting enroll_start from enroll_end and then + 1.  The 2nd row is an example of why.  Even though the enroll_start = 1 and enroll_end = 7, the answer is 4 as they were only in attendance on day 1, 4, 5 and 6.  I think PGstats has answered it though!

Respected Advisor
Posts: 4,936

Re: sum function

Posted in reply to sophia_SAS

data have;
input ID $ enroll_start $ enroll_end $ day1 - day10;
datalines;
AA day1 day10 1 1 1 1 1 1 1 1 1 1
BB day1 day7 1 0 0 0 1 1 1 0 1 0
CC day4 day8 0 1 1 1 1 1 1 1 1 1
DD day5 day8 0 0 0 1 1 1 1 0 1 0
EE day3 day10 0 0 1 1 1 1 1 1 1 1
FF day1 day7 0 0 0 0 1 1 1 0 0 0
;

data want;
array d day1-day10;
set have;
do _n_ = input(substr(enroll_start,4),3.) to input(substr(enroll_end,4),3.);
     sum_enroll = sum( sum_enroll, d{_n_});
     end;
run;

PG

PG
Frequent Contributor
Posts: 110

Re: sum function

PGStats,

So I believe your code produces the desired outcome.  One question, when I adapt your code for my own dataset, I receive an error message from SAS indicating that my do loop is invalid (i.e. "Invalid Do loop control information . . .). This makes sense as I do have some day1-day10 values that begin with zero or missing values.  What do you advise I do to your code to fix this error?


Thank you again for your help!

Respected Advisor
Posts: 4,936

Re: sum function

Posted in reply to sophia_SAS

I suspect the problem is with the data. On your second line of your data there is "day 7" with a space between day and 7. When you read that line, enroll_end is read as "day": it does not contain a number. You can either fix the data or count on the fact that there are always more than one space after enroll_start and enroll_end and change your input statement to:

input ID $ enroll_start &$ enroll_end &$ day1 - day10;

PG

PG
Super Contributor
Posts: 1,636

Re: sum function

Hi PG,

I think the modified code may work if what you suspect is true.

data bad_data;
input ID $ enroll_start $ enroll_end $ day1 - day10;
datalines;
AA day1 day10 1 1 1 1 1 1 1 1 1 1
BB day1 day7 1 0 0 0 1 1 1 0 1 0
CC day4 day8 0 1 1 1 1 1 1 1 1 1
DD day5 day11 0 0 0 1 1 1 1 0 1 0
EE day3 day10 0 0 1 1 1 1 1 1 1 1
FF day8 day7 0 0 0 0 1 1 1 0 0 0
;

data want problem;
array d day1-day10;
set bad_data;
start=input(compress(enroll_start,,'kd'),2.);
end=input(compress(enroll_end,,'kd'),2.);
if missing(start) or missing(end) or end <start or end >10 then output problem;
else do;
do _n_ = start  to  end;
     sum_enroll = sum( sum_enroll, d{_n_});
     end;
  output want;
end;
run;

Message was edited by: Linlin

Frequent Contributor
Posts: 110

Re: sum function

Thanks PG and Linlin for your responses.  My variable names are correct (i.e. no spaces between day and 7 - it was just typo in the sample datatset I provided.)

However, you are correct there is a problem with my data.  The problem relates to the enroll_start and enroll_end variables.  I think the issue is that sometimes my enroll_start and enroll_end may exceed the number of days I have in my dataset.

How do I specify in the below formula that I want (for example)  enroll_start  = enroll_day+1 or until enroll_start=day10. (Note: the +1 and +6 are just examples.)

enroll_start = enroll_day+1;

enroll_end = enroll_day+6;

So an example of that would look like this:

ID   enroll_day enroll_start enroll_end day1-day10

GG  day7 day8 day10    0 0 0 0 0 0 0 0 1 0

;;

Using my current formula, I would get enroll_end = day13 but I believe this is the problem with the code as I only have day1-day10.

Does this make sense?

Respected Advisor
Posts: 4,936

Re: sum function

Posted in reply to sophia_SAS

Yes it makes sense.  You can simply limit the range of the loop like this:

do _n_ = input(substr(enroll_start,4),3.) to min(input(substr(enroll_end,4),3.), hbound(d));

PG

PG
Respected Advisor
Posts: 4,936

Re: sum function

Posted in reply to sophia_SAS

Another option is to create more days :

data have;

infile datalines missover;

input ID $ enroll_start $ enroll_end $ day1 - day30;

datalines;

...

Extra days will remain missing and be treated as zeros in the statement

  sum_enroll = sum( sum_enroll, d{_n_});

 

PG

PG
Frequent Contributor
Posts: 110

Re: sum function

Thanks for your help.  I did figure it out. I used a combination of your code along with Linlin and I got it to work perfectly!

Super Contributor
Posts: 1,636

Re: sum function

Posted in reply to sophia_SAS

data want;

  set have;

  days_enrolled=input(compress(enroll_end,,'kd'),2.)-input(compress(enroll_start,,'kd'),2.)+1;

  array _days(*) day1-day10;

  do i=1 to dim(_days);

  days_attended=sum(days_attended,_days(i));

  end;

  drop i;

run;

Frequent Contributor
Posts: 110

Re: sum function

Thanks for your response, Linlin.  Your code is pretty close though not exactly what I was looking for.  For example in row 2, your code returns a value of 5 for variable 'days_attended' when actually the value should be 4.  As I only want to count the days of attendence (i.e. Day 1, 4, 5, 6) within the set range (i.e. enroll_start - enroll_end).  Does that make sense?

Ask a Question
Discussion stats
  • 12 replies
  • 441 views
  • 0 likes
  • 4 in conversation