- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have got some data like this:
Attr_F1 | Attr_F2 | Attr_F3 | obs_len | want |
1 | 2 | 3 | 3 | sum( of Attr_F1 - Attr_F3) |
1 | 2 | 3 | 2 | sum( of Attr_F1 - Attr_F2) |
I would like to get the sum() of first several columns according to the value of the fourth column(obs_len).
For example: if obs_len =2, then I would like to sum up Attr_F1 and Attr_F2, etc.
Could you help me on this? Hopefully a data step to solve it. Thanks a lot.
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Desmond wrote:
Hi Thanks a lot for your reply.
However I got this error:
ERROR: Invalid DO loop control information, either the INITIAL or TO expression is missing or the BY expression is missing,
zero,
or invalid.
So, it appears that you have an observation with a missing value of OBS_LEN. You need to decide how to define the value of variable WANT in this case -- and also in other special cases not covered by your sample data.
For example, you could define:
- If OBS_LEN is missing or all values of ATTR_F[*] are missing, WANT should be missing as well.
- Otherwise, if OBS_LEN is zero or negative, WANT should be set to 0 ("empty sum").
- Missing values involved in the sum are treated as zeros, unless all summands are missing (in which case WANT is set to missing).
A more robust DATA step implementing the above rules could look like this:
data want;
set have;
array Attr_F[3];
if obs_len<=.z | ~n(of Attr_F[*]) then want=.;
else if obs_len<=0 then want=0;
else do _n_=1 to obs_len;
if Attr_F[_n_]>.z then want=sum(want,Attr_F[_n_]);
end;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello @Desmond and welcome to the SAS Support Communities!
Try this:
data want;
set have;
array Attr_F[3];
do _n_=1 to obs_len;
want=sum(want,Attr_F[_n_]);
end;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Thanks a lot for your reply.
However I got this error:
ERROR: Invalid DO loop control information, either the INITIAL or TO expression is missing or the BY expression is missing,
zero,
or invalid.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The code DOES work, see this:
data have;
input Attr_F1 Attr_F2 Attr_F3 obs_len;
datalines;
1 2 3 3
1 2 3 2
;
run;
data want;
set have;
array Attr_F[3];
do _n_=1 to obs_len;
want=sum(want,Attr_F[_n_]);
end;
run;
proc print data=want noobs;
run;
Log:
24 data have; 25 input Attr_F1 Attr_F2 Attr_F3 obs_len; 26 datalines; NOTE: The data set WORK.HAVE has 2 observations and 4 variables. NOTE: DATA statement used (Total process time): real time 0.02 seconds cpu time 0.00 seconds 29 ; 30 run; 31 32 data want; 33 set have; 34 array Attr_F[3]; 35 do _n_=1 to obs_len; 36 want=sum(want,Attr_F[_n_]); 37 end; 38 run; NOTE: There were 2 observations read from the data set WORK.HAVE. NOTE: The data set WORK.WANT has 2 observations and 5 variables. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.00 seconds 39 40 proc print data=want noobs; 41 run; NOTE: There were 2 observations read from the data set WORK.WANT. NOTE: The PROCEDURE PRINT printed page 1. 2 Das SAS System 08:14 Wednesday, April 24, 2019 NOTE: PROZEDUR PRINT used (Total process time): real time 0.02 seconds cpu time 0.00 seconds
Result:
Attr_F1 Attr_F2 Attr_F3 obs_len want 1 2 3 3 6 1 2 3 2 3
This once again illustrates the utmost importance of properly posting example data in a data step with datalines.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Desmond wrote:
Hi Thanks a lot for your reply.
However I got this error:
ERROR: Invalid DO loop control information, either the INITIAL or TO expression is missing or the BY expression is missing,
zero,
or invalid.
So, it appears that you have an observation with a missing value of OBS_LEN. You need to decide how to define the value of variable WANT in this case -- and also in other special cases not covered by your sample data.
For example, you could define:
- If OBS_LEN is missing or all values of ATTR_F[*] are missing, WANT should be missing as well.
- Otherwise, if OBS_LEN is zero or negative, WANT should be set to 0 ("empty sum").
- Missing values involved in the sum are treated as zeros, unless all summands are missing (in which case WANT is set to missing).
A more robust DATA step implementing the above rules could look like this:
data want;
set have;
array Attr_F[3];
if obs_len<=.z | ~n(of Attr_F[*]) then want=.;
else if obs_len<=0 then want=0;
else do _n_=1 to obs_len;
if Attr_F[_n_]>.z then want=sum(want,Attr_F[_n_]);
end;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi That's exactly the case.
Thanks a lot!! best!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data have;
input Attr_F1 Attr_F2 Attr_F3 obs_len;
datalines;
1 2 3 3
1 2 3 2
;
run;
data want;
set have;
array t(999);
array a(*) attr_:;
call pokelong(peekclong(addrlong(a(1)),obs_len*8),addrlong(t(1)),obs_len*8);
want=sum(of t(*));
drop t:;
run;