- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi all,
I am trying to sum all values of some variables.
Variables acd,bcd,.......ocd have either a value of 1,0, or missing
So in order to get the total, I used the sum function.
data total;
set start;
count= sum(acd, bcd, ccd, dcd, ecd, fcd, gcd, hcd, icd, jcd, kcd, lcd, mcd, ncd, ocd);
run;
But I get an error message on the log saying
Missing values were generated as a result of performing an operation on missing values
What is wrong with my sum statement?
thanks
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
sure try the subsetting if before you sum:
if cmiss(of a--c) eq 0
data have;
input a b c;
datalines;
1 2 3
4 . 6
. . .
;
data want;
set have;
if cmiss(of a--c) eq 0;
sum=sum(a,b,c);
run;
And if you want to keep the records with missing values in your output, then-->
data want;
set have;
if cmiss(of a--c) eq 0 then sum=sum(a,b,c);
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
That's not an error. That's merely a NOTE in the log. This happens when all of those variables specified as arguments to sum function have missing values. Take a look below
data have;
input a b c;
datalines;
1 2 3
4 . 6
. . .
;
data want;
set have;
sum=sum(a,b,c);
run;
Try running the second time by removing the last dataline in HAVE. You'll understand
And if you do not want such notes, set nonotes system option
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
sure try the subsetting if before you sum:
if cmiss(of a--c) eq 0
data have;
input a b c;
datalines;
1 2 3
4 . 6
. . .
;
data want;
set have;
if cmiss(of a--c) eq 0;
sum=sum(a,b,c);
run;
And if you want to keep the records with missing values in your output, then-->
data want;
set have;
if cmiss(of a--c) eq 0 then sum=sum(a,b,c);
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
One slight(?) correction to @novinosrin's suggested code. cmiss counts the number of missing values so what you'd want is:
data want; set have; if cmiss(acd, bcd, ccd, dcd, ecd, fcd, gcd, hcd, icd, jcd, kcd, lcd, mcd, ncd, ocd) ne 15 then sum=sum(acd, bcd, ccd, dcd, ecd, fcd, gcd, hcd, icd, jcd, kcd, lcd, mcd, ncd, ocd); run;
However, if your variables are in the same order as shown above, you could use the same type of variable list used in that suggested code, namely:
data want; set have; if cmiss(of acd--ocd) ne 3 then sum=sum(of acd--ocd); run;
Art, CEO, AnalystFinder.com
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Of course you could also use the opposite of nmiss, namely:
data want; set have; if n(of acd--ocd) ne 0 then sum=sum(of acd--ocd); run;
Art, CEO, AnalystFinder.com
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
A few things to consider ...
First, the DATA step sums values within a single observation. Is that what you want, or do you want a sum each variable separately, across all observations?
Second, SUM returns a missing value when all the incoming values are missing. That's what gets you the note in the log. You could add one more argument to the SUM function, adding 0 to the list of values being summed.
Finally, if you want each variable summed individually across all observations, it is easy to use PROC MEANS:
proc means data=have;
var acd bcd ccd dcd ecd fcd gcd hcd icd jcd kcd lcd mcd ncd ocd;
output out=save_in_a_data_set_also sum=;
run;
proc print data=save_in_a_data_set_also;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
So the data have
ID acd bcd ccd ...ocd total_ct
1 1 . 1 0 2
2 . . 0 1 1
So I want to create total_ct variable but I have some rows with all variable values missing
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Do you want to output the records that only have missing values?
Art, CEO, AnalystFinder.com
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@leahcho wrote:
I am trying to sum all variables per each ID and create a variable for that
So the data have
ID acd bcd ccd ...ocd total_ct
1 1 . 1 0 2
2 . . 0 1 1
So I want to create total_ct variable but I have some rows with all variable values missing
So do want a special value for the total when all are missing? Not to output the row with a missing total? Something else?
So far there is no issue, that is what the function does. If you need a specific result for the missing then tell us.
Or use
if missing (total_ct) then total_ct = <what ever numeric value you want>. Or delve in the joys of special missing values.