- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have the following data
code price share
------------------
AMGN 67.66 100
DELL 24.60 200
GE 34.50 100
HPQ 32.32 120
IBM 82.25 50
MOT 30.24 100
I need to calculate the average of the price and share in data step without using procedures
AMGN 67.66 100
DELL 24.60 200
GE 34.50 100
HPQ 32.32 120
IBM 82.25 50
MOT 30.24 100
45.26 111.6
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Is this another "interview" question of which we are seeing several recently? If so its ridiculous, perhaps you should not use your fingers to type in, or use words to communicate.
As for datastep:
data want; set have end=last; retain cnt tot_price tot_share; cnt=sum(cnt,1); tot_price=sum(tot_price,price); tot_share=sum(tot_share,share); if last then do; mean_price=tot_price/cnt; mean_share=tot_share/cnt; output; end; run;
However, maybe you would be better not using SAS at all, get yourself an assembler and do it that way?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Like this? 🙂
data have;
input code $ price share;
infile datalines;
datalines;
AMGN 67.66 100
DELL 24.60 200
GE 34.50 100
HPQ 32.32 120
IBM 82.25 50
MOT 30.24 100
;
data want;
set have end = eof;
cumprice = cumprice + price;
cumshare = cumshare + share;
if eof = 1 then do;
meanprice = cumprice / _N_;
meanshare = cumshare / _N_;
end;
retain cumprice cumshare 0;
drop cumprice cumshare;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data _NULL_;
set have end=eof;
retain tot1 tot2 count;
tot1+price;
tot2+share;
count+;
if eof then do;
avg_price = tot1/count;
avg_share = tot2/count;
put avg_price= avg_share=;
end;
format avg_price avg_share 5.2;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If this is an interview question, I would tell the person that the code should handle missing values. That would separate the wheat from the chaff.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Calculating the mean with and without missing values
You can cut and paste the R code into IML/R or
you can just use IML.
HAVE
====
Up to 40 obs from sd1.have total obs=6
Obs PRICE SHARE
1 1 1
2 2 3
3 3 .
4 3 .
5 2 1
6 1 3
WANT
====
AVG_PRICE=2.00
AVG_SHARE=2.00
BUT I GET
=========
AVG_PRICE=2.00
AVG_SHARE=.
WORKING CODE FOR ADJUSTING FOR MISSINGS
=======================================
R
colMeans(have, na.rm=TRUE); (adjust for missing)
colMeans(have, na.rm=FALSE); (do not adjust for missing)
SAS/WPS
tot2=sum(tot2,share);
cnt2=ifn(share ne .,cnt2+1,cnt2);
FULL SOLUTION
=============
* create some data;
options validvarname=upcase;
libname sd1 "d:/sd1";
data sd1.have(drop=code);
input price share;
cards4;
1 1
2 3
3 .
3 .
2 1
1 3
;;;;
run;quit;
* DO NOT ADJUST FOR MISSING;
data want;
set sd1.have end = eof;
cumprice = cumprice + price;
cumshare = cumshare + share;
if eof = 1 then do;
meanprice = cumprice / _N_;
meanshare = cumshare / _N_;
end;
retain cumprice cumshare 0;
drop cumprice cumshare;
run;
* ADJUST FOR MISSING;
%utl_submit_wps64('
libname sd1 "d:/sd1";
data _NULL_;
set sd1.have end=eof;
retain tot1 tot2 cnt1 cnt2 0;
tot1=sum(tot1,price);
tot2=sum(tot2,share);
cnt1=ifn(price ne .,cnt1+1,cnt1);
cnt2=ifn(share ne .,cnt2+1,cnt2);
if eof then do;
avg_price = tot1/cnt1;
avg_share = tot2/cnt2;
put avg_price= avg_share=;
end;
format avg_price avg_share 5.2;
run;
');
* R SOLUTION;
%utl_submit_wps64('
options set=R_HOME "C:/Program Files/R/R-3.3.1";
proc r;
submit;
library(haven);
have=read_sas("d:/sd1/have.sas7bdat");
have;
colMeans(have, na.rm=TRUE);
colMeans(have, na.rm=FALSE);
endsubmit;
');
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If I didn't want the job very badly I'd give them something like this...
%macro get_means(data=,var=, class=, classval=); proc means data=&data.; %if not(%sysevalf(%superq(class)=,boolean)) %then %do; class &class.; ways 1; %end; var &var.; output out=_temp_means mean(&var.)=; run; data _null_; set _temp_means; %if not(%sysevalf(%superq(class)=,boolean)) %then %do; where &class. = &classval.; %end; call symputx("mean_&var.",&var.,'g'); run; proc delete data=_temp_means; run; %mend get_means; data want; set sashelp.class; rc = dosubl('%get_means(data=sashelp.class,var=height,class=sex,classval="'||sex||'")'); height_mean = symget('mean_height'); run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@rogerjdeangelis's code is close, but doesn't handle the boundary case.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Good Point Rosk
Missings are the bane of programming. However my R code does cover this case.
I suspect this is the case and I need to chck for 0/0 in the SAS code.
# A tibble: 6 × 2
PRICE SHARE
<dbl> <dbl>
1 1 NA
2 2 NA
3 3 NA
4 3 NA
5 2 NA
6 1 NA
PRICE SHARE
2 NaN ==> 0/0
PRICE SHARE
2 NA