Hi All,
In my input file where ever there is starting date for that account I need to find Average.
I'm able to apply flag for that first and last. The I'm not able to get idea how to find average for each account.
Please give me some hint to solve this issue.
{code}
data a2;
set a1;
by accnt;
if date ^= . then avg=1;
if last.accnt then avg=1;
run;
Please find data attached.Mea
Hi Pallis,
I hope by using RETAIN statement for date and accnt, we can achieve your requirement...
Hi,
I have tested this code, may be somebody might have a better idea......
data have;
Input accnt amnt date date9.;
datalines;
1 100
1 120
1 120
1 130 1-Jan-14
1 130
1 150
2 0
2 0
2 0
2 1000 2-Mar-14
2 1000
2 1000
3 200 2-Mar-14
3 200
3 230
3 230
3 300
4 500
4 500 2-Mar-14
4 500
4 550
;
data want(keep=accnt,amnt,date);
set have;
if accnt = _accnt and _date ne . then do;
date = _date;
end;
_accnt = accnt;
_date = date;
retain _accnt _date;
data want;
set want;
if date ne .;
proc means data=want;
var amnt;
class accnt;
run;
Result(Partial):
-------------
N
accnt Obs N Mean
-----------------------------------
1 3 3 136.6666667
2 3 3 1000.00
3 5 5 232.0000000
4 3 3 516.6666667
-----------------------------------
Hi:
I'm confused -- what is it that you need the Average of??? The Average date, the average balance, the average records/obs/occurences on each date? Having the data in an XLS file is lovely, but you attached 2 files with the same name and one file has a column called AVG and the other file does not have a column called AVG. The column for Date is not populated on every row and you have not adequately explained your logic. Is one of these files the INPUT file and another file your desired output? Your code is confusing. What do you mean by {code} and where is the {code} that creates work.A1? Why do you want to apply a flag to first.and last. BY variables. It also looks like you're setting the flag if the date is not equal to missing.
Why not use PROC MEANS -- the CLASS statement allows you to specify the category (such as date) and the VAR statement allows you to specify the numeric variable that you want to use for the MEAN calculation. Or, you could use PROC TABULATE or PROC REPORT. When I look at your data, however, it is not clear to me how you want your data to be processed before sending it to the procedure. It's also not clear whether your TEST_1.xls file is supposed to be work.A1 or work.A2.
You can take a look at the structure of SASHELP.PRDSALE to see what that data file looks like before I send that data to the procedures in my code. As you see, I can get reports with the MEAN statistic (among others) using several different methods. You should be able to run this code because SASHELP.PRDSALE should be available to you for use without needing a LIBNAME statement.
Cynthia
ods html file='c:\temp\show_average.html';
proc means data=sashelp.prdsale;
title '1a) PROC MEANS get average actual sales and other statistics';
class division;
var actual;
run;
proc means data=sashelp.prdsale;
title '1b) PROC MEANS get average actual sales and other statistics';
class division month;
var actual;
format month monyy5.;
run;
proc tabulate data=sashelp.prdsale;
title '2a) PROC TABULATE get average actual sales and some other statistics';
class division;
var actual;
table division all,
actual*(n mean);
run;
proc tabulate data=sashelp.prdsale;
title '2b) PROC TABULATE more complex table';
class division month;
var actual predict;
table division*month all,
actual*(n mean);
format month monyy5.;
run;
proc report data=sashelp.prdsale
style(summary)=Header;
title '3) PROC REPORT different method';
column division month actual,(n mean);
define division / group style(column)=Header;
define month / group f=monyy5. style(column)=Header;
define actual / analysis;
define n / 'Count' f=comma6.;
define mean / 'Average' f=comma12.2;
rbreak after / summarize;
break after division / summarize;
compute after division;
division = 'Sub-Total';
line ' ';
endcomp;
compute after;
division = 'Total';
call define('_c4_','format','dollar12.2');
endcomp;
run;ods _all_ close;
title;
ods html close;
From the two XLS files,I understand that Pallis need
average of amount group by account
but needs average of only those rows where the date is started in a row of an account to last row of that particular account.
Ravi,
data want(keep=accnt amnt date);
set have;
if accnt = _accnt and _date ne . then do;
date = _date;
end;
_accnt = accnt;
_date = date;
retain _accnt _date;
Run;
Can you explain me the logic please.
Yes sure
1. Read the input data
2. Copy the date to the remaining rows of same account using retain statement. So that the date will be copied to the below rows of same account.
3. Now remove the rows which are not having date( only upper rows to the row where date started.) Using if condition.
4. Now you will have only the desired rows, apply proc means ... You can get ur desired output.
Am I clear now?
to help design, there are a few clarifications.
1
are you sure that no ACCNT has more than one DATE ?
2
is it required that the result is all the rows (not only the lines with an average) ?
3
what is the significance of rows for an ACCNT before the DATE?
This is one of the report which I'm doing for my client. They are using this kind of data and finding average the way I defined.
@peter.c
1)Yes as you asked only one date for each account. This is the date where that account got activated.
2) Yes
3) That's business rule over there. My accnt's are pre activated but date is where the amount got credited into accounts.
@cynthia: I forgot to delete the file.I did that
I imported and created the input files as created Dataset as A1 and A2 is dataset where I created flag and the {code} I just created flag so that I can achieve the business logic. But I couldn't figure out how to proceed further.
Pallis
Your explanations helped clarify the data but
What's going on with accnt=4 in your test demo?
It has money before the date row and you don't include it in the average, for that reason I suppose.
Is that case an accident or are there some more rules?
Peter
data want;
set have;
by accnt;
if first.accnt then flag=0;
if not missing(date) then flag+1;
run;
proc sql;
select accnt,avg(amnt) as average from want
where flag=1 group by accnt;
quit;
If not missing(date) then flag+1.
I guess this line will setup flag = 1 only for those rows where the date is present.
I am not sure as I didn't test...
So for each account flag will be setup once where we can't get the desired mean...
Pls correct me if I m wrong
It will flag all observations starting from non missing date within each accnt.
Yes, you are right!!!
Your code is too simpler than mine.... 🙂
However now I learned that the expression like flag + 1 will contain automatic retain statement.
Thank you.
Hello,
Here is a solution which you may find useful:
data have;
infile datalines missover;
input accnt amnt date date9. ;
format date date9.;
datalines;
1 100
1 120
1 120
1 130 1-Jan-14
1 130
1 150
2 0
2 0
2 0
2 1000 2-Mar-14
2 1000
2 1000
3 200 2-Mar-14
3 200
3 230
3 230
3 300
4 500
4 500 2-Mar-14
4 500
4 550
;
proc sort data=want;
by accnt;
run;
data int1;
set have;
by accnt;
retain flag;
if first.accnt then flag=0;
if not(missing(date)) then flag=1;
if flag;
run;
proc means data=int1 noprint nway;
class accnt;
var amnt;
output out=int2 mean=average;
run;
data want(drop=average rename=(avgint=average));
merge have int2(drop= _type_ _freq_);
by accnt;
avgint=average;
if missing(date) then call missing(avgint);
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.