Good day.I am trying to create a summary of my data by using arrays. the objective is to sum my cost and budget variable for certain dates for every ID.My data looks like the following:
ID DATE COST BUDGET 01 01JAN02 50 900 01 01JAN02 20 800 01 01APR02 60 500 01 01JUL02 80 606 01 01JUL02 30 900 01 01JUL02 20 652 01 01OCT02 50 879 02 01JAN02 30 997 02 01JAN02 80 1000 ect
So I want to use an array to get the sum of the cost for each date and the sum of the budget for each date as a variable. I want one entry per ID.
eg. ID COST_01JAN02 COST_01APR02 COST_01JUN02 COST_01OCT02 BUDGET_01JAN02.........
This is the code i tried but it doesn't seem to give the desired result, any ideas on where I'm going wrong?
data one;
set original;
by ID ;
array now COST_01JAN02 COST_01APR02 COST_01JUL02
COST_01OCT02 BUDGET_01JAN02 BUDGET_01APR02
BUDGET_01JUL02 BUDGET_01OCT02;
if first.ID then DO OVER now;
now=.;
end;
if DATE='01JAN02'd then do ;
COST_01JAN02=sum(of COST); BUDGET_01JAN02=sum(of BUDGET); end;
if DATE='01APR02'd then do ;
COST_01APR02=sum(of COST); BUDGET_01APR02=sum(of BUDGET); end;
if DATE='01JUL02'd then do ;
COST_01JUL02=sum(of COST); BUDGET_01JUL02=sum(of BUDGET); end;
if DATE='01OCT02'd then do ;
COST_01OCT02=sum(of COST); BUDGET_01OCT02=sum(of BUDGET); end;
if last.B_CODE then output;
Keep ID COST_01JAN02 COST_01APR02 COST_01JUL02
COST_01OCT02 BUDGET_01JAN02 BUDGET_01APR02
BUDGET_01JUL02 BUDGET_01OCT02;
run;
The question is asked to test your knowledge. Its not a good way of working in any sense. You could for instance;
proc means data=have; by id date; var cost; output out=cost sum=sum; run; proc means data=have; by id date; var budget; output out=budget sum=sum; run; proc transpose data=cost out=cost_t prefix=cost; by id; var sum; id date; run; proc transpose data=budget out=budget_t prefix=budget; by id; var sum; id date; run; data want; merge cost_t budget_t; by id; run;
Assumes sorted by the way. This utilises transposing. You can do it via array with a bit more work:
http://support.sas.com/resources/papers/proceedings13/082-2013.pdf
Its called reshaping data. Personally the only time I would ever keep data in a transposed format (i.e. data going across the page) is for an output file. All programming work is many times simpler and more robust when data goes down the page (normalised).
Your data is in a perfect format for using proc summary and i recommend keeping the format, because wide formats, especially when variable names include date information, will cause headaches sooner or later, sooner in 12 of 13 cases.
The question is asked to test your knowledge. Its not a good way of working in any sense. You could for instance;
proc means data=have; by id date; var cost; output out=cost sum=sum; run; proc means data=have; by id date; var budget; output out=budget sum=sum; run; proc transpose data=cost out=cost_t prefix=cost; by id; var sum; id date; run; proc transpose data=budget out=budget_t prefix=budget; by id; var sum; id date; run; data want; merge cost_t budget_t; by id; run;
Assumes sorted by the way. This utilises transposing. You can do it via array with a bit more work:
http://support.sas.com/resources/papers/proceedings13/082-2013.pdf
Its called reshaping data. Personally the only time I would ever keep data in a transposed format (i.e. data going across the page) is for an output file. All programming work is many times simpler and more robust when data goes down the page (normalised).
I think arrays should start a #metoo campaign if you plan to really use this for analysis.
For pure didactical purpose you could do this:
data have;
input ID DATE $ COST BUDGET;
datalines;
01 01JAN02 50 900
01 01JAN02 20 800
01 01APR02 60 500
01 01JUL02 80 606
01 01JUL02 30 900
01 01JUL02 20 652
01 01OCT02 50 879
02 01JAN02 30 997
02 01JAN02 80 1000
;
run;
proc sort data=have (drop=id);
by date;
run;
proc transpose data=have out=h1 prefix=cost;
by date;
id cost;
run;
proc transpose data=have out=h2 prefix=budget;
by date;
id BUDGET;
run;
data h3;
length sum_cost sum_budget 8;
merge h1 h2 end=last;
by date;
sum_cost=.;
array costs cost:;
array budgets budget:;
dim_costs=dim(costs);
dim_budgets=dim(budgets);
if _N_ eq 1 then do;
call execute('data want; set h3;');
do i=1 to dim_costs;
if i eq 1 then call execute('sum_cost=sum(');
call execute(vname(costs(i)));
if i ne dim_costs then call execute(',');
else call execute(');');
end;
do j=1 to dim_budgets;
if j eq 1 then call execute('sum_budget=sum(');
call execute(vname(budgets(j)));
if j ne dim_budgets then call execute(',');
else call execute(');run;');
end;
end;
run;
- Cheers -
@Oligolas Not funny.
I think arrays should start a #metoo campaign if you plan to really use this for analysis.
Nice paper. Have seen a slight movement in this over the years. Notably in the pharma industry there is the CDISC guidelines which structures data in a normalised way, paramter/response type setup. Most databases also do it this. Unfortunately the prevalence of the Excel virus (much like the other MS wares like IE) in all walks of computing means that way of thinking is inbuilt at a young age from schools and universities through into mainstream activities. Take almost any major incident and likely there will be an Excel file somewhere in the background as the root of it all, sitting there all unvalidated, mashed together with pretty colors perfect for some suited management type to pivot.
@HelpPlease wrote:
so you suggest I do a proc summary? my assignment is to use arrays to get my desired results? any suggestions on how I can do this. Thank you for your reply
Then post ALL of the conditions and limitations imposed by instructions. We cannot tell from your initial post that you have something external, like "homework rules", that limit solutions.
And if you have an instructor that is forcing output to look like the junk in Excel, examine everything else in this class very carefully.
If I understand the output look you want you do not even need to do any manipulation of the data to get a person-readable report in that format.
proc tabulate data=have; class id; class date; format date date7.; var cost budget; tables id, date*(cost budget)*sum ; run;
for instance.
Almost everything that creates the "wide" totals you are requested require additional work for most other steps.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.