DATA Step, Macro, Functions and more

How do I summarize a data set using Arrays?

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 12
Accepted Solution

How do I summarize a data set using Arrays?

Good day.I am trying to create a summary of my data by using arrays. the objective is to sum my cost and budget variable for certain dates for every ID.My data looks like the following:

ID DATE              COST    BUDGET 
01   01JAN02      50           900
01   01JAN02      20           800 
01   01APR02     60           500
01   01JUL02      80           606
01   01JUL02      30           900 
01   01JUL02      20           652 
01   01OCT02    50            879 
02   01JAN02     30            997 
02   01JAN02     80           1000 ect 

So I want to use an array to get the sum of the cost for each date and the sum of the budget for each date as a variable. I want one entry per ID.

eg. ID COST_01JAN02 COST_01APR02 COST_01JUN02 COST_01OCT02 BUDGET_01JAN02.........

This is the code i tried but it doesn't seem to give the desired result, any ideas on where I'm going wrong? 

data one; 
set original; 
by ID ; 
array now COST_01JAN02 COST_01APR02 COST_01JUL02 
                 COST_01OCT02 BUDGET_01JAN02 BUDGET_01APR02 
                 BUDGET_01JUL02 BUDGET_01OCT02; 
if first.ID then DO OVER now;
now=.;
end; 
if DATE='01JAN02'd then do ; 
COST_01JAN02=sum(of COST); BUDGET_01JAN02=sum(of BUDGET); end;
if DATE='01APR02'd then do ; 
COST_01APR02=sum(of COST); BUDGET_01APR02=sum(of BUDGET); end; 
if DATE='01JUL02'd then do ; 
COST_01JUL02=sum(of COST); BUDGET_01JUL02=sum(of BUDGET); end;
if DATE='01OCT02'd then do ; 
COST_01OCT02=sum(of COST); BUDGET_01OCT02=sum(of BUDGET); end; 
if last.B_CODE then output; 
Keep ID COST_01JAN02 COST_01APR02 COST_01JUL02 
         COST_01OCT02 BUDGET_01JAN02 BUDGET_01APR02 
          BUDGET_01JUL02 BUDGET_01OCT02; 
run;

 

 


Accepted Solutions
Solution
‎02-13-2018 11:02 AM
Super User
Super User
Posts: 9,200

Re: How do I summarize a data set using Arrays?

Posted in reply to HelpPlease

The question is asked to test your knowledge.  Its not a good way of working in any sense.  You could for instance;

proc means data=have;
  by id date;
  var cost;
  output out=cost sum=sum;
run;
proc means data=have;
  by id date;
  var budget;
  output out=budget sum=sum;
run;
proc transpose data=cost out=cost_t prefix=cost;
  by id;
  var sum;
  id date;
run;
proc transpose data=budget out=budget_t prefix=budget;
  by id;
  var sum;
  id date;
run;

data want;
  merge cost_t budget_t;
  by id;
run;

Assumes sorted by the way.  This utilises transposing.  You can do it via array with a bit more work:

http://support.sas.com/resources/papers/proceedings13/082-2013.pdf

Its called reshaping data.  Personally the only time I would ever keep data in a transposed format (i.e. data going across the page) is for an output file.  All programming work is many times simpler and more robust when data goes down the page (normalised).

View solution in original post


All Replies
Super Contributor
Posts: 498

Re: How do I summarize a data set using Arrays?

Posted in reply to HelpPlease

Your data is in a perfect format for using proc summary and i recommend keeping the format, because wide formats, especially when variable names include date information, will cause headaches sooner or later, sooner in 12 of 13 cases.

Occasional Contributor
Posts: 12

Re: How do I summarize a data set using Arrays?

Posted in reply to andreas_lds
so you suggest I do a proc summary? my assignment is to use arrays to get my desired results? any suggestions on how I can do this. Thank you for your reply
Solution
‎02-13-2018 11:02 AM
Super User
Super User
Posts: 9,200

Re: How do I summarize a data set using Arrays?

Posted in reply to HelpPlease

The question is asked to test your knowledge.  Its not a good way of working in any sense.  You could for instance;

proc means data=have;
  by id date;
  var cost;
  output out=cost sum=sum;
run;
proc means data=have;
  by id date;
  var budget;
  output out=budget sum=sum;
run;
proc transpose data=cost out=cost_t prefix=cost;
  by id;
  var sum;
  id date;
run;
proc transpose data=budget out=budget_t prefix=budget;
  by id;
  var sum;
  id date;
run;

data want;
  merge cost_t budget_t;
  by id;
run;

Assumes sorted by the way.  This utilises transposing.  You can do it via array with a bit more work:

http://support.sas.com/resources/papers/proceedings13/082-2013.pdf

Its called reshaping data.  Personally the only time I would ever keep data in a transposed format (i.e. data going across the page) is for an output file.  All programming work is many times simpler and more robust when data goes down the page (normalised).

Regular Contributor
Posts: 153

Re: How do I summarize a data set using Arrays?

I think arrays should start a #metoo campaign if you plan to really use this for analysis.

 

For pure didactical purpose you could do this:

data have;
input ID DATE $ COST BUDGET;
datalines;
01 01JAN02 50 900
01 01JAN02 20 800 
01 01APR02 60 500
01 01JUL02 80 606
01 01JUL02 30 900 
01 01JUL02 20 652 
01 01OCT02 50 879 
02 01JAN02 30 997 
02 01JAN02 80 1000
;
run;

proc sort data=have (drop=id);
   by date;
run;
proc transpose data=have out=h1 prefix=cost;
by date;
id cost;
run;
proc transpose data=have out=h2 prefix=budget;
by date;
id BUDGET;
run;

data h3;
   length sum_cost sum_budget 8;
   merge h1 h2 end=last;
   by date;
   sum_cost=.;
   array costs cost:;
   array budgets budget:;
   dim_costs=dim(costs);
   dim_budgets=dim(budgets);
   if _N_ eq 1 then do;
      call execute('data want; set h3;'); 
      do i=1 to dim_costs;
         if i eq 1 then call execute('sum_cost=sum(');
         call execute(vname(costs(i)));
         if i ne dim_costs then call execute(',');
         else call execute(');');
      end;

      do j=1 to dim_budgets;
         if j eq 1 then call execute('sum_budget=sum(');
         call execute(vname(budgets(j)));
         if j ne dim_budgets then call execute(',');
         else call execute(');run;');
      end;
   end;  
run;
________________________

- Cheers -

Super User
Posts: 22,843

Re: How do I summarize a data set using Arrays?

@Oligolas Not funny.

 

I think arrays should start a #metoo campaign if you plan to really use this for analysis.

Occasional Contributor
Posts: 12

Re: How do I summarize a data set using Arrays?

haha thanks anyway!!
Occasional Contributor
Posts: 12

Re: How do I summarize a data set using Arrays?

Thank you!
Super User
Posts: 22,843

Re: How do I summarize a data set using Arrays?


RW9 wrote:

  All programming work is many times simpler and more robust when data goes down the page (normalised).


@RW9 R has conceptualized this under the concept of 'tidy data'. He wrote a good paper on it, attached.

Super User
Super User
Posts: 9,200

Re: How do I summarize a data set using Arrays?

Nice paper.  Have seen a slight movement in this over the years.  Notably in the pharma industry there is the CDISC guidelines which structures data in a normalised way, paramter/response type setup.  Most databases also do it this.  Unfortunately the prevalence of the Excel virus (much like the other MS wares like IE) in all walks of computing means that way of thinking is inbuilt at a young age from schools and universities through into mainstream activities.  Take almost any major incident and likely there will be an Excel file somewhere in the background as the root of it all, sitting there all unvalidated, mashed together with pretty colors perfect for some suited management type to pivot.  

Super User
Posts: 13,023

Re: How do I summarize a data set using Arrays?

Posted in reply to HelpPlease

HelpPlease wrote:
so you suggest I do a proc summary? my assignment is to use arrays to get my desired results? any suggestions on how I can do this. Thank you for your reply

Then post ALL of the conditions and limitations imposed by instructions. We cannot tell from your initial post that you have something external, like "homework rules", that limit solutions.

 

And if you have an instructor that is forcing output to look like the junk in Excel, examine everything else in this class very carefully.

 

If I understand the output look you want you do not even need to do any manipulation of the data to get a person-readable report in that format.

proc tabulate data=have;
   class id;
   class date;
   format date date7.;
   var cost budget;
   tables id,
          date*(cost budget)*sum
   ;
run;

for instance.

 

Almost everything that creates the "wide" totals you are requested require additional work for most other steps.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 250 views
  • 3 likes
  • 6 in conversation