BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ayin
Quartz | Level 8

So I have two datasets:

/* dataset A */
          Col_1   Col_2   Col_3   Col_4
Row_1       3      2        5       7
Row_2       4      1        0       6
Row_3       11     8        3       1

/* dataset B */
           Col_1   Col_2   Col_3   Col_4
Row_1       10      6        1       3
Row_2       2       0        8       3
Row_3       1       2        9       4

The final dataset that I need would look like:

/* dataset output */
           Col_1   Col_2    Col_3   Col_4
Row_1       13       21      27       37
Row_2       6        7       15       24
Row_3       12       22      34       39

Bascially 1st sum up the same cells from dataset A and dataset B.

then calculate the cumulative sum starting from Col_1. 

 

e.g. the Col_3, Row_2 cell value in output dataset would be calculated as: (4+2) + (0+1) + (0+8) = 15;

the Col_2, Row_1 cell: (10+3)+(2+6) = 21.

 

It is quite easy to do in Excel, not sure how to do it in SAS? 

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

I would use a datastep like the following:

 

data A;
  input Col_1-Col_4;
  cards;
3      2        5       7
4      1        0       6
11     8        3       1
;
data B;
  input Col_1-Col_4;
  cards;
10      6        1       3
2       0        8       3
1       2        9       4
;

data want (keep=col:);
  set a;
  set b (rename=(Col_1-Col_4=_Col_1-_Col_4));
  array cols(*) Col_1-Col_4;
  array _cols(*) _Col_1-_Col_4;
  do i=1 to dim(cols);
    if i=1 then cols(i)=sum(cols(i),_cols(i));
    else do;
      cols(i)=sum(cols(i-1),cols(i),_cols(i));
    end;
  end;
run;

Art, CEO, AnalystFinder.com

View solution in original post

9 REPLIES 9
Reeza
Super User

EG Tasks -> Append two datasets into 1, then use a Summary Task to sum the records across groups. 

 

 

art297
Opal | Level 21

I would use a datastep like the following:

 

data A;
  input Col_1-Col_4;
  cards;
3      2        5       7
4      1        0       6
11     8        3       1
;
data B;
  input Col_1-Col_4;
  cards;
10      6        1       3
2       0        8       3
1       2        9       4
;

data want (keep=col:);
  set a;
  set b (rename=(Col_1-Col_4=_Col_1-_Col_4));
  array cols(*) Col_1-Col_4;
  array _cols(*) _Col_1-_Col_4;
  do i=1 to dim(cols);
    if i=1 then cols(i)=sum(cols(i),_cols(i));
    else do;
      cols(i)=sum(cols(i-1),cols(i),_cols(i));
    end;
  end;
run;

Art, CEO, AnalystFinder.com

ayin
Quartz | Level 8
Hey thanks for the quick reply. Your codes worked.

What if the name starts with 0, so column names are like 'Col_0', 'Col_1' to 'Col_3'? I did a little bit adjustments and replace '1','4' with '0','3' in your codes, but then it says: "ERROR: Array subscript out of range at line 44 column 29."

I guess it has something to do with 0 but could you please tell me how to change it?
Reeza
Super User

It worked fine for me. 

 

data A;
  input Col_0-Col_3;
  cards;
3      2        5       7
4      1        0       6
11     8        3       1
;
data B;
  input Col_0-Col_3;
  cards;
10      6        1       3
2       0        8       3
1       2        9       4
;

data want (keep=col:);
  set a;
  set b (rename=(Col_0-Col_3=_Col_0-_Col_3));
  array cols(*) Col_0-Col_3;
  array _cols(*) _Col_0-_Col_3;
  do i=1 to dim(cols);
    if i=1 then cols(i)=sum(cols(i),_cols(i));
    else do;
      cols(i)=sum(cols(i-1),cols(i),_cols(i));
    end;
  end;
run;
art297
Opal | Level 21

My guess would be that in changing the number you also changed, but should'nt have:

 

do i=1 to dim(cols);

That, and the statements under it, shouldn't be changed.

 

Art, CEO, AnalystFinder.com

 

ayin
Quartz | Level 8
Hi, in addition to this problem, what if there are more than two input datasets? like three, four datasets. (and users will specify an input, a number, to determine how many datasets they will use)

Then how should we calculate the sum? (rather than hard-code). Suppose datasets have name like: Lib.&type._&class._&j._name; where 'type', 'class', 'j' are global macro variables.
ayin
Quartz | Level 8

Just figured out a solution:

%macro sum;
data want (keep=Class Array:);      
     retain Class Array:;
     array Array(*) Array_0-Array_%eval(&num.-1);
     do i=1 to dim(Array);
     Array(i)=0;            /* give the array initial values */
     end;

     %do _l=1 %to &Num_type;
     %let Type_name=%scan(&type,&_l);
        set lib.&type;
        array &Type_name.(*) &Type_name._0-&Type_name._%eval(&num.-1);    /* rename each column */
        do i=1 to dim(Array);
          Array(i)=&Type_name.(i) + Array(i);        /* sum up */
        end;
     %end;

run;
%mend sum;
%sum;

This process can be further expanded to cover more macro variables.

Reeza
Super User

Macros are one way, but I would suggest IML or Base SAS instead.

 

Here's a way, doesn't matter how many datasets or variables. 

 

data A;
	input Col_1-Col_4;
	cards;
3      2        5       7
4      1        0       6
11     8        3       1
;

data B;
	input Col_1-Col_4;
	cards;
10      6        1       3
2       0        8       3
1       2        9       4
;

data combined;
	set a b indsname=source;;

	*doesn't matter how many datasets here;
	dsn=source;
run;

proc sort data=combined;
	by dsn;
run;

data combined;
	set combined;
	by dsn;

	if first.dsn then
		row_count=1;
	else row_count+1;
run;

proc means data=combined nway noprint;
	class row_count;
	var col_1-col_4;

	*list all vars here, doesn't matter how many;
	output out=want (drop = _type_ _freq_) sum=;
run;
Tom
Super User Tom
Super User

I have no idea how you would do something like that in Excel.

But it is easy in SAS.

First combine all of the datasets and rotate into tall skinny format.  Note you could make this step as a view if you want.

data combine;
 set  B A indsname=source ;
 array c _numeric_;
 if source ne lag(source) then row=0;
 row+1;
 do col=1 to dim(c);
   value=c(col);
   output;
 end;
 keep row col value;
run;

Then calculate the sum of the values for the same ROW*COL cell;

proc summary nway ;
  class row col ;
  var value ;
  output out=sum (keep=row col value) sum=value ;
run;

Then generate the cummulative sums across the columns.

data cum_sum ;
  do until (last.row);
    set sum ;
    by row col;
    new_value = sum(new_value,value);
    output;
  end;
run;

Then if you want you could convert it back into a wide format.

proc transpose data=cum_sum out=want(drop=_: ) prefix=col_ ;
  by row ;
  id col ;
  var new_value ;
run;

 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 4499 views
  • 3 likes
  • 4 in conversation