Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

☑ This topic is **solved**.
Need further help from the community? Please
sign in and ask a **new** question.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 10-21-2021 12:40 AM
(448 views)

Hi,

I would like to create a new dataset from an old dataset, so that the new dataset will have cumulative sum of variables from the old dataset. This should be done by byvar variables in the old dataset. An old data example is like this temp dataset:

```
data temp;
input Factor1 $ Factor2 $ rank x y;
datalines;
A B 1 1 2
A B 2 2 4
A B 3 3 6
A C 1 1 7
A C 2 3 8
A C 3 5 9
;
run
```

The new dataset should be like this temp2 dataset:

```
data temp2;
input Factor1 $ Factor2 $ rank x y csum_x csum_y;
datalines;
A B 1 1 2 1 2
A B 2 2 4 3 6
A B 3 3 6 6 12
A C 1 1 7 1 7
A C 2 3 8 4 15
A C 3 5 9 9 24
;
run
```

The cumulative sum of x and y are done by Factor1 and Factor2 combination, and along with the rank variable rank. The general factor variable combination could be more than 2 factor variables. The old dataset temp could potentially has other variables not listed here. The final result should be a SAS dataset like temp2. Is there an IML way to do this? Thanks.

1 ACCEPTED SOLUTION

Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I was hoping to give the OP a chance to think about this problem before providing the solution. But since KSharp wrote the program according to my outline, here is my version:

```
proc iml;
use temp;
read all var {Factor1 Factor2} into G; /* BY groups */
read all var {x y} into X[c=varNames]; /* the data for each BY group */
close;
/* if the input data is already sorted by Factors, you can find the
rows that separate the BY groups */
byRows = uniqueby(G, 1:ncol(G));
byRows = byRows // nrow(G); /* append the last rows for easy processing */
/* loop over the BY groups and compute the cumulative sums */
S = j(nrow(G), ncol(X), .); /* store the cumulative sums here */
do i = 1 to nrow(byRows)-1;
rowStart = byRows[i]; rowEnd = byRows[i+1];
do j = 1 to ncol(X);
S[rowStart:rowEnd, j] = cusum(X[rowStart:rowEnd, j]);
end;
end;
print G X[c=varNames] S[c=('c'+varNames)];
```

8 REPLIES 8

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Why do you want to do this in IML? Seems the right tool is the data step?

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

As @PeterClemmensen suggested, use data step, assuming TEMP is sorted by `Factor1, Factor2,`

and `rank`

```
data want;
set temp;
by Factor1 Factor2;
if first.Factor2 then call missing(cum_x, cum_y);
cum_x + x;
cum_y + y;
run;
```

Bart

_______________

**Polish SAS Users Group**: www.polsug.com and communities.sas.com/polsug

"**SAS Packages: the way to share**" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.

Hands-on-Workshop: "**Share your code with SAS Packages**"

"**My First SAS Package: A How-To**" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three

SAS Documentation

"

Hands-on-Workshop: "

"

SAS Ballot Ideas: one: SPF in SAS, two, and three

SAS Documentation

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

This appears to be the same question as

but the OP is requesting to do it in IML. Those who have non-IML solutions can post them to the other thread.

Sure, you can do this in IML if you want. I notice you've posted several similar questions to the IML community, so I assume you are trying to learn IML. I'll outline the general idea and give you a chance to experiment with the ideas and improve your IML programming skills.

1. Read the data into IML matrices. Read the factors into a BY-group matrix, **G**, and the variables into a data matrix, **X**.

2. Assuming that the input data are already sorted by the factors, use the UNIQUEBY function to find the rows of X that represent the first row for each BY group. As shown in the article, you can append nrow(G) to the vector to make subsequent processing easier.

3. Loop over the BY-group rows. Call the CUSUM function for each variable in X.

4. At the end of the loop over the BY-groups, you will have a matrix (call it **C**) that contains the cumulative sums of each variable for each BY-group. If you need that in a SAS data set, create a data set from **C** and merge it with the original data.

Try to program it yourself. If you get stuck, post what you have and others will help.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

```
data temp;
input Factor1 $ Factor2 $ rank x y;
datalines;
A B 1 1 2
A B 2 2 4
A B 3 3 6
A C 1 1 7
A C 2 3 8
A C 3 5 9
A D 1 1 7
A D 2 3 8
A D 3 5 9
;
run;
proc iml;
use temp nobs nobs;
read all var{Factor1 Factor2} into groups;
read all var{Factor1 Factor2 rank x y};
close;
start_idx=uniqueby(groups,(1:2));
end_idx=remove(start_idx,1)-1||nobs;
do i=1 to nrow(start_idx);
csum_x=csum_x//cusum(x[start_idx[i]:end_idx[i]]);
csum_y=csum_y//cusum(y[start_idx[i]:end_idx[i]]);
end;
create want var{Factor1 Factor2 rank x y csum_x csum_y};
append;
close;
quit;
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I was hoping to give the OP a chance to think about this problem before providing the solution. But since KSharp wrote the program according to my outline, here is my version:

```
proc iml;
use temp;
read all var {Factor1 Factor2} into G; /* BY groups */
read all var {x y} into X[c=varNames]; /* the data for each BY group */
close;
/* if the input data is already sorted by Factors, you can find the
rows that separate the BY groups */
byRows = uniqueby(G, 1:ncol(G));
byRows = byRows // nrow(G); /* append the last rows for easy processing */
/* loop over the BY groups and compute the cumulative sums */
S = j(nrow(G), ncol(X), .); /* store the cumulative sums here */
do i = 1 to nrow(byRows)-1;
rowStart = byRows[i]; rowEnd = byRows[i+1];
do j = 1 to ncol(X);
S[rowStart:rowEnd, j] = cusum(X[rowStart:rowEnd, j]);
end;
end;
print G X[c=varNames] S[c=('c'+varNames)];
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

"you overwrite the last element of cusum result for each level of the factor combination (this initially puzzled me a little bit). "

Me too. But Rick get right result .

Me too. But Rick get right result .

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Nice try, Ksharp. Another solution given in addition to your SQL solution.

Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.

**If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website. **

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.