turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-25-2017 03:31 AM

Hello,

I would like to calculate the sum of a variable for each level conditional of the value of a third (and fourth) variable. Without the condition everything is fine:

data somedata;

input id euro dummy1 dummy2;

cards;

1 12 0 1

1 23 1 1

1 56 1 1

1 23 1 0

2 22 0 1

2 24 1 1

2 34 1 1

2 10 1 0

3 19 0 1

3 28 1 1

3 56 1 1

3 21 1 0

4 21 0 1

4 34 1 1

4 32 1 1

4 43 1 0

;

run;

proc iml;

use work.somedata;

read all;

uqid = unique(id);

totalcost = j(1,ncol(uqid));

do i = 1 to ncol(uqid);

idx = loc(id = uqid[i]);

totalcost[i] = sum(euro[idx]);

end;

print totalcost;

quit;

Now I would like to add something like

if dummy1 = 1 then do;

or even

if ( dummy1 = 1 & dummy2 = 1 ) then do;

However, I cannot get it to run through. Adding

do i = 1 to ncol(uqid) while (dummy1 = 1);

gives the wrong results.

It looks like a simple idea to me, but I seem to be searching for the wrong keywords on the internet. Any pointers are greatly appreciated!

Thank you in advance.

Greetings from Germany.

Gerit

Accepted Solutions

Solution

07-25-2017
09:03 AM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to MsGeritO

07-25-2017 08:40 AM

I presume you have some combinations of id and dummy with no data. So you should only attempt to calculate the sum where there is at least one value. Adding something like:

`if ncol(idx)>0 then pkkst[i] = sum(euro[idx]); else pkkst[i]= . ;`

should work.

All Replies

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to MsGeritO

07-25-2017 05:16 AM

You are adding the reference to your dummy variables in the wrong place. It needs to be within the LOC function, so you can pick out all the elements where the id is correct and the dummy variable is set. Try using syntax like this:

```
do i = 1 to ncol(uqid);
idx = loc( id = uqid[i] & dummy1 = 1 );
totalcost[i] = sum(euro[idx]);
end;
```

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to IanWakeling

07-25-2017 08:31 AM

Thank you Ian. It runs like a beauty on the sample set.

My original data set still puts up a fight though. Here "dummy1" is also numeric (as everything else). The data set is unsorted, has 68 columns, 198 individuals (id numbers) and 285,923 observations.

After the equivalent line of the summation (totalcost[i] = sum(euro[idx]); ) I get the error message: "(execution) Matrix has not been set to a value." I am guessing that "idx" has not been filled with values properly in the preceding line (idx = loc(id = uqid[i] & dummy1 = 1); )

The original code reads:

proc iml;

use work.gesamt;

read all ;

uqid = unique(id);

pkkst = j(1, ncol(uqid));

skkst = j(1, ncol(uqid));

do i = 1 to ncol(uqid) ;

idx = loc(id = uqid[i] & pk = 1);

pkkst[i] = sum(euro[idx]);

end;

do i = 1 to ncol(uqid) ;

idx = loc(id = uqid[i] & sk = 1);

skkst[i] = sum(euro[idx]);

end;

print pkkst skkst;

quit;

Any further ideas?

Thank you again.

Solution

07-25-2017
09:03 AM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to MsGeritO

07-25-2017 08:40 AM

I presume you have some combinations of id and dummy with no data. So you should only attempt to calculate the sum where there is at least one value. Adding something like:

`if ncol(idx)>0 then pkkst[i] = sum(euro[idx]); else pkkst[i]= . ;`

should work.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to IanWakeling

07-25-2017 09:05 AM

Fireworks!

Thank you very much.

Side effect: I need to check why there are these empty combinations. The real life behind it says there shouldn't be.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to MsGeritO

07-25-2017 08:42 AM

If the condition is not satisfied, IML will return an empty matrix, which is an invalid subscript. See the article "Beware the naked LOC."

The solution is to test the index before you use it:

```
if ncol(idx)>0 then
pkkst[i] = sum(euro[idx]);
else
pkkst[i] = .; /* or zero? */
```

By the way, this technique is called the "UNIQUE-LOC technique." If this technique takes too long because there are many unique categories, you might want to try an alternative technique known as the UNIQUEBY technique. See "An efficient alternative to the UNIQUE-LOC technique."

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Rick_SAS

07-25-2017 09:06 AM

Thank you for the pointer on the uniqueby. Now that I have a solution to the initiating issue I will look into more efficiency as it is already on the slow side and this is just the start of my calculations.