Solved
Contributor
Posts: 64

# subsetting data with a name range

Hi,

i want a way to check to see if any variables in my named range are negative. is there a more efficient way of accomplishing this ?

data have;

input   (x1-x3) (2.);

cards;

0 1 2

1 -2 3

3 4 5

-1 0 2

2 2 2

3 3 -3

;

run;

data want;

set have;

if x1<0 or

x2<0 or

x3<0;

run;

Accepted Solutions
Solution
‎05-10-2012 12:19 PM
Posts: 3,167

## Re: subsetting data with a name range

actually if using variable list, you can just do this:

data want;

set have;

if min(of x1-x3)<0 then output;

run;

Haikuo

All Replies
Posts: 3,167

## Re: subsetting data with a name range

Use array() to save some typing if you have many vars:

data have;

input x1-x3;

cards;

0 1 2

1 -2 3

3 4 5

-1 0 2

2 2 2

3 3 -3

;

run;

data want;

set have;

array x x1-x3;

do over x;

if x<0 then output;

end;

run;

proc print;run;

Regards,

Haikuo

Solution
‎05-10-2012 12:19 PM
Posts: 3,167

## Re: subsetting data with a name range

actually if using variable list, you can just do this:

data want;

set have;

if min(of x1-x3)<0 then output;

run;

Haikuo

Super Contributor
Posts: 350

## Re: subsetting data with a name range

Hi,

Try this..

data have2(drop=i);

input   (x1-x3) (2.);

array xx(3) x1-x3;

do i = 1 to dim(xx);

if xx(i)<0 then output;

end;

cards;

0 1 2

1 -2 3

3 4 5

-1 0 2

2 2 2

3 3 -3

;

run;

Thanks,

Shiva

Super User
Posts: 23,754

## Re: subsetting data with a name range

If you're outputting the whole row then check if the min is less than 0.

However this does require you to list your variables whereas in the array method you don't.

The where will process faster though.

data want;

set have;

where min(x1, x2, x3) <0;

run;

Super User
Posts: 6,781

## Re: subsetting data with a name range

It is likely to run faster if you switch from an IF to a WHERE statement.  (Whether you can notice the difference depends on characteristics of your data such as how many observations you are selecting, and how wide the observations are.)

If you know which variable(s) are most likely to be negative, you could switch the order around.  For example if X3 is usually the culprit, mention it first:

where x3 < 0 or x1 < 0 or x2 < 0;

Finally, notice that this code also selects missing values, not just negative numbers.

All of this is dependent on what you mean by "more efficient".  I've interpreted it to mean you want something that would run faster, but that's not the only possible definition.

Good luck.

Posts: 3,167

## Re: subsetting data with a name range

Astounding , as he/she always, just raised valid point on dealing with missing value. So just add: if .<min(of x1-x3)<0 then;

However, from my own experience, it is NOT always that 'where' is faster then 'if'. If you are dealing with a big data set with more than half of the obs meeting condition, 'if' could be significant faster than 'where'. I have noticed this in numerous counts in my work. That being said, direct access is not always faster than 'sequential access'.

Just my 2 cents,

Haikuo

Super User
Posts: 23,754

## Re: subsetting data with a name range

You can't use the OF in a WHERE statement, but you can in the IF statement.

Posts: 3,167

## Re: subsetting data with a name range

Didn't know that! Thanks! Right now, I mostly use 'where' in procs and index,  where you can't use 'if' or 'if' does not utilize index.

Regards,

Haikuo

Valued Guide
Posts: 765

## Re: subsetting data with a name range

hi ... since MIN ignores missing values (unless all are missing), I don't think that you have to do this ... if .<min(of x1-x3)<0

to cover situations when all are missing, you could try just adding a non-negative value to the function arguments ...

data have;

input x1-x3 @@;

cards;

. 1 2   1 -2 3   3 4 .   -1 0 2   2 2 2   3 3 -3   . . .

;

data negative;

set have;

if min(of x:, 0) lt 0;

run;

x1    x2    x3

1    -2     3

-1     0     2

3     3    -3

Contributor
Posts: 64