Help using Base SAS procedures

Conditional Selection

Reply
Occasional Contributor
Posts: 12

Conditional Selection

Hello,

I am new to SAS, and I am looking for material that will enable me to code and do the following:

I am trying to select the observations in a data set that meet the following criteria:

Variables: SA, AD, RS, SP (there are other variables but these are what is needed
to do the selection).


When SA(i) equals SA(j) and AD(i) is not equal to AD(j)
Select Observation i if AD(i) is greater than AD(j)
Select Observation j if AD(i) is less than AD(j),

When SA(i) equals SA(j) and AD(i) equals AD(j)
Select Observation i if RS(i) is greater than RS(j)
Select Observation j if RS(i) is less than RS(j),

Ignore Observation i if SP(i) equal to Yd or Tl or Ng.


Thank you.
Super Contributor
Posts: 281

Re: Conditional Selection

> I am trying to select the observations in a data set
> that meet the following criteria:
>
> Variables: SA, AD, RS, SP (there are other variables
> but these are what is needed
> to do the selection).
>
>
> When SA(i) equals SA(j) and AD(i) is not equal
> to AD(j)

Not sure what this means ... normally, you don't select an observation based upon the value of the same variable in another observation. Do you mean select observation (i) when the value of SA equals the value of SA in some other observation (j), along with AD conditions?

If you really want to select an observation based upon a general condition where SA equals the value of SA in some other observation (plus AD conditions), where the other observation could be anywhere in the data set, I think you'd need PROC IML.

Am I understanding you properly? Message was edited by: Paige
Occasional Contributor
Posts: 12

Re: Conditional Selection

Thanks Paige,

What I mean is that if two observations have the same SA value, then select the one with the higher AD value. If it turns out that their AD values are also the same, then select the one with the higher RS value.

Any observation whose SP value is equal to Yd or Tl or Ng should not be selected.

All observations which have unique SA values will be selected.

(I used i and j to identify any two given observations. I hope the situation is clearer now).

EMKBB
Super Contributor
Posts: 281

Re: Conditional Selection

I still don't think I know enough to actually a the program to do this. Are you comparing all observations to each other? Or selected pairs (triples/quads/etc.) only? If so, how are you selecting the pairs?

But as a guess, does this meet your needs? Sort by SA AD RS. Now things are ordered such that choosing the one you want should be easy.
Occasional Contributor
Posts: 12

Re: Conditional Selection

Paige,

Not quite, you compare the observations with each other and select the ones that meet the above criteria. Those with unique SA values will definitely be selected.
Super Contributor
Super Contributor
Posts: 3,174

Re: Conditional Selection

Given the OP input, what needs to be looked-at are FIRST. and LAST. with "BY GROUP PROCESSING", and possibly using the LAG function, or using a RETAIN statement to capture/retain the prior observation variable/value for comparison to the current variable/value. Take the challenge in smaller components, dealing with each comparison task individually, validate your own developed code (based on available DOC and forum input), and build on the process with each additional comparison rqmt.

Scott Barry
SBBWorks, Inc.
Super Contributor
Posts: 281

Re: Conditional Selection

> Not quite, you compare the observations with each
> other and select the ones that meet the above
> criteria. Those with unique SA values will definitely
> be selected.

So please give an example, or explain more thoroughly, exactly what you are looking for. I specifically asked (and you didn't reply):

Are you comparing all observations to each other? Or selected pairs (triples/quads/etc.) only? If so, how are you selecting the pairs?
Occasional Contributor
Posts: 12

Re: Conditional Selection

Thanks sbb, I will look it up.
Respected Advisor
Posts: 3,908

Re: Conditional Selection

If you want to use SQL the look up "group by" with a "having" clause
Occasional Contributor
Posts: 12

Re: Conditional Selection

Thanks Patrick.
Occasional Contributor
Posts: 12

Re: Conditional Selection

Sorry Paige, if I was not explanatory enough, here we go:

In the table below, Observation 1 and Observation 5 have the same SA value, but the DA value of Observation 1 is greater than that of Observation 5 so in reading the data, Observation 1 will be selected but Observation 5 will be dropped.

Again Observation 2 and Observation 13 have the same SA values, and incidentally their AD values are also the same, so here we resort to their RS values (which are not the same). The RS value of Observation 13 is higher, so it will be selected and Observation 2 dropped.

Observations with unique SA values will be selected.

Observations 4, 7, 8, 12 will not be selected because they have the values Tl, Ng, Yd and Tl respectively.

Observation SA AD RS SP
1 15 20 19 Bt
2 18 21 22 Ab
3 16 18 16 Ab
4 10 12 15 Tl
5 15 19 17 Bt
6 20 22 24 Ef
7 13 16 17 Ng
8 26 27 29 Yd
9 22 23 26 Ef
10 24 26 28 Dd
11 23 25 27 Bt
12 27 29 30 Tl
13 18 21 23 Ef
14 21 22 25 Ab
15 12 14 16 Dd
Super Contributor
Posts: 281

Re: Conditional Selection

Okay, I understand now. What is it about my PROC SORT suggestion that doesn't work? Seems to me it puts your data in the proper order and makes it easy for you to select the ones you want.
Frequent Contributor
Posts: 106

Re: Conditional Selection

Paige is right. And if - for some reason you didn't specify - the original observation order needs to be preserved then add some kind of observation number up front, then proceed as Paige suggested and finally sort the remaining observations back.
Super User
Posts: 9,691

Re: Conditional Selection

I do it roughly, And i think there is some way can optimize it.
But I have to leave for the sake of working.


Yes.Paige has the right idea here.
try it.'nodupkey' can keep the first duplicate observation.

[pre]
data temp(where=( sp not in('Yd' 'TI' 'Ng')));
input Obs SA AD RS SP $;
datalines;
1 15 20 19 Bt
2 18 21 22 Ab
3 16 18 16 Ab
4 10 12 15 Tl
5 15 19 17 Bt
6 20 22 24 Ef
7 13 16 17 Ng
8 26 27 29 Yd
9 22 23 26 Ef
10 24 26 28 Dd
11 23 25 27 Bt
12 27 29 30 Tl
13 18 21 23 Ef
14 21 22 25 Ab
15 12 14 16 Dd
;
proc sort data=temp ;
by sa descending ad descending rs;
run;
proc sort data=temp nodupkey;
by sa;
run;
proc print noobs;
run;
[/pre]



Ksharp Message was edited by: Ksharp
Ask a Question
Discussion stats
  • 13 replies
  • 511 views
  • 0 likes
  • 6 in conversation