- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Posted 10-11-2010 11:17 AM
(4070 views)
Hello,
I am new to SAS, and I am looking for material that will enable me to code and do the following:
I am trying to select the observations in a data set that meet the following criteria:
Variables: SA, AD, RS, SP (there are other variables but these are what is needed
to do the selection).
When SA(i) equals SA(j) and AD(i) is not equal to AD(j)
Select Observation i if AD(i) is greater than AD(j)
Select Observation j if AD(i) is less than AD(j),
When SA(i) equals SA(j) and AD(i) equals AD(j)
Select Observation i if RS(i) is greater than RS(j)
Select Observation j if RS(i) is less than RS(j),
Ignore Observation i if SP(i) equal to Yd or Tl or Ng.
Thank you.
I am new to SAS, and I am looking for material that will enable me to code and do the following:
I am trying to select the observations in a data set that meet the following criteria:
Variables: SA, AD, RS, SP (there are other variables but these are what is needed
to do the selection).
When SA(i) equals SA(j) and AD(i) is not equal to AD(j)
Select Observation i if AD(i) is greater than AD(j)
Select Observation j if AD(i) is less than AD(j),
When SA(i) equals SA(j) and AD(i) equals AD(j)
Select Observation i if RS(i) is greater than RS(j)
Select Observation j if RS(i) is less than RS(j),
Ignore Observation i if SP(i) equal to Yd or Tl or Ng.
Thank you.
13 REPLIES 13
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
> I am trying to select the observations in a data set
> that meet the following criteria:
>
> Variables: SA, AD, RS, SP (there are other variables
> but these are what is needed
> to do the selection).
>
>
> When SA(i) equals SA(j) and AD(i) is not equal
> to AD(j)
Not sure what this means ... normally, you don't select an observation based upon the value of the same variable in another observation. Do you mean select observation (i) when the value of SA equals the value of SA in some other observation (j), along with AD conditions?
If you really want to select an observation based upon a general condition where SA equals the value of SA in some other observation (plus AD conditions), where the other observation could be anywhere in the data set, I think you'd need PROC IML.
Am I understanding you properly? Message was edited by: Paige
> that meet the following criteria:
>
> Variables: SA, AD, RS, SP (there are other variables
> but these are what is needed
> to do the selection).
>
>
> When SA(i) equals SA(j) and AD(i) is not equal
> to AD(j)
Not sure what this means ... normally, you don't select an observation based upon the value of the same variable in another observation. Do you mean select observation (i) when the value of SA equals the value of SA in some other observation (j), along with AD conditions?
If you really want to select an observation based upon a general condition where SA equals the value of SA in some other observation (plus AD conditions), where the other observation could be anywhere in the data set, I think you'd need PROC IML.
Am I understanding you properly? Message was edited by: Paige
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks Paige,
What I mean is that if two observations have the same SA value, then select the one with the higher AD value. If it turns out that their AD values are also the same, then select the one with the higher RS value.
Any observation whose SP value is equal to Yd or Tl or Ng should not be selected.
All observations which have unique SA values will be selected.
(I used i and j to identify any two given observations. I hope the situation is clearer now).
EMKBB
What I mean is that if two observations have the same SA value, then select the one with the higher AD value. If it turns out that their AD values are also the same, then select the one with the higher RS value.
Any observation whose SP value is equal to Yd or Tl or Ng should not be selected.
All observations which have unique SA values will be selected.
(I used i and j to identify any two given observations. I hope the situation is clearer now).
EMKBB
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I still don't think I know enough to actually a the program to do this. Are you comparing all observations to each other? Or selected pairs (triples/quads/etc.) only? If so, how are you selecting the pairs?
But as a guess, does this meet your needs? Sort by SA AD RS. Now things are ordered such that choosing the one you want should be easy.
But as a guess, does this meet your needs? Sort by SA AD RS. Now things are ordered such that choosing the one you want should be easy.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Paige,
Not quite, you compare the observations with each other and select the ones that meet the above criteria. Those with unique SA values will definitely be selected.
Not quite, you compare the observations with each other and select the ones that meet the above criteria. Those with unique SA values will definitely be selected.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Given the OP input, what needs to be looked-at are FIRST. and LAST. with "BY GROUP PROCESSING", and possibly using the LAG function, or using a RETAIN statement to capture/retain the prior observation variable/value for comparison to the current variable/value. Take the challenge in smaller components, dealing with each comparison task individually, validate your own developed code (based on available DOC and forum input), and build on the process with each additional comparison rqmt.
Scott Barry
SBBWorks, Inc.
Scott Barry
SBBWorks, Inc.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
> Not quite, you compare the observations with each
> other and select the ones that meet the above
> criteria. Those with unique SA values will definitely
> be selected.
So please give an example, or explain more thoroughly, exactly what you are looking for. I specifically asked (and you didn't reply):
Are you comparing all observations to each other? Or selected pairs (triples/quads/etc.) only? If so, how are you selecting the pairs?
> other and select the ones that meet the above
> criteria. Those with unique SA values will definitely
> be selected.
So please give an example, or explain more thoroughly, exactly what you are looking for. I specifically asked (and you didn't reply):
Are you comparing all observations to each other? Or selected pairs (triples/quads/etc.) only? If so, how are you selecting the pairs?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks sbb, I will look it up.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If you want to use SQL the look up "group by" with a "having" clause
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks Patrick.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Sorry Paige, if I was not explanatory enough, here we go:
In the table below, Observation 1 and Observation 5 have the same SA value, but the DA value of Observation 1 is greater than that of Observation 5 so in reading the data, Observation 1 will be selected but Observation 5 will be dropped.
Again Observation 2 and Observation 13 have the same SA values, and incidentally their AD values are also the same, so here we resort to their RS values (which are not the same). The RS value of Observation 13 is higher, so it will be selected and Observation 2 dropped.
Observations with unique SA values will be selected.
Observations 4, 7, 8, 12 will not be selected because they have the values Tl, Ng, Yd and Tl respectively.
Observation SA AD RS SP
1 15 20 19 Bt
2 18 21 22 Ab
3 16 18 16 Ab
4 10 12 15 Tl
5 15 19 17 Bt
6 20 22 24 Ef
7 13 16 17 Ng
8 26 27 29 Yd
9 22 23 26 Ef
10 24 26 28 Dd
11 23 25 27 Bt
12 27 29 30 Tl
13 18 21 23 Ef
14 21 22 25 Ab
15 12 14 16 Dd
In the table below, Observation 1 and Observation 5 have the same SA value, but the DA value of Observation 1 is greater than that of Observation 5 so in reading the data, Observation 1 will be selected but Observation 5 will be dropped.
Again Observation 2 and Observation 13 have the same SA values, and incidentally their AD values are also the same, so here we resort to their RS values (which are not the same). The RS value of Observation 13 is higher, so it will be selected and Observation 2 dropped.
Observations with unique SA values will be selected.
Observations 4, 7, 8, 12 will not be selected because they have the values Tl, Ng, Yd and Tl respectively.
Observation SA AD RS SP
1 15 20 19 Bt
2 18 21 22 Ab
3 16 18 16 Ab
4 10 12 15 Tl
5 15 19 17 Bt
6 20 22 24 Ef
7 13 16 17 Ng
8 26 27 29 Yd
9 22 23 26 Ef
10 24 26 28 Dd
11 23 25 27 Bt
12 27 29 30 Tl
13 18 21 23 Ef
14 21 22 25 Ab
15 12 14 16 Dd
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Okay, I understand now. What is it about my PROC SORT suggestion that doesn't work? Seems to me it puts your data in the proper order and makes it easy for you to select the ones you want.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Paige is right. And if - for some reason you didn't specify - the original observation order needs to be preserved then add some kind of observation number up front, then proceed as Paige suggested and finally sort the remaining observations back.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I do it roughly, And i think there is some way can optimize it.
But I have to leave for the sake of working.
Yes.Paige has the right idea here.
try it.'nodupkey' can keep the first duplicate observation.
[pre]
data temp(where=( sp not in('Yd' 'TI' 'Ng')));
input Obs SA AD RS SP $;
datalines;
1 15 20 19 Bt
2 18 21 22 Ab
3 16 18 16 Ab
4 10 12 15 Tl
5 15 19 17 Bt
6 20 22 24 Ef
7 13 16 17 Ng
8 26 27 29 Yd
9 22 23 26 Ef
10 24 26 28 Dd
11 23 25 27 Bt
12 27 29 30 Tl
13 18 21 23 Ef
14 21 22 25 Ab
15 12 14 16 Dd
;
proc sort data=temp ;
by sa descending ad descending rs;
run;
proc sort data=temp nodupkey;
by sa;
run;
proc print noobs;
run;
[/pre]
Ksharp Message was edited by: Ksharp
But I have to leave for the sake of working.
Yes.Paige has the right idea here.
try it.'nodupkey' can keep the first duplicate observation.
[pre]
data temp(where=( sp not in('Yd' 'TI' 'Ng')));
input Obs SA AD RS SP $;
datalines;
1 15 20 19 Bt
2 18 21 22 Ab
3 16 18 16 Ab
4 10 12 15 Tl
5 15 19 17 Bt
6 20 22 24 Ef
7 13 16 17 Ng
8 26 27 29 Yd
9 22 23 26 Ef
10 24 26 28 Dd
11 23 25 27 Bt
12 27 29 30 Tl
13 18 21 23 Ef
14 21 22 25 Ab
15 12 14 16 Dd
;
proc sort data=temp ;
by sa descending ad descending rs;
run;
proc sort data=temp nodupkey;
by sa;
run;
proc print noobs;
run;
[/pre]
Ksharp Message was edited by: Ksharp