turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- Base SAS Programming
- /
- Find the dividing line(s) in a continuous variable...

Topic Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

08-31-2017 01:58 AM - edited 08-31-2017 02:24 AM

Please see the following partial data set:

```
Continuous Rank
-2.28675 0
-2.21883 0
-1.96124 0
-1.81953 0
-1.28607 0
-0.65065 0
-0.37568 0
-0.33465 0
-0.21190 0
-0.14975 0
0.05879 1
0.10336 1
0.17450 1
0.46424 1
0.46587 1
0.48881 1
0.87138 1
0.91401 1
1.18948 1
1.26697 1
```

I'm attempting to calculate the diving line(s) of "Continuous" based on the different rank values.

In the above case, I could use:

(Lowest 1 Value - Highest 0 Value) / 2

Let's tentatively call the value:

CD_0_1 (CD standing for "Continuous Division")

Not shown would be:

CD_1_2

CD_2_3

CD_3_4

etc,

Can anyone here suggest a way of doing this using SAS.

Thoughts and help greatly appreciated.

Nicholas Kormanik

Accepted Solutions

Solution

09-11-2017
03:21 AM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to NicholasKormanik

09-01-2017 04:40 AM

Hello,

Assuming your dataset is sorted by Rank, Continuous :

```
data want;
set have;
by Rank;
if last.Rank;
run;
```

Or, with a proc sql :

```
proc sql noprint;
CREATE TABLE want AS
SELECT DISTINCT max(Continuous) AS Continuous, Rank
FROM have
GROUP BY Rank
ORDER BY Rank;
quit;
```

All Replies

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to NicholasKormanik

08-31-2017 08:02 AM

Hi,

could you please provide some more information:

- what's a 'diving line'?

- and how should the result look like?

________________________

- Cheers -

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Oligolas

08-31-2017 09:52 AM - edited 08-31-2017 09:54 AM

In the above case:

(.05879 + (-.14975)) / 2 = -0.04548

Thus, CD_0_1 = -0.04548

That value might be thought of as the 'dividing line' separating the "Continuous Variable" based on the Rank Score.

This is one value of many more following.

In other words:

CD_0_1 = -0.04548

CD_1_2 =

CD_2_3 =

CD_3_4 =

etc.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to NicholasKormanik

08-31-2017 10:44 AM

quickest way - calculate summary statistics for each group first.

Then do group calcuations.

@NicholasKormanik can you show the full expected output for the provided sample data.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Reeza

09-01-2017 01:07 AM - edited 09-01-2017 01:10 AM

Another possible solution might be to simply use the highest "Continuous" from each "Rank".

This, as opposed to trying to find an average in-between, as described above.

So, what the new idea would require is filtering a dataset, only keeping the highest for any rank.

In the above case:

```
Continuous Rank
-0.14975 0
1.26697 1
```

etc.

The new dataset, after filtering, would hold the desired values (pretty closely, at least).

So, the question then becomes, how to filter, leaving only the highest "Continuous" for each ranking value??

Any help greatly appreciated.

Solution

09-11-2017
03:21 AM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to NicholasKormanik

09-01-2017 04:40 AM

Hello,

Assuming your dataset is sorted by Rank, Continuous :

```
data want;
set have;
by Rank;
if last.Rank;
run;
```

Or, with a proc sql :

```
proc sql noprint;
CREATE TABLE want AS
SELECT DISTINCT max(Continuous) AS Continuous, Rank
FROM have
GROUP BY Rank
ORDER BY Rank;
quit;
```

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to gamotte

09-01-2017 05:02 AM - edited 09-01-2017 05:07 AM

Your last post seems contradictory with what you first explained where the maximum value was taken for rank 0 and the minimum value for rank 1.

Here is a program that computes the value you indicate in your example :

data want; set have; by Rank; retain minC maxC prevMin prevMax; if first.Rank then do; maxC=Continuous; minC=Continuous; end; else do; maxC=max(Continuous, maxC); minC=min(Continuous, minC); end; if last.Rank then do; if Rank>0 then do; CD=(max(minC,prevMin)+min(maxC,prevMax))/2; output; end; prevMax=maxC; prevMin=minC; end; run;

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to gamotte

09-01-2017 05:23 AM - edited 09-01-2017 05:23 AM

If the dataset is sorted by Rank, Continuous, the program can be simplified :

data want; set have; by Rank; keep Continuous Rank CD; retain minC maxC prevMin prevMax; if first.Rank then do; minC=Continuous; end; if last.Rank then do; if Rank>0 then do; CD=(max(minC,prevMin)+min(Continuous,prevMax))/2; output; end; prevMax=Continuous; prevMin=minC; end; run;

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to NicholasKormanik

09-01-2017 11:36 AM

I'd use PROC MEANS to compute MIN and MAX, but there are many other ways:

```
proc means data=a min max;
class rank;
var continuous;
output out=out(where=(RANK^=.)) min=min max=max;
run;
data lines;
set out;
line = (min + lag(max)) / 2;
run;
```