BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
PeterBr
Obsidian | Level 7

Hi All, I have the following data. I'd like to create 3 groups (1, 3, 5) based on the percent variable. I'd like the group = 1 to represent the closest approximation to 20%. For example, .18 and .23 exist but .18 is closer to .2 so I'd like that to be the cutoff. Essentially, the tie goes to the number that minimizes the difference from .2. The same logic applies for group 5 but instead I'd like the cutoff to be .8, again I'd like the tie to go towards the number that minimizes the difference from 0.8. For all the values in between, they can be assigned group = 3. The want data is below. Thanks!

 


data have;
   input percent;
   datalines;
.01
.05
.16
.18
.23
.46
.55 
.67
.69
.698
.74
.78
.81
.91
.82
;
run;

data want;
   input percent group;
   datalines;
.01 1
.05 1
.16 1
.18 1
.23 3
.46 3
.55 3
.67 3
.69 3
.698 3
.74 3
.78 3
.81 5
.91 5
.92 5
;
run;
1 ACCEPTED SOLUTION

Accepted Solutions
RichardDeVen
Barite | Level 11

Here is some SQL that finds the 'breakpoints' and uses them to apply the group values.

 

The 1 and 5 groups are computed to contain the largest number of percents in case of ties, which means ties for group 1 break at value > 0.20 and ties for group 5 break at value < 0.80.  You will want to ROUND results to ensure ties are properly identified.

 

Example:

 

data have;
input percent;
datalines;
.01
.05
.16
.18
.22 /* higher tie */
.23
.46
.55
.67
.69
.698
.74
.78 /* lower tie */
.82
.91
.92
; proc sql; create table breakpoints as select percent, group, diff from ( ( select percent , 1 as group , round(abs(percent-0.20),0.01) as diff from have having diff = min(diff) ) union (select percent , 5 as group , round(abs(percent-0.80),0.01) as diff from have having diff = min(diff) ) ) group by group having group = 1 and percent=max(percent) /* tie processing */ or group = 5 and percent=min(percent) ; create table want as select have.* , coalesce(group, 3) as group from have left join breakpoints on have.percent <= breakpoints.percent and group = 1 or have.percent >= breakpoints.percent and group = 5 order by percent ;

View solution in original post

6 REPLIES 6
PeterBr
Obsidian | Level 7

Sorry the .82 in the have data should have been .92, here it is again:

 

data have;
   input percent;
   datalines;
.01
.05
.16
.18
.23
.46
.55 
.67
.69
.698
.74
.78
.81
.91
.92
;
run;

 

data want;
   input percent group;
   datalines;
.01 1
.05 1
.16 1
.18 1
.23 3
.46 3
.55 3
.67 3
.69 3
.698 3
.74 3
.78 3
.81 5
.91 5
.92 5
;
run;

 

andreas_lds
Jade | Level 19

From your description i expected 0.81 to be in group 3, because the distance to 0.8 is smaller than the distance between 0.78 and 0.8

mkeintz
PROC Star

What if you have a tie, e.g.  one value of .18 and one value of .22?  They are the same distance from .2, so which one becomes your cutoff for group 1?

 

And what if your lowest value is, say, .35?   Does it get put in group 1?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
RichardDeVen
Barite | Level 11

Here is some SQL that finds the 'breakpoints' and uses them to apply the group values.

 

The 1 and 5 groups are computed to contain the largest number of percents in case of ties, which means ties for group 1 break at value > 0.20 and ties for group 5 break at value < 0.80.  You will want to ROUND results to ensure ties are properly identified.

 

Example:

 

data have;
input percent;
datalines;
.01
.05
.16
.18
.22 /* higher tie */
.23
.46
.55
.67
.69
.698
.74
.78 /* lower tie */
.82
.91
.92
; proc sql; create table breakpoints as select percent, group, diff from ( ( select percent , 1 as group , round(abs(percent-0.20),0.01) as diff from have having diff = min(diff) ) union (select percent , 5 as group , round(abs(percent-0.80),0.01) as diff from have having diff = min(diff) ) ) group by group having group = 1 and percent=max(percent) /* tie processing */ or group = 5 and percent=min(percent) ; create table want as select have.* , coalesce(group, 3) as group from have left join breakpoints on have.percent <= breakpoints.percent and group = 1 or have.percent >= breakpoints.percent and group = 5 order by percent ;
PeterBr
Obsidian | Level 7

Hi this code is almost perfect, I really appreciate it! One thing I noticed though is when there are duplicate percent values, the step when creating the want table, it deletes the duplicates so that only one unique value of percent can exist. With the have data below, your code produces the want data and see that .23 and .18 duplicates are both missing. Is there a way to resolve that? It says there there might be a cartesian product problem?

 


data have;
   input percent;
   datalines;
.01
.05
.16
.18
.18
.23
.23
.46
.55
.67
.69
.698
.74
.78
.82
.91
.92
;
run;


data want;
   input percent group;
   datalines;
.01 1
.05 1
.16 1
.18 1
.23 3
.46 3
.55 3
.67 3
.69 3
.698 3
.74 3
.78 3
.82 5
.91 5
.92 5
;
run;
PeterBr
Obsidian | Level 7

ignore my last comment it was a problem on my end, works great thanks so much!

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 708 views
  • 0 likes
  • 4 in conversation