BookmarkSubscribeRSS Feed
aguilar_john
Obsidian | Level 7

Hello everyone,

 

I need to filter out specific IDs which lead to exceeding predefined thresholds. What I have so far:

data have;
	infile datalines delimiter= ',';
	input ID  Factor $ Amount ;
	datalines;
1,a,100
2,b,200
3,c,300
4,a,400
5,a,500
6,c,600
7,c,700
8,b,800
9,a,900
10,b,1000
;

data max_table;
	infile datalines delimiter= ',';
	input Factor $ max_percentage ;
	datalines;
a,0.3
b,0.4
c,0.25
;

PROC SQL;
	CREATE TABLE have_2 AS
	SELECT DISTINCT base.Factor
			,base.amount_factor / sum(amount) 					AS Factor_distribution		FORMAT = PERCENTN8.2
			,base.amount_factor																FORMAT = COMMA20.
			,sum(amount)										AS amount_total				FORMAT = COMMA20.
			,max.max_percentage																FORMAT = PERCENTN8.2
			,base.amount_factor - (sum(amount)*max.max_percentage)	AS Difference
	FROM (SELECT Factor
				,amount
				,sum(amount)									AS amount_factor
		FROM have
		GROUP BY Factor
		)					AS base
	LEFT JOIN max_table 	AS max
	ON base.factor = max.factor

;QUIT;

The table have_2 shows that the amounts for Factor a and c exceed the predefined maximum distribution.

For Factor a I have a maximum of 30% and it exceeds it by 4.55% or 250, and for Factor c it exceeds the maximum of 25% by 4.09% or 225. 

I would need to build a Macro which runs once to figure out whether or not the thresholds are met, if yes the IDs with the highest amounts must be removed from the dataset and written to another dataset.

My final table should then only include the IDs which meet the threshholds.

In this example for Factor a, the ID=9 should be written into another table and for Factor c ID=7 should be removed from the dataset. 

At the end I would need two tables, the first one including all IDs which meet the threshold and the other table all IDs which have been removed (if any). 

Let me know if anything is unclear, and thank you very much for your help.

 

Best

John

8 REPLIES 8
ChrisNZ
Tourmaline | Level 20

Do you always just remove one record: the highest?

 

Note:

This post exemplifies exactly why I never use tabs to format code: you did a perfect job formatting it on your computer, and it looks like a mess here.

aguilar_john
Obsidian | Level 7
I would need to sort the records and then remove the highest ones until the threshold is met. In this example it is just one case but it could be more than one case.

Yes, you are right I noticed the formatting too late,
ChrisNZ
Tourmaline | Level 20

> I would need to sort the records and then remove the highest ones until the threshold is met.

This will never happen.

The sum of percentages is 100, and the sum of thresholds is less than 100. So each time you recalculate, some percentages will be above threshold. 

aguilar_john
Obsidian | Level 7
Maybe I have to rephrase. I need to remove the records with the highest amounts until the value is below the threshold.
ChrisNZ
Tourmaline | Level 20

Like this?

proc sql;
  create table TOTAL as select sum(AMOUNT) as TOTAL from HAVE;
run;

proc sort data=HAVE out=SORTED;
  by FACTOR AMOUNT;
run;

data WANT OUT;
  if _N_=1 then set TOTAL;
  merge SORTED
        MAX_TABLE;
  by FACTOR;
  if first.FACTOR then SUM=0;
  SUM+AMOUNT; 
  if SUM/TOTAL < MAX_PERCENTAGE then output WANT;
  else output OUT;
run;

NOTE: The data set WORK.WANT has 8 observations and 6 variables.
NOTE: The data set WORK.OUT has 2 observations and 6 variables.

aguilar_john
Obsidian | Level 7
Thank you @ChrisNZ, you proposed steps give me correct result, I would now also need to repeat what was done in my PROC SQL Query so that I have the distributions of each Factor again now being lower or equal to the threshold. I could of course repeat the PROC SQL with the table Want, but I hoped to do this in a Macro to repeat automatically.
ChrisNZ
Tourmaline | Level 20
SQL is not suited to processing observations in order. Use the right tool for a given task. Macros are not needed for what you describe.
aguilar_john
Obsidian | Level 7
Ok, thanks for your help.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 667 views
  • 2 likes
  • 2 in conversation