JoeJ
Calcite | Level 5

All, I am looking to pull random records based on the sum of the balances totaling 50mm. 

 

So each record has a balance. I need to pull random records where the sum of the random records doesn't exceed 50mm. 

 

I am not clear how to accomplish this in SAS. 

 

Please let me know

8 REPLIES
ballardw
Super User

Example data.

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.

 

Some example of the desired output for that example data.

 

Details such as how many records define a group may be needed. Is there a minimum or maximum number?

Also, do you have negative values for the variable? If so are they allowed to be selected?

How about missing values? Many missing values could be included.

 

Are any other variables, such as identification or dates, to be considered?

JoeJ
Calcite | Level 5

Thank you for responding. 

 

I have run suppressions on the entire population based on the business rules for exclusions. 

 

Fields in table are:

 

ref_no (distinct value to identify record)

balance 

exclude_reason (null = inclusion ; not null = exclude)

 

There is no requirement to pull a certain number of records. The only requirement is that the sum of balances can't exceed 50mm. 

Let me know if that helps

 

ballardw
Super User

How many records?

 

I might start just to see:

 

Proc surveyselect data=have (where=(balance=50)) out=selected sampsize=1 
;
run;

to see if you happen to have one or more records with a balance of 50. This matches your stated requirement to select records whose balances total 50. If this returns a record, then by your stated requirements I am done.

 

I have a strong suspicion that there are other requirements/restrictions not stated.

 

I still say provide some actual example data and one or more possible results that you can create looking at the data manually.

 

This is possibly one way to get sets of two records that sum to 50 and then select one (or more) from those at random.

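proc sql;
   /* id is the record identifier; a.id < b.id keeps a record from */
   /* pairing with itself and drops mirrored (b,a) duplicates      */
   create table twos as
   select a.id, a.balance, b.id as id2, b.balance as balance2
   from have as a, have as b
   where a.id < b.id
     and a.balance + b.balance = 50
   ;
quit;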

proc surveyselect data=twos out=selecttwos
   sampsize=1;
run;

Extensible to 3, 4, 5 ... values, though it starts getting cumbersome.
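
As a rough sketch of what the three-record version of the pair query could look like (same assumed data set HAVE with variables id and balance; the table name threes and the output column names are made up for illustration):

```sas
proc sql;
   /* all sets of three distinct records whose balances sum to 50 */
   create table threes as
   select a.id as id1, b.id as id2, c.id as id3,
          a.balance as balance1,
          b.balance as balance2,
          c.balance as balance3
   from have as a, have as b, have as c
   where a.id < b.id and b.id < c.id   /* each combination only once */
     and a.balance + b.balance + c.balance = 50
   ;
quit;

/* pick one qualifying triple at random */
proc surveyselect data=threes out=selectthrees
   sampsize=1;
run;
```

Each extra record adds another copy of the table to the join, so the number of combinations the query has to consider grows very quickly with the size of HAVE. That is the main reason this approach gets cumbersome.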

 

Generally, best practice before saying "select at random" is to clearly define the selection space, and you have not done that very well. Without reasonable example data to work with, we cannot tell whether it might take 2 records or 500 to reach your target value.

PaigeMiller
Diamond | Level 26

In one sentence, you want the random records to total 50mm. In the next sentence, you want the sum of these random records not to exceed 50mm.

 

These are not the same. Which is it?

--
Paige Miller
JoeJ
Calcite | Level 5

I would like the random records to total 50mm 

PaigeMiller
Diamond | Level 26

@JoeJ wrote:

I would like the random records to total 50mm 


It may not be possible to have randomly selected records exactly equal that value. Then what?

--
Paige Miller
AMSAS
SAS Super FREQ

Here's a quick example:

/* Create test data */

data have ;
	do i=1 to 10 ;
		amount=int(rand('uniform',1,20)) ;
		output ;
	end ;
run ;

data want ;
	/* retain total and max */
	retain 
		total 0 
		max  30 ; /* max is the number you don't want total to exceed */
	set have ;
	/* randomly delete obs */
	if rand('uniform')>0.5 then 
		delete ;
	/* Check if current amount will take you over the max and delete if needed */
	if total+amount>max then
		delete ;
	/* increase total */
	total=total+amount ;
	/* output any ob that gets this far */
	output ;
run ;
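
A possible variation on the step above, offered only as a sketch: shuffle the records into random order first and then accumulate, so that every eligible record gets considered and the file order plays no part. It uses the field names given earlier in the thread (ref_no, balance, exclude_reason). The data set name have, the seed, the reading of "50mm" as 50 million, and the filter that drops missing or non-positive balances are all assumptions.

```sas
%let cap = 50000000 ;   /* assumes "50mm" means 50 million */

/* keep eligible records and attach a random sort key */
data shuffled ;
	set have ;
	/* assumption: null exclude_reason = include; skip missing/negative balances */
	where missing(exclude_reason) and balance > 0 ;
	if _n_ = 1 then call streaminit(12345) ;   /* arbitrary seed for a repeatable draw */
	sort_key = rand('uniform') ;
run ;

/* random order */
proc sort data=shuffled ;
	by sort_key ;
run ;

/* walk the shuffled records, keeping each one that still fits under the cap */
data want ;
	set shuffled ;
	retain total 0 ;
	if total + balance > &cap then
		delete ;
	total = total + balance ;
	drop sort_key ;
run ;
```

The total in want can never exceed the cap, but nothing here forces it to land exactly on the cap.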

 

 

AMSAS
SAS Super FREQ

I now see the update: you want the total to equal 50, so the code above needs to change.

 

This will get you closer, but if you want the sample to be exactly equal to 50mm, that is going to be more difficult and potentially impossible, depending on your data. For example, say you have just 3 observations: 25mm, 15mm, 20mm. No combination of those totals 50mm.

 

/* Create test data */

data have ;
	do i=1 to 10 ;
		amount=int(rand('uniform',1,20)) ;
		output ;
	end ;
run ;

proc sort data= have out=srtd ;
	by descending amount ; 
run ;

data want ;
	/* retain total and max */
	retain 
		total 0 
		max  30 ; /* max is the number you don't want total to exceed */
	set srtd ;
	/* randomly delete obs */
	if rand('uniform')>0.5 then 
		delete ;
	/* Check if current amount will take you over the max and delete if needed */
	if total+amount>max then
		delete ;
	/* increase total */
	total=total+amount ;
	/* output any ob that gets this far */
	output ;
run ;
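
To see how close a given run lands, a quick check along these lines may help. This is a sketch that assumes the want data set produced above and the hard-coded limit of 30 from the example; the column aliases are illustrative.

```sas
proc sql ;
	select count(*)                     as n_selected,
	       sum(amount)                  as total_amount,
	       30 - calculated total_amount as shortfall   /* 30 = max in the step above */
	from want ;
quit ;
```

Because the selection is random, the shortfall will vary from run to run unless a seed is set with call streaminit.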
