All, I am looking to pull random records based on the sum of the balances totaling 50mm.
So each record has a balance. I need to pull random records where the sum of the random records doesn't exceed 50mm.
I am not clear how to accomplish this in SAS.
Please let me know.
Please provide example data.
Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.
Also show an example of the desired output for that example data.
Details such as how many records define a group may be needed. Is there a minimum or maximum number?
Also, do you have negative values for the variable? If so, are they allowed to be selected?
How about missing values? Any number of records with missing balances could be included without changing the total.
Are any other variables, such as identification or dates, to be considered?
Thank you for responding.
I have run suppressions on the entire population based on the business rules for exclusions.
Fields in table are:
ref_no (distinct value to identify record)
balance
exclude_reason (null = inclusion ; not null = exclude)
There is no requirement to pull a certain number of records. The only requirement is that the sum of balances can't exceed 50mm.
Let me know if that helps.
How many records?
I might start just to see:
proc surveyselect data=have (where=(balance=50)) out=selected sampsize=1;
run;
to see if you happen to have one or more records with a balance of 50. That would match your stated requirement to select records whose balances total 50. If it returns a record, I am done as far as your stated requirements go.
I have a strong suspicion that there are other requirements or restrictions not stated.
I still say provide some actual example data and one or more possible results that you can create looking at the data manually.
This is possibly one way to get sets of two records that sum to 50 and then select one (or more) from those at random.
proc sql;
   create table twos as
   select a.id, a.balance, b.id as id2, b.balance as balance2
   from have as a, have as b
   where a.id < b.id /* list each pair once and never pair a record with itself */
     and a.balance + b.balance = 50;
quit;

proc surveyselect data=twos out=selecttwos sampsize=1;
run;
Extensible to 3, 4, 5 ... values, though it starts getting cumbersome.
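As a rough sketch of what the three-record version might look like, assuming HAVE has a unique id and a numeric balance as in the query above (the names threes and selectthrees are made up for illustration):

```sas
/* All sets of three distinct records whose balances sum to 50 */
proc sql;
   create table threes as
   select a.id as id1, a.balance as balance1,
          b.id as id2, b.balance as balance2,
          c.id as id3, c.balance as balance3
   from have as a, have as b, have as c
   where a.id < b.id and b.id < c.id   /* each combination listed once */
     and a.balance + b.balance + c.balance = 50;
quit;

/* Pick one of the qualifying combinations at random */
proc surveyselect data=threes out=selectthrees sampsize=1;
run;
```

Each extra record adds another copy of HAVE to the join, so the number of candidate rows grows very quickly with the size of the table. The exact = 50 test is also only safe for whole-number balances; with fractional values a tolerance would probably be needed. If no combination qualifies, threes is empty and there is nothing to sample from.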
Generally, best practice before saying "select at random" is to clearly define the selection space. You have not done that very well: without reasonable example data to work with, we cannot tell whether reaching your target value takes two records or 500.
In one sentence, you want the random records to total 50mm. In the next sentence, you want the sum of these random records not to exceed 50mm.
These are not the same. Which is it?
I would like the random records to total 50mm
@JoeJ wrote:
I would like the random records to total 50mm
It may not be possible to have randomly selected records exactly equal that value. Then what?
Here's a quick example:
/* Create test data */
data have ;
   do i=1 to 10 ;
      amount=int(rand('uniform',1,20)) ;
      output ;
   end ;
run ;

data want ;
   /* retain total and max */
   retain
      total 0
      max 30 ; /* max is the number you don't want total to exceed */
   set have ;
   /* randomly delete obs */
   if rand('uniform')>0.5 then
      delete ;
   /* Check if current amount will take you over the max and delete if needed */
   if total+amount>max then
      delete ;
   /* increase total */
   total=total+amount ;
   /* output any observation that gets this far */
   output ;
run ;
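To confirm the sample stays under the limit, a quick check on WANT might look like this (a minimal sketch; n_selected and total_amount are just illustrative column names):

```sas
/* Count the selected observations and total their amounts */
proc sql;
   select count(*)    as n_selected,
          sum(amount) as total_amount
   from want;
quit;
```

total_amount should never be greater than the max value used in the data step (30 in the example). Because no seed is set, each run produces different test data and a different sample.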
I then saw your update that you want the total to equal 50mm,
so the code above needs to change.
The version below, which sorts by descending amount first, will get you closer. But if you want the sample to equal exactly 50mm, that is going to be more difficult and potentially impossible, depending on your data. For example, say you have just 3 observations: 25mm, 15mm and 20mm. No combination of those adds up to 50mm.
/* Create test data */
data have ;
   do i=1 to 10 ;
      amount=int(rand('uniform',1,20)) ;
      output ;
   end ;
run ;

/* Largest amounts first */
proc sort data= have out=srtd ;
   by descending amount ;
run ;

data want ;
   /* retain total and max */
   retain
      total 0
      max 30 ; /* max is the number you don't want total to exceed */
   set srtd ;
   /* randomly delete obs */
   if rand('uniform')>0.5 then
      delete ;
   /* Check if current amount will take you over the max and delete if needed */
   if total+amount>max then
      delete ;
   /* increase total */
   total=total+amount ;
   /* output any observation that gets this far */
   output ;
run ;
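For the "doesn't exceed" reading, here is a minimal sketch adapted to the fields described earlier in the thread (ref_no, balance, exclude_reason). Several things here are assumptions rather than anything confirmed in the thread: that "50mm" means 50,000,000 in the units of balance, that only positive balances should be eligible, and the names cap, keyed, sort_key and the seed value, which are made up for illustration. Instead of dropping observations with a coin flip, it puts the eligible records in random order and then accumulates balances up to the cap.

```sas
/* Assumption: "50mm" means 50,000,000 in the units of BALANCE */
%let cap = 50000000;

/* Attach a random sort key to every eligible record */
data keyed;
   if _n_ = 1 then call streaminit(12345);  /* arbitrary seed, for a repeatable draw */
   set have;
   /* null exclude_reason = include; positive balances only (assumption) */
   where missing(exclude_reason) and balance > 0;
   sort_key = rand('uniform');
run;

/* Sorting by the random key shuffles the records */
proc sort data=keyed;
   by sort_key;
run;

data want;
   set keyed;
   retain total 0;
   /* skip any record that would push the running total past the cap */
   if total + balance > &cap then delete;
   total = total + balance;
   drop sort_key;
run;
```

Skipping over records that do not fit, rather than stopping at the first one, gets the total closer to the cap, but it also means smaller balances are more likely to be picked late in the draw, so the sample is not a simple random sample. Whether that matters depends on what the sample is for. It still does not guarantee a total of exactly 50mm.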