In the following query, I am trying to take a 10% sample of the data to create a contact list.
Next month, I'd like to pull another sample (run the same query) but not include previously identified contacts (new records sampled in the query)
Any idea on how to do this?
CREATE TABLE TejonOwned AS
SELECT *, COUNT(t1.PlayerId)
FROM TDQT.vCXPlayer_SAS t1
WHERE t1.SubFightZone = "Tejon Owned" AND RANUNI(4321) between .045 and .055 ;
Are you going to be running the selection on the same data set? The output of this process? Or a completely new data set that you need to remove the previously selected records from? The current data set with records appended? It isn't quite clear what we are supposed to select from in the future.
I am not sure what your Count(t1.playerid) is intended to accomplish since we have none of your data.
Personally I would use Proc Surveyselect to pull records as it allows setting a SAMPRATE=10 (for example for 10 percent) and I don't have to fool with picking an (incorrect) interval for ranuni or Rand('uniform') (.45 to .55 is 11% when you consider the end points)
You would use the result of the first selection to mark records in which ever set to select the second one before selection. Typically a join or merge matching on the ID variables and setting flag. Then select where the value is not the flag.
The method you show here may work with your input data, but don't count on it. PROC SQL does not guarantee that records are read in the same order, meaning that it is not certain that you will get the same output every time, even if your input data stays the same (SQL may work differently, and fetch data in a different order, if your system is changed in some way).
If your data is the same every month (and in the same order), a data step may guarantee that you get the same extract with the same seed.
In that case, you can do something like this the first month:
data TejonOwned;
set TDQT.vCXPlayer_SAS t1;
WHERE t1.SubFightZone = "Tejon Owned";
if RANUNI(4321) <0.1 ;
run;
And in the second month, just move on to the next decile, with the same seed:
data TejonOwned;
set TDQT.vCXPlayer_SAS t1;
WHERE t1.SubFightZone = "Tejon Owned";
if 0.1<=RANUNI(4321) <0.2 ;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.