BookmarkSubscribeRSS Feed
rrmenon
Fluorite | Level 6

In the following query, I am trying to take a 10% sample of the data to create a contact list.

 

Next month, I'd like to pull another sample (run the same query) but not include previously identified contacts (new records sampled in the query)

 

Any idea on how to do this?

 

CREATE TABLE TejonOwned AS

SELECT *, COUNT(t1.PlayerId)

FROM TDQT.vCXPlayer_SAS t1

WHERE t1.SubFightZone = "Tejon Owned" AND RANUNI(4321) between .045 and .055 ;

2 REPLIES 2
ballardw
Super User

Are you going to be running the selection on the same data set? The output of this process? Or a completely new data set that you need to remove the previously selected records from? The current data set with records appended? It isn't quite clear what we are supposed to select from in the future.

 

I am not sure what your Count(t1.playerid) is intended to accomplish since we have none of your data.

 

Personally I would use Proc Surveyselect to pull records as it allows setting a SAMPRATE=10 (for example for 10 percent) and I don't have to fool with picking an (incorrect) interval for ranuni or Rand('uniform') (.45 to .55 is 11% when you consider the end points)

You would use the result of the first selection to mark records in which ever set to select the second one before selection. Typically a join or merge matching on the ID variables and setting flag. Then select where the value is not the flag.

s_lassen
Meteorite | Level 14

The method you show here may work with your input data, but don't count on it. PROC SQL does not guarantee that records are read in the same order, meaning that it is not certain that you will get the same output every time, even if your input data stays the same (SQL may work differently, and fetch data in a different order, if your system is changed in some way).

 

If your data is the same every month (and in the same order), a data step may guarantee that you get the same extract with the same seed.

 

In that case, you can do something like this the first month:

data TejonOwned;
  set TDQT.vCXPlayer_SAS t1;
  WHERE t1.SubFightZone = "Tejon Owned"; 
  if  RANUNI(4321) <0.1 ;
run;

And in the second month, just move on to the next decile, with the same seed:

data TejonOwned;
  set TDQT.vCXPlayer_SAS t1;
  WHERE t1.SubFightZone = "Tejon Owned"; 
  if  0.1<=RANUNI(4321) <0.2 ;
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 389 views
  • 2 likes
  • 3 in conversation