Random sampling with conditions

First, please look at the table below:

21002500000Canada Japan

I have two sorts of conditions I want to apply for two different scenarios. Let's begin with the first.

  1. First, I want to sample differently according to two conditions. I already know the
    code below, but it is too simple for my purpose.

proc surveyselect data=have out=want method=SRS samprate=0.75 seed=8757587 rep=1;
   strata PackageID;
run;


… but I want the code to check the following conditions:

Choose 75% of all PackageIDs when LenderCountry=BorrowerCountry, and choose 25% of all PackageIDs when LenderCountry NE BorrowerCountry.

a) For the first condition, 75% of all PackageIDs where LenderCountry=BorrowerCountry=Canada are chosen randomly (i.e. keep the lenders in those PackageIDs that are Canadian, LenderCountry=Canada); 75% of all PackageIDs where LenderCountry=BorrowerCountry=USA are chosen randomly (i.e. keep the lenders in those PackageIDs that are American, LenderCountry=USA); 75% of all PackageIDs where LenderCountry=BorrowerCountry=Spain are chosen randomly (i.e. keep the lenders in those PackageIDs that are Spanish, LenderCountry=Spain); and so on.

b.1) The second condition is best illustrated with an example. It applies to any lender from one country (e.g. LenderCountry=Canada) that lends to a borrower from another country (not Canada, e.g. BorrowerCountry=USA). From the pool of all PackageIDs with LenderCountry=Canada and BorrowerCountry=USA, I want the code to select 25% of the PackageIDs (i.e. keep the lenders in those PackageIDs that are Canadian, LenderCountry=Canada). More generally, this must be repeated for every such subsample of PackageIDs (e.g. LenderCountry=Canada AND BorrowerCountry=Spain, LenderCountry=Canada AND BorrowerCountry=Italy, …), and not just for LenderCountry=Canada but for all lenders where LenderCountry NE BorrowerCountry (e.g. LenderCountry=USA and BorrowerCountry=Spain, LenderCountry=Spain and BorrowerCountry=Italy, …).

b.2) There is another way to code the second condition, if it is easier: randomly choose 25% of all PackageIDs where LenderCountry NE BorrowerCountry. This way, if LenderCountry=Canada, 25% of all PackageIDs that have a Canadian lender and a borrower from another country (not Canada) will be chosen. The code won't select PackageIDs proportionally across the different BorrowerCountry values, but that can also work.

The two conditions can overlap (i.e. a and b.1, or a and b.2, depending on how the second condition is coded). For example, if PackageID 1 is chosen among the 75% of all PackageIDs where LenderCountry=BorrowerCountry=Canada, we keep lender 100 because it satisfies the first condition. But if PackageID 1 is (also) chosen randomly among the 25% of all PackageIDs where LenderCountry=Spain and BorrowerCountry=Canada (for b.1, or among the 25% of all PackageIDs where LenderCountry=Spain and BorrowerCountry NE Spain for b.2), then we keep lender 103 because it satisfies the second condition.

2- Second scenario:

I again want to sample differently according to two conditions, but the problem is framed differently. The two conditions are: choose 75% of all Canadian participations when LenderCountry=BorrowerCountry, and choose 25% of all Canadian participations when LenderCountry=Canada and BorrowerCountry NE Canada.

In this case, I don't restrict my random sampling by PackageID. I "count" the number of Canadian participations with BorrowerCountry=Canada (i.e. the number of times I see LenderCountry=Canada, no matter how many PackageIDs are involved; one PackageID may have five Canadian lenders and another only two) and apply a random sampling of 75% for the first condition. The same goes for the second condition, following b.1) or b.2) depending on how it is coded. We sample participations without regard to the number of PackageIDs.

Thank you so much in advance!

Re: Random sampling with conditions

For the first one, it looks like you need to add a variable that defines another stratum, indicating whether the record falls under the 75 or the 25 percent rate, and then assign the samprates accordingly. It may be easier to do in a secondary data set.
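
A rough sketch of that idea, under some assumptions: strata are taken to be the (LenderCountry, BorrowerCountry) pairs as in b.1, the data set names have_sorted and rates are made up for illustration, and the rates are passed through a SAMPRATE= data set with a _RATE_ variable. The SAMPLINGUNIT statement (so that whole PackageIDs are drawn within each stratum) is only available in more recent SAS/STAT releases, so treat that line as optional and check your version.

```sas
/* STRATA processing needs the input sorted by the strata variables */
proc sort data=have out=have_sorted;
   by LenderCountry BorrowerCountry;
run;

/* Hypothetical secondary data set: one row per stratum with its rate */
data rates;
   set have_sorted;
   by LenderCountry BorrowerCountry;
   if first.BorrowerCountry;                 /* keep one row per stratum */
   if LenderCountry = BorrowerCountry then _RATE_ = 0.75;
   else _RATE_ = 0.25;
   keep LenderCountry BorrowerCountry _RATE_;
run;

proc surveyselect data=have_sorted out=want method=srs
                  samprate=rates seed=8757587;
   strata LenderCountry BorrowerCountry;
   samplingunit PackageID;   /* assumption: draw PackageIDs, not single records */
run;
```

Because the rates data set is derived from the sorted input, its strata appear in the same order as in the DATA= data set. In very small strata the selected count is rounded up, so the realized rate can differ from 75% or 25%.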

Your example data, if complete for ID 1, may cause problems with your expected rate, as there is only one record in the 75% category, if I understand your data and requirements correctly.

The second part looks similar, but the input data set to SURVEYSELECT could use data set options to select the records for Canada.
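
As a sketch only, this is one way the second scenario might look, following b.2. It assumes the country is stored exactly as 'Canada', and the names have2, SameCountry and want_canada are hypothetical. A WHERE= data set option limits the input to Canadian lenders, and records (participations) are sampled directly, not PackageIDs.

```sas
/* Hypothetical flag: 1 when lender and borrower countries match, else 0 */
data have2;
   set have;
   SameCountry = (LenderCountry = BorrowerCountry);
run;

proc sort data=have2;
   by SameCountry;
run;

/* Rates follow the sorted stratum order: SameCountry=0 first, then 1.  */
/* Both strata must be present among the Canadian records, otherwise    */
/* the number of rates will not match the number of strata.             */
proc surveyselect data=have2(where=(LenderCountry='Canada'))
                  out=want_canada method=srs
                  samprate=(0.25 0.75) seed=8757587;
   strata SameCountry;
run;
```

For the b.1 variant, BorrowerCountry could be used as the stratum variable instead, with the rates supplied in a data set containing a _RATE_ variable.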

