BookmarkSubscribeRSS Feed
stuart753
Calcite | Level 5

I have a dataset with client information and I want to draw a sample for each client and load that sample to a separate dataset.  The sample size varies by client.  Below is the code I'm using to do this. (I'm still new to SAS and I got help on the macro portion of this code.  It was derived from: http://www2.sas.com/proceedings/sugi26/p093-26.pdf )

My question relates to the WHERE filter in the query that INSERTS INTO the sample dataset.  The WHERE clause is: 'WHERE zClientName = "&VAR1" AND ranuni(101) <= &VAR2/&VAR3;'.  This method of capturing a sample usually does not produce a sample size that exactly equals the desired sample size.  (VAR2 in this case)  It gives a sample size that's close, but not exactly equal to VAR2.  I have a few questions related to this:

  1. Can someone provide guidance on the best approach to create a sample that equals the target sample size (in this case VAR2)?  (As a side note, I'm not a statistician so I really don't know the impact of generating a sample of 189 observations when 200 was requested.  Maybe it's not a significant issue??)
  2. I noticed that if I rerun the code below (which should create a new dataset) - my sample size is not changing?  I would have thought that if I applied a truly random filter to the total population, that the sample size would change from one run to the next.  Does anyone know what would cause this?


For question #1 above, I attempted the following SQL Statement in the macro loop below after adding a SAS_Rand field to wpss.ttWPSS:

proc sql INOBS = &VAR2;

INSERT INTO wpss.ttWPSS

   SELECT *, ranuni(101) AS SAS_Rand from UCMdb.ttWPSS

      WHERE zClientName = "&VAR1"

   ORDER BY SAS_Rand;

quit;

The problem with this is SAS throws a syntax error (22-322) with the ORDER BY clause.  I'm thinking possibly ANSI standard doesn't allow an ordered set to be inserted into a table?  - but I'm not sure.

I appreciate any insight on this.

Thanks,

Stuart

/*Get unique list of companies and observations (observation = claim)*/

proc sql;
create table work.WPSS_ClientList AS
SELECT zClientName, ROUND(CASE WHEN Count(*) * .02 < 200 THEN 200 ELSE Count(*) * .02 END, 1) AS SampleSize,
   Count(*) AS TotalPopulation
  FROM UCMdb.ttWPSS
GROUP BY zClientName;
quit;

/*Delete the table with samples to ensure table is cleared prior to running*/
proc datasets library=wpss;
   DELETE ttwpss;
quit;
run;

/*Create the table as a new table*/
proc sql;
   CREATE TABLE wpss.ttWPSS
      LIKE UCMdb.ttWPSS;
quit;

/* Macro to loop through all client observations */
%MACRO ObsIterate(DriverDataset,FIELD1,FIELD2,FIELD3);

/* First obtain the number of records in Driver Dataset */

DATA _NULL_;
IF 0 THEN SET &DriverDataset NOBS=X;
CALL SYMPUT("RECCOUNT",X);
STOP;
RUN;
/* loop from one to number of records */
%DO I=1 %TO &RECCOUNT;
/* Advance to the Ith record */
DATA _NULL_;
SET &DriverDataset (FIRSTOBS=&I);
/* store the variables of interest in macro variables */
CALL SYMPUT("VAR1",&FIELD1);
CALL SYMPUT("VAR2",&FIELD2);
CALL SYMPUT("VAR3",&FIELD3);
STOP;
RUN;
/* perform tasks on each observation */

proc sql;
INSERT INTO wpss.ttWPSS
   SELECT * from UCMdb.ttWPSS
      WHERE zClientName = "&VAR1" AND ranuni(101) <= &VAR2/&VAR3;
quit;

%END;

%MEND ObsIterate;

/* Call ObsIterate */
%ObsIterate(WPSS_ClientList,zClientName,SampleSize,TotalPopulation);
RUN;

3 REPLIES 3
ballardw
Super User

Can someone provide guidance on the best approach to create a sample that equals the target sample size (in this case VAR2)?  (As a side note, I'm not a statistician so I really don't know the impact of generating a sample of 189 observations when 200 was requested.  Maybe it's not a significant issue??)

A smaller sample will usually result in wider than designed confidence limits and/or less power in a test. How much and the practical impact will depend on way more information.

I noticed that if I rerun the code below (which should create a new dataset) - my sample size is not changing?  I would have thought that if I applied a truly random filter to the total population

When you use ranuni with the same seed you tend to get the same sequence of random numbers.

If I needed a fixed size random sample I would look into proc Surveyselect.your list of VAR2 values could well be a list of sampsize parameters for a strata of clientnames.

stuart753
Calcite | Level 5

Thanks ballardw for your reply.  I realize my question was somewhat broad and I appreciate your guidance. (actually, I really asked two questions)  I'll look into proc surveyselect.

Stuart

AlexCurrie
Calcite | Level 5

You can use proc sql to do the sampling by making the following changes:

- use CREATE TABLE [temporary work dataset] AS and do the INSERT INTO as a separate statement

- from INOBS to OUTOBS to limit the number of observations written out to the temporary work dataset to exact number you want for your sample

- use ranuni(0) to get a different pseudo-random seed each run (based on clock time)

proc sql OUTOBS = &VAR2;

CREATE TABLE temp AS

SELECT *, ranuni(0) AS SAS_Rand from UCMdb.ttWPSS

WHERE zClientName = "&VAR1"

ORDER BY SAS_Rand;

INSERT INTO wpss.ttWPSS

SELECT * FROM temp;

quit;

This link may be helpful as well: http://www2.sas.com/proceedings/sugi31/168-31.pdf

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 9349 views
  • 3 likes
  • 3 in conversation