SAS Procedures

Help using Base SAS procedures
BookmarkSubscribeRSS Feed
stuart753
Calcite | Level 5

I have a dataset with client information and I want to draw a sample for each client and load that sample to a separate dataset.  The sample size varies by client.  Below is the code I'm using to do this. (I'm still new to SAS and I got help on the macro portion of this code.  It was derived from: http://www2.sas.com/proceedings/sugi26/p093-26.pdf )

My question relates to the WHERE filter in the query that INSERTS INTO the sample dataset.  The WHERE clause is: 'WHERE zClientName = "&VAR1" AND ranuni(101) <= &VAR2/&VAR3;'.  This method of capturing a sample usually does not produce a sample size that exactly equals the desired sample size.  (VAR2 in this case)  It gives a sample size that's close, but not exactly equal to VAR2.  I have a few questions related to this:

  1. Can someone provide guidance on the best approach to create a sample that equals the target sample size (in this case VAR2)?  (As a side note, I'm not a statistician so I really don't know the impact of generating a sample of 189 observations when 200 was requested.  Maybe it's not a significant issue??)
  2. I noticed that if I rerun the code below (which should create a new dataset) - my sample size is not changing?  I would have thought that if I applied a truly random filter to the total population, that the sample size would change from one run to the next.  Does anyone know what would cause this?


For question #1 above, I attempted the following SQL Statement in the macro loop below after adding a SAS_Rand field to wpss.ttWPSS:

proc sql INOBS = &VAR2;

INSERT INTO wpss.ttWPSS

   SELECT *, ranuni(101) AS SAS_Rand from UCMdb.ttWPSS

      WHERE zClientName = "&VAR1"

   ORDER BY SAS_Rand;

quit;

The problem with this is SAS throws a syntax error (22-322) with the ORDER BY clause.  I'm thinking possibly ANSI standard doesn't allow an ordered set to be inserted into a table?  - but I'm not sure.

I appreciate any insight on this.

Thanks,

Stuart

/*Get unique list of companies and observations (observation = claim)*/

proc sql;
create table work.WPSS_ClientList AS
SELECT zClientName, ROUND(CASE WHEN Count(*) * .02 < 200 THEN 200 ELSE Count(*) * .02 END, 1) AS SampleSize,
   Count(*) AS TotalPopulation
  FROM UCMdb.ttWPSS
GROUP BY zClientName;
quit;

/*Delete the table with samples to ensure table is cleared prior to running*/
proc datasets library=wpss;
   DELETE ttwpss;
quit;
run;

/*Create the table as a new table*/
proc sql;
   CREATE TABLE wpss.ttWPSS
      LIKE UCMdb.ttWPSS;
quit;

/* Macro to loop through all client observations */
%MACRO ObsIterate(DriverDataset,FIELD1,FIELD2,FIELD3);

/* First obtain the number of records in Driver Dataset */

DATA _NULL_;
IF 0 THEN SET &DriverDataset NOBS=X;
CALL SYMPUT("RECCOUNT",X);
STOP;
RUN;
/* loop from one to number of records */
%DO I=1 %TO &RECCOUNT;
/* Advance to the Ith record */
DATA _NULL_;
SET &DriverDataset (FIRSTOBS=&I);
/* store the variables of interest in macro variables */
CALL SYMPUT("VAR1",&FIELD1);
CALL SYMPUT("VAR2",&FIELD2);
CALL SYMPUT("VAR3",&FIELD3);
STOP;
RUN;
/* perform tasks on each observation */

proc sql;
INSERT INTO wpss.ttWPSS
   SELECT * from UCMdb.ttWPSS
      WHERE zClientName = "&VAR1" AND ranuni(101) <= &VAR2/&VAR3;
quit;

%END;

%MEND ObsIterate;

/* Call ObsIterate */
%ObsIterate(WPSS_ClientList,zClientName,SampleSize,TotalPopulation);
RUN;

3 REPLIES 3
ballardw
Super User

Can someone provide guidance on the best approach to create a sample that equals the target sample size (in this case VAR2)?  (As a side note, I'm not a statistician so I really don't know the impact of generating a sample of 189 observations when 200 was requested.  Maybe it's not a significant issue??)

A smaller sample will usually result in wider than designed confidence limits and/or less power in a test. How much and the practical impact will depend on way more information.

I noticed that if I rerun the code below (which should create a new dataset) - my sample size is not changing?  I would have thought that if I applied a truly random filter to the total population

When you use ranuni with the same seed you tend to get the same sequence of random numbers.

If I needed a fixed size random sample I would look into proc Surveyselect.your list of VAR2 values could well be a list of sampsize parameters for a strata of clientnames.

stuart753
Calcite | Level 5

Thanks ballardw for your reply.  I realize my question was somewhat broad and I appreciate your guidance. (actually, I really asked two questions)  I'll look into proc surveyselect.

Stuart

AlexCurrie
Calcite | Level 5

You can use proc sql to do the sampling by making the following changes:

- use CREATE TABLE [temporary work dataset] AS and do the INSERT INTO as a separate statement

- from INOBS to OUTOBS to limit the number of observations written out to the temporary work dataset to exact number you want for your sample

- use ranuni(0) to get a different pseudo-random seed each run (based on clock time)

proc sql OUTOBS = &VAR2;

CREATE TABLE temp AS

SELECT *, ranuni(0) AS SAS_Rand from UCMdb.ttWPSS

WHERE zClientName = "&VAR1"

ORDER BY SAS_Rand;

INSERT INTO wpss.ttWPSS

SELECT * FROM temp;

quit;

This link may be helpful as well: http://www2.sas.com/proceedings/sugi31/168-31.pdf

sas-innovate-white.png

Special offer for SAS Communities members

Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.

 

View the full agenda.

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 10459 views
  • 3 likes
  • 3 in conversation