- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have a dataset with client information and I want to draw a sample for each client and load that sample to a separate dataset. The sample size varies by client. Below is the code I'm using to do this. (I'm still new to SAS and I got help on the macro portion of this code. It was derived from: http://www2.sas.com/proceedings/sugi26/p093-26.pdf )
My question relates to the WHERE filter in the query that INSERTS INTO the sample dataset. The WHERE clause is: 'WHERE zClientName = "&VAR1" AND ranuni(101) <= &VAR2/&VAR3;'. This method of capturing a sample usually does not produce a sample size that exactly equals the desired sample size. (VAR2 in this case) It gives a sample size that's close, but not exactly equal to VAR2. I have a few questions related to this:
- Can someone provide guidance on the best approach to create a sample that equals the target sample size (in this case VAR2)? (As a side note, I'm not a statistician so I really don't know the impact of generating a sample of 189 observations when 200 was requested. Maybe it's not a significant issue??)
- I noticed that if I rerun the code below (which should create a new dataset) - my sample size is not changing? I would have thought that if I applied a truly random filter to the total population, that the sample size would change from one run to the next. Does anyone know what would cause this?
For question #1 above, I attempted the following SQL Statement in the macro loop below after adding a SAS_Rand field to wpss.ttWPSS:
proc sql INOBS = &VAR2;
INSERT INTO wpss.ttWPSS
SELECT *, ranuni(101) AS SAS_Rand from UCMdb.ttWPSS
WHERE zClientName = "&VAR1"
ORDER BY SAS_Rand;
quit;
The problem with this is SAS throws a syntax error (22-322) with the ORDER BY clause. I'm thinking possibly ANSI standard doesn't allow an ordered set to be inserted into a table? - but I'm not sure.
I appreciate any insight on this.
Thanks,
Stuart
/*Get unique list of companies and observations (observation = claim)*/
proc sql;
create table work.WPSS_ClientList AS
SELECT zClientName, ROUND(CASE WHEN Count(*) * .02 < 200 THEN 200 ELSE Count(*) * .02 END, 1) AS SampleSize,
Count(*) AS TotalPopulation
FROM UCMdb.ttWPSS
GROUP BY zClientName;
quit;
/*Delete the table with samples to ensure table is cleared prior to running*/
proc datasets library=wpss;
DELETE ttwpss;
quit;
run;
/*Create the table as a new table*/
proc sql;
CREATE TABLE wpss.ttWPSS
LIKE UCMdb.ttWPSS;
quit;
/* Macro to loop through all client observations */
%MACRO ObsIterate(DriverDataset,FIELD1,FIELD2,FIELD3);
/* First obtain the number of records in Driver Dataset */
DATA _NULL_;
IF 0 THEN SET &DriverDataset NOBS=X;
CALL SYMPUT("RECCOUNT",X);
STOP;
RUN;
/* loop from one to number of records */
%DO I=1 %TO &RECCOUNT;
/* Advance to the Ith record */
DATA _NULL_;
SET &DriverDataset (FIRSTOBS=&I);
/* store the variables of interest in macro variables */
CALL SYMPUT("VAR1",&FIELD1);
CALL SYMPUT("VAR2",&FIELD2);
CALL SYMPUT("VAR3",&FIELD3);
STOP;
RUN;
/* perform tasks on each observation */
proc sql;
INSERT INTO wpss.ttWPSS
SELECT * from UCMdb.ttWPSS
WHERE zClientName = "&VAR1" AND ranuni(101) <= &VAR2/&VAR3;
quit;
%END;
%MEND ObsIterate;
/* Call ObsIterate */
%ObsIterate(WPSS_ClientList,zClientName,SampleSize,TotalPopulation);
RUN;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Can someone provide guidance on the best approach to create a sample that equals the target sample size (in this case VAR2)? (As a side note, I'm not a statistician so I really don't know the impact of generating a sample of 189 observations when 200 was requested. Maybe it's not a significant issue??)
A smaller sample will usually result in wider than designed confidence limits and/or less power in a test. How much and the practical impact will depend on way more information.
I noticed that if I rerun the code below (which should create a new dataset) - my sample size is not changing? I would have thought that if I applied a truly random filter to the total population
When you use ranuni with the same seed you tend to get the same sequence of random numbers.
If I needed a fixed size random sample I would look into proc Surveyselect.your list of VAR2 values could well be a list of sampsize parameters for a strata of clientnames.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks ballardw for your reply. I realize my question was somewhat broad and I appreciate your guidance. (actually, I really asked two questions) I'll look into proc surveyselect.
Stuart
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You can use proc sql to do the sampling by making the following changes:
- use CREATE TABLE [temporary work dataset] AS and do the INSERT INTO as a separate statement
- from INOBS to OUTOBS to limit the number of observations written out to the temporary work dataset to exact number you want for your sample
- use ranuni(0) to get a different pseudo-random seed each run (based on clock time)
proc sql OUTOBS = &VAR2;
CREATE TABLE temp AS
SELECT *, ranuni(0) AS SAS_Rand from UCMdb.ttWPSS
WHERE zClientName = "&VAR1"
ORDER BY SAS_Rand;
INSERT INTO wpss.ttWPSS
SELECT * FROM temp;
quit;
This link may be helpful as well: http://www2.sas.com/proceedings/sugi31/168-31.pdf