I have a dataset with client information and I want to draw a sample for each client and load that sample to a separate dataset. The sample size varies by client. Below is the code I'm using to do this. (I'm still new to SAS and I got help on the macro portion of this code. It was derived from: http://www2.sas.com/proceedings/sugi26/p093-26.pdf ) My question relates to the WHERE filter in the query that INSERTS INTO the sample dataset. The WHERE clause is: 'WHERE zClientName = "&VAR1" AND ranuni(101) <= &VAR2/&VAR3;'. This method of capturing a sample usually does not produce a sample size that exactly equals the desired sample size. (VAR2 in this case) It gives a sample size that's close, but not exactly equal to VAR2. I have a few questions related to this: Can someone provide guidance on the best approach to create a sample that equals the target sample size (in this case VAR2)? (As a side note, I'm not a statistician so I really don't know the impact of generating a sample of 189 observations when 200 was requested. Maybe it's not a significant issue??) I noticed that if I rerun the code below (which should create a new dataset) - my sample size is not changing? I would have thought that if I applied a truly random filter to the total population, that the sample size would change from one run to the next. Does anyone know what would cause this? For question #1 above, I attempted the following SQL Statement in the macro loop below after adding a SAS_Rand field to wpss.ttWPSS: proc sql INOBS = &VAR2; INSERT INTO wpss.ttWPSS SELECT *, ranuni(101) AS SAS_Rand from UCMdb.ttWPSS WHERE zClientName = "&VAR1" ORDER BY SAS_Rand; quit; The problem with this is SAS throws a syntax error (22-322) with the ORDER BY clause. I'm thinking possibly ANSI standard doesn't allow an ordered set to be inserted into a table? - but I'm not sure. I appreciate any insight on this. Thanks, Stuart /*Get unique list of companies and observations (observation = claim)*/ proc sql; create table work.WPSS_ClientList AS SELECT zClientName, ROUND(CASE WHEN Count(*) * .02 < 200 THEN 200 ELSE Count(*) * .02 END, 1) AS SampleSize, Count(*) AS TotalPopulation FROM UCMdb.ttWPSS GROUP BY zClientName; quit; /*Delete the table with samples to ensure table is cleared prior to running*/ proc datasets library=wpss; DELETE ttwpss; quit; run; /*Create the table as a new table*/ proc sql; CREATE TABLE wpss.ttWPSS LIKE UCMdb.ttWPSS; quit; /* Macro to loop through all client observations */ %MACRO ObsIterate(DriverDataset,FIELD1,FIELD2,FIELD3); /* First obtain the number of records in Driver Dataset */ DATA _NULL_; IF 0 THEN SET &DriverDataset NOBS=X; CALL SYMPUT("RECCOUNT",X); STOP; RUN; /* loop from one to number of records */ %DO I=1 %TO &RECCOUNT; /* Advance to the Ith record */ DATA _NULL_; SET &DriverDataset (FIRSTOBS=&I); /* store the variables of interest in macro variables */ CALL SYMPUT("VAR1",&FIELD1); CALL SYMPUT("VAR2",&FIELD2); CALL SYMPUT("VAR3",&FIELD3); STOP; RUN; /* perform tasks on each observation */ proc sql; INSERT INTO wpss.ttWPSS SELECT * from UCMdb.ttWPSS WHERE zClientName = "&VAR1" AND ranuni(101) <= &VAR2/&VAR3; quit; %END; %MEND ObsIterate; /* Call ObsIterate */ %ObsIterate(WPSS_ClientList,zClientName,SampleSize,TotalPopulation); RUN;
... View more