MsGeritO
Obsidian | Level 7

Hello!

 

I have a small survey with stratified sampling and differing sampling ratios. 

The total population is 1,600; 801 units were sampled overall, with sampling ratios from 0.25 to 1. Non-response was high, leaving 200 respondents overall. (The joys of field work!!)

 

Nonetheless, I need to work with this data. For each respondent I calculated the growth rate of costs per employee, and I would like to identify outliers using a 95% confidence interval from proc surveymeans. However, my results don't meet my expectations. Where am I going wrong? Code is below.

 

Thank you for any pointers!

Gerit

 

* Calculating

  desweight = total_population_per_strata / sampling_per_strata

  within dataset "ratepk"

_total_ = total_population_per_strata

_rate_ = sampling_per_strata / _total_;

 

proc surveymeans data=respondents clm alpha=0.05 t rate=ratepk;

   by year;

   var growth_cost_per_empl;

   weight desweight ;

   strata finalstrata;

   ods output Statistics=outlier_with_t;

run;

data outlier; set outlier_with_t (drop=t probt); run;

 

proc transpose data=outlier out = toutlier; by year; run;

 

data lower (drop=_name_); set toutlier (rename=(col1=ratelower));

   if _name_ = "LowerCLMean";

run;

data upper (drop=_name_); set toutlier (rename=(col1=rateupper));

   if _name_ = "UpperCLMean";

run;

 

data confidenceintervals; merge upper lower; by year; run;

 

data respondents; merge respondents confidenceintervals;

   by year;

   dummy = 0;

   if ratelower < growth_cost_per_empl and growth_cost_per_empl < rateupper then dummy = 1;

run;

 

proc sort data=respondents;

   by year dummy;

run;

 

data test; set respondents;

   by year dummy;

   retain testvariable;

   if first.dummy then testvariable = 0;

   testvariable + desweight;

   if last.dummy;

 

  keep year dummy testvariable;

run;

* Here I would expect the sum of weights to be roughly 95 % for dummy = 1 and roughly 5 % for dummy = 0;

* However, I get only around 20 - 30 % for dummy = 1;

* Graphically it looks similar:;

ods graphics on;

proc sgplot data=respondents;

   bubble x=year y=growth_cost_per_empl size=desweight;

   series x=year y= ratelower;

   series x=year y= rateupper;

run;

3 REPLIES 3
ballardw
Super User

Without data we really can't tell what you may be "doing wrong".

When you say "However, my results don't meet my expectations", it would help to show 1) the results that don't match your expectations and 2) what your expectations were.

 

With such a high rate of non-response I might suggest looking at your approach to weights. You have what appears to be a design weight but you may need to adjust that for actual responses per stratum.
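
As a rough illustration of that adjustment, here is a minimal sketch that rescales the weight to population size divided by the number of actual respondents in each stratum. It assumes that ratepk has one row per finalstrata with the population count in _total_ and no year variable, and that non-response is treated as random within a stratum. The names resp_counts, respondents_adj, n_resp and nrweight are made up for this example.

```sas
/* Sketch only: simple weighting-class non-response adjustment.            */
/* Assumes RATEPK has one row per FINALSTRATA with _TOTAL_ (population).   */
proc sql;
   /* Respondents actually observed per year and stratum */
   create table resp_counts as
   select year, finalstrata, count(*) as n_resp
   from respondents
   group by year, finalstrata;

   /* Adjusted weight = stratum population / responding units */
   create table respondents_adj as
   select r.*, c.n_resp, p._total_ / c.n_resp as nrweight
   from respondents as r
        inner join resp_counts as c
           on r.year = c.year and r.finalstrata = c.finalstrata
        inner join ratepk as p
           on r.finalstrata = p.finalstrata
   order by r.year, r.finalstrata;
quit;
```

nrweight could then take the place of desweight in the WEIGHT statement. Within each year and stratum the adjusted weights sum to _total_, which is an easy sanity check.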

 

If everything from the data outlier step onwards is just to get a table to look at then you need to learn how to use some of the report procedures.
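
For example, the transpose and split steps might be avoided altogether. The sketch below assumes that the outlier_with_t table created by ODS OUTPUT has one row per year with LowerCLMean and UpperCLMean as columns, which is what the transpose in the question relies on, and that respondents is sorted by year. The data set name flagged is made up for this example.

```sas
/* Sketch: attach the interval limits directly, no transpose needed */
data flagged;
   merge respondents
         outlier_with_t (keep=year LowerCLMean UpperCLMean);
   by year;
   /* 1 if the value lies strictly inside the interval, else 0 */
   dummy = (LowerCLMean < growth_cost_per_empl < UpperCLMean);
run;

/* Weighted share of records inside and outside the interval, per year */
proc freq data=flagged;
   by year;
   tables dummy;
   weight desweight;
run;
```

PROC FREQ with a WEIGHT statement reports the weighted counts and percentages per year, which is the table the last data step in the question builds by hand.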

 

 

MsGeritO
Obsidian | Level 7
Thank you. I can add some data on Monday.

So far I haven't found any report procedures which give me the results from "data outlier" onwards. I am happy to receive any ideas on this.
ballardw
Super User

Are you trying to identify the specific records that have outlier values for a specific response?
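
If so, note that the CLM limits describe the uncertainty of the estimated mean, not the range that holds 95% of the individual values, so a much smaller share of records falling inside them is not surprising. One possible sketch for flagging records uses weighted percentiles instead. It swaps in PROC UNIVARIATE for PROC SURVEYMEANS, so only the weights are used and the strata are ignored. The names pctl, flagged_pctl and outlier are made up for this example, and the 2.5 / 97.5 cut-offs are just one choice.

```sas
/* Sketch: weighted 2.5th and 97.5th percentiles per year */
proc univariate data=respondents noprint;
   by year;
   var growth_cost_per_empl;
   weight desweight;
   /* Creates the variables p_2_5 and p_97_5 */
   output out=pctl pctlpts=2.5 97.5 pctlpre=p_;
run;

/* Flag records outside the central 95% of the weighted distribution */
data flagged_pctl;
   merge respondents pctl;
   by year;
   if missing(growth_cost_per_empl) then outlier = .;
   else outlier = not (p_2_5 <= growth_cost_per_empl <= p_97_5);
run;
```

With this definition roughly 5% of the weight ends up flagged in every year by construction, so it is a screening rule rather than a test.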

 

 

You may also want to request plots for the response variables.

 
