04-11-2018 07:31 AM
I wonder could anyone provide some guidance for the following query, please?
I have a dataset for 10,000 households on income.
I would like to see the impact that an income increase of between 500 and 1000, applied to a random 200 households, would have on headline figures such as the percentage of the sample at risk of poverty.
Would anyone know the best way to write a command that increases a variable by between 500 and 1000 for a random 200 observations?
Any help would be most welcome.
04-11-2018 08:16 AM
Well, you can use the random number generator:
data A;
  call streaminit(123); /* set random number seed */
  do i = 1 to 200;
    u = ceil(rand("Uniform") * 10000); /* integer in 1..10000; duplicates are possible */
    output;
  end;
run;
This would give you 200 random observation numbers. Merge this with your data by matching u to the observation number (_n_ is not stored in the dataset, so you will need an actual variable holding it), then apply an IF. Note that this is just pseudocode; I haven't time to test anything right now:
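data want;
  merge have (in=a) a (in=b);
  by u; /* assumes have also stores its observation number in u, and both are sorted by u */
  if b then do;
    /* set addition here */
  end;
run;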
04-11-2018 08:29 AM
This can be done in 1 step, but the statistical theory proving that this is in fact a random selection is complex.
data want;
  set have nobs=denominator;
  retain numerator 200;
  /* select this row with probability (still needed) / (rows remaining) */
  if ranuni(12345) < numerator / denominator then do;
    value = value + 500; /* how do you know how much to add?? */
    numerator = numerator - 1;
  end;
  denominator = denominator - 1;
run;
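On the question of how much to add: one option is to draw the amount uniformly between 500 and 1000 for each selected household. The sketch below swaps in PROC SURVEYSELECT for the selection step rather than the one-step approach, and it is only an illustration. The toy dataset have, the variable names household_id, income and increase, and the seeds are all assumptions made so that the example runs on its own; replace them with your real data and names.

```sas
/* Toy stand-in for the real data: 10,000 households with a made-up income. */
/* This dataset is hypothetical and exists only to make the sketch runnable. */
data have;
  call streaminit(1);
  do household_id = 1 to 10000;
    income = round(10000 + 40000 * rand("Uniform"));
    output;
  end;
run;

/* Simple random sample of exactly 200 rows without replacement. */
/* OUTALL keeps every row and adds a 0/1 indicator named Selected. */
proc surveyselect data=have out=flagged method=srs sampsize=200
                  seed=12345 outall noprint;
run;

/* Add a uniform amount between 500 and 1000 to the selected households only. */
data want;
  set flagged;
  if _n_ = 1 then call streaminit(123); /* seed for the increase amounts */
  increase = 0;
  if Selected then increase = 500 + 500 * rand("Uniform");
  income = income + increase;
  drop Selected;
run;
```

Keeping increase in the output makes it easy to check which households were changed and by how much before recomputing the headline figures on want.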
04-11-2018 09:44 AM
Rick Wicklin discusses using random number generators in a number of places. Here's an SGF paper that discusses the topic.