Randomly delete variables in a by variable group

Occasional Contributor
Posts: 8

I have a dataset with one variable used for BY processing and one or more other variables used for analysis. The table looks something like this:

 

By_var   X1    X2    ...   Xn
1        234   4     ...   12
...      ...   ...   ...   ...
10       33    4     ...   4

 

 

Here's where things get weird:

I'd like to randomly select half of the X variables and set them to missing. The X variables that get blanked out should be the same for every observation with the same By_var value, and the random selection should be allowed to differ from one By_var group to the next. I'm at a loss on where to even start. Any ideas?

 


Accepted Solutions
Solution
‎09-26-2015 03:43 PM
Grand Advisor
Posts: 17,325

Re: Randomly delete variables in a by variable group

Something like the following:

 

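1. Set up an array for the current X variables.

2. Set up a partner array of 1/0 flags, drawing each flag from a Bernoulli distribution with p = 0.5, so each variable has a 50% chance of being blanked (about half on average, not exactly half).

3. Fill the partner array only on the first observation of each group, and retain it for the rest of the group.

4. Set the X values to missing wherever the partner array flag is 1.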

 

Untested code below:

 

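data want;
  set have;
  by group_var;
  array x(20) x1-x20;        /* analysis variables */
  array xb(20) xb1-xb20;     /* partner flags: 1 = blank this variable */
  retain xb1-xb20;           /* keep the flags for the whole BY group */

  /* draw a new set of 1/0 flags, each with a 50% chance, at the start of each group */
  if first.group_var then do i=1 to 20;
    xb(i)=rand('bernoulli', 0.5);
  end;

  /* blank out the flagged variables on every observation */
  do i=1 to 20;
    if xb(i)=1 then x(i)=.;
  end;
run;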
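If exactly half of the variables need to be blanked in every group (independent Bernoulli draws only give half on average), one option is sequential selection sampling: walk through the variables and pick each one with probability (number still needed) / (number still available). The following is a minimal, untested sketch under the same assumptions as the code above (`have` sorted by `group_var`, analysis variables `x1-x20`). The `_xb` flags, the `_need` counter, and the seed 321 are placeholder choices, not part of the original answer.

```sas
data want;
  set have;
  by group_var;
  array x(20) x1-x20;            /* analysis variables */
  array _xb(20) _xb1-_xb20;      /* 1 = blank this variable in the current group */
  retain _xb1-_xb20;

  if _n_ = 1 then call streaminit(321);   /* arbitrary seed, for repeatability */

  if first.group_var then do;
    _need = dim(x) / 2;                   /* how many variables to blank */
    do i = 1 to dim(x);
      /* pick variable i with probability (still needed) / (still available) */
      if rand('uniform') < _need / (dim(x) - i + 1) then do;
        _xb(i) = 1;
        _need = _need - 1;
      end;
      else _xb(i) = 0;
    end;
  end;

  /* apply the group's selection to every observation */
  do i = 1 to dim(x);
    if _xb(i) = 1 then x(i) = .;
  end;

  drop i _need _xb1-_xb20;
run;
```

Once `_need` reaches 0 the selection probability is 0, and when the remaining variables are all needed it is 1, so each group ends up with exactly 10 of the 20 variables blanked.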

 

 

 

 



All Replies
Occasional Contributor
Posts: 8

Re: Randomly delete variables in a by variable group

This was a great place to start, Rezza! Thanks. I made a few changes and settled on the code below. I have a user-entered string listing the variables they care about in the table, so I used that to make the arrays a little more dynamic.

 

/* count the variables in the user-entered list (set in open code so it is available to the data step) */
%let var_count = %sysfunc(countw(&variables.));

data want (drop= i _: );
  set have;
  by group_var;
  array x(*) &variables.;
  array _xb(*) _xb1-_xb&var_count.;
  retain _xb1-_xb&var_count.;
  if _n_ = 1 then call streaminit(321);

  if first.group_var then do i=1 to &var_count.;
    *create a 1/0 variable with 50% chance;
    _xb(i)=rand('bernoulli', 0.5);
  end;

  do i=1 to &var_count.;
    if _xb(i)=1 then x(i)=.;
  end;
run;
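For anyone who wants to check the result, here is a rough sketch of a test. The `have` data set, the four-variable list, and the `row` counter are made up for illustration. Build the test data, run the data step above to create `want`, then run the PROC MEANS step.

```sas
/* Hypothetical test data: 3 groups, 4 rows each, 4 analysis variables */
%let variables = x1 x2 x3 x4;

data have;
  array x(4) x1-x4;
  call streaminit(1);               /* arbitrary seed */
  do group_var = 1 to 3;
    do row = 1 to 4;
      do i = 1 to 4;
        x(i) = ceil(100 * rand('uniform'));
      end;
      output;
    end;
  end;
  drop i;
run;

/* After creating WANT: within each group, every variable should
   show either N = 0 (blanked) or NMISS = 0 (left alone) */
proc means data=want n nmiss;
  by group_var;
  var &variables.;
run;
```

If any variable shows both non-missing and missing counts within one group, the flags are not being retained across the group.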

 

Thanks Again.
