DATA Step, Macro, Functions and more

Delete observations randomly based on percentage and impute them

Accepted Solution Solved
Reply
Contributor
Posts: 21
Accepted Solution

Delete observations randomly based on percentage and impute them

Hello,

I am new to sas programming. I am doing data analysis. I want to delete observations(not collumn or raw) and then in output i need full data set in which it should show deleted value as missing so i can impute those values for later use.I want to delete observations randomly based on percentage. can anyone help me with code ?


Accepted Solutions
Solution
‎02-25-2017 02:40 PM
Trusted Advisor
Posts: 1,831

Re: Delete observations randomly based on percentage and impute them

Posted in reply to chintanpatel

Here is a possible method-code but I dont know what is the preffered randon functio

in order to force the desired percantage:

 

%let n=5; /* number of observations-values per row */
%let p=0.05; /* desired percentage */
data want;
 set test(drop=F) end=eof;
     array obs {&n} _numeric_;
     retain nmis 0;

     do i=1 to &n;
        if round(ranuni(100),0.01) = &p
        then do; nmis+1; call missing(obs(i)); end;
     end;
if eof then put nmis= " is the number of missing values created "; DROP i nmis; run;

View solution in original post


All Replies
Super User
Posts: 23,235

Re: Delete observations randomly based on percentage and impute them

Posted in reply to chintanpatel

1. CALL MISSING() can be used to set multiple variables to missing. 

2. If you need an exact sample use PROC SURVEYSELECT and merge back with original dataset to identify selected records. Or use the following if approximate is ok. 

 

if rand('Bernoulli', 0.5) =1 then call missing(of _all_); 

Super User
Posts: 10,681

Re: Delete observations randomly based on percentage and impute them

Posted in reply to chintanpatel

It is better to post an example to expalin what you want .

Esteemed Advisor
Posts: 5,474

Re: Delete observations randomly based on percentage and impute them

[ Edited ]
Posted in reply to chintanpatel

Here is an example based on sashelp.class of how to select a random sample of 20% from each sex group for deletion

 

/* Sort by subgroups, if needed */
proc sort data=sashelp.class out=class; by sex; run;

/* Select sample in each subgroup */
proc surveyselect data=class out=classSample
    samprate=0.2 outall;
strata sex; /* subgroups */
run;

/* Set variables to missing for selected obs */
data classSample;
set classSample;
if selected then call missing(weight, height);
drop selected selectionProb samplingWeight;
run;

note: you could generate 100 random samples by adding option reps=100 to the proc surveyselect statement. The variable replicate would then be added to the output data.

PG
Contributor
Posts: 21

Re: Delete observations randomly based on percentage and impute them

Posted in reply to chintanpatel

suppose this is original data=

A B C D E

4 5 8 10 9

3 5 2  6  8

8 9 3  5  7

9 3 5  6  2

 

in out put data should be in this way. so later stages I can impute them.

A B C D E

4 5    10 9

3     2 6  8

8 9  3 5  7

   3  5     2

the data should be complete and there should be blank where we deleted values.

my original data is way bigger then this. please any help will be appreciated.

Trusted Advisor
Posts: 1,831

Re: Delete observations randomly based on percentage and impute them

Posted in reply to chintanpatel

The term "observation" in sas means, ususally, a row in the dataset.

Your output has same number of observation (rows) as input.

You did not deleted observations but randomaly assign missing to variables.

 

Should the percentage be of observations (rows) or of values ? which I'm not sure is the same.

 

 

Contributor
Posts: 21

Re: Delete observations randomly based on percentage and impute them

I want to delete random data values not the observations. and percentage should be depend on data values.if we think of 20%, out of 500 observations in every variable it should delete 100 data values. and out put should contain 500 observations in which deleted values should be blank or missing.

Solution
‎02-25-2017 02:40 PM
Trusted Advisor
Posts: 1,831

Re: Delete observations randomly based on percentage and impute them

Posted in reply to chintanpatel

Here is a possible method-code but I dont know what is the preffered randon functio

in order to force the desired percantage:

 

%let n=5; /* number of observations-values per row */
%let p=0.05; /* desired percentage */
data want;
 set test(drop=F) end=eof;
     array obs {&n} _numeric_;
     retain nmis 0;

     do i=1 to &n;
        if round(ranuni(100),0.01) = &p
        then do; nmis+1; call missing(obs(i)); end;
     end;
if eof then put nmis= " is the number of missing values created "; DROP i nmis; run;
Super User
Posts: 23,235

Re: Delete observations randomly based on percentage and impute them

@art297 You can use call Missing (of _all_) for both numeric and character. 

 

@Shmuel's code modified

 

%let n=5; /* number of observations-values per row */
%let p=0.05; /* desired percentage */
data want;
 set test(drop=F) end=eof;
     retain nmis 0;


        if round(ranuni(100),0.01) = &p
        then do; 
                nmis+1; 
               call missing(of _all_);
        end;
       
        if eof then put nmis= " is the number of missing values created ";
     DROP i nmis;
run;

Or from PGStats solution:

 

%let p=0.05;

proc sort data=sashelp.class out=class;
    by sex;
run;

proc surveyselect data=class out=selected samprate=&p outall;
    strata sex;
run;

data missing;
    set selected;

    if selected=1 then
        call missing (of _all);
run;
Trusted Advisor
Posts: 1,831

Re: Delete observations randomly based on percentage and impute them

@Reeza, using call missing(of _ALL_) - should it not enter missing values to all variables in the observation ?

 

Looking at wanted result of example, that is not the desired:

in out put data should be in this way. so later stages I can impute them.
A B C  D E
4 5   10 9
3 2 6  8
8 9 3  5 7
  3 5    2
PROC Star
Posts: 8,145

Re: Delete observations randomly based on percentage and impute them

@Reeza: I may be missing something, but @Shmuel and I suggested code for deleting the values in a random selection of 20% of the cells, not deleting all values for 20% of the cases.  As such, I don't think call missing(of _all_) would be applicable.

 

Super User
Posts: 23,235

Re: Delete observations randomly based on percentage and impute them

@Shmuel and @art297 You're correct, my mistake! 

 

Regular Learner
Posts: 1

Re: Delete observations randomly based on percentage and impute them

is it possible to delete a single observation? not the whole row? thanks!
Trusted Advisor
Posts: 1,831

Re: Delete observations randomly based on percentage and impute them

Posted in reply to makojoyce

It depends on what you mean by " delete a single observation" -

up to know you asked to enter missing value to "observations" randomly.

Do you mean to skip the missing value, still having same number of values per row (output observation) ?

If positive then:

%let n=5; /* number of observations-values per row */
%let p=0.05; /* desired percentage */
data want;
 set test(drop=F) end=eof;
     array obs {&n} _numeric_;
     retain nmis 0;

     do i=1 to &n;
        if round(ranuni(100),0.01) = &p
        then do; 
               nmis+1; 
               call missing(obs(i)); 
               i = i - 1;   /* line added - leave space for next value */
        end;
     end;    
     if eof then put nmis= " is the number of missing values created ";
     DROP i nmis;
run;
PROC Star
Posts: 8,145

Re: Delete observations randomly based on percentage and impute them

Posted in reply to chintanpatel

@Shmuel types faster than I do but, if you need to delete both character and numeric (which I presume you do as you mentioned that the results should either be missing or blank), the method can be expanded to include both:

 

data want;
  set have;
  array nums(*) _numeric_;
  array chars(*) _character_;
  do _n_=1 to dim(nums);
    if rand("Uniform") lt .2 then call missing(nums(_n_));
  end;
  do _n_=1 to dim(chars);
    if rand("Uniform") lt .2 then call missing(chars(_n_));
  end;
run;

Art, CEO, AnalystFinder.com

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 17 replies
  • 732 views
  • 2 likes
  • 7 in conversation