BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
chintanpatel
Calcite | Level 5

Hello,

I am new to sas programming. I am doing data analysis. I want to delete observations(not collumn or raw) and then in output i need full data set in which it should show deleted value as missing so i can impute those values for later use.I want to delete observations randomly based on percentage. can anyone help me with code ?

1 ACCEPTED SOLUTION

Accepted Solutions
Shmuel
Garnet | Level 18

Here is a possible method-code but I dont know what is the preffered randon functio

in order to force the desired percantage:

 

%let n=5; /* number of observations-values per row */
%let p=0.05; /* desired percentage */
data want;
 set test(drop=F) end=eof;
     array obs {&n} _numeric_;
     retain nmis 0;

     do i=1 to &n;
        if round(ranuni(100),0.01) = &p
        then do; nmis+1; call missing(obs(i)); end;
     end;
if eof then put nmis= " is the number of missing values created "; DROP i nmis; run;

View solution in original post

17 REPLIES 17
Reeza
Super User

1. CALL MISSING() can be used to set multiple variables to missing. 

2. If you need an exact sample use PROC SURVEYSELECT and merge back with original dataset to identify selected records. Or use the following if approximate is ok. 

 

if rand('Bernoulli', 0.5) =1 then call missing(of _all_); 

Ksharp
Super User

It is better to post an example to expalin what you want .

PGStats
Opal | Level 21

Here is an example based on sashelp.class of how to select a random sample of 20% from each sex group for deletion

 

/* Sort by subgroups, if needed */
proc sort data=sashelp.class out=class; by sex; run;

/* Select sample in each subgroup */
proc surveyselect data=class out=classSample
    samprate=0.2 outall;
strata sex; /* subgroups */
run;

/* Set variables to missing for selected obs */
data classSample;
set classSample;
if selected then call missing(weight, height);
drop selected selectionProb samplingWeight;
run;

note: you could generate 100 random samples by adding option reps=100 to the proc surveyselect statement. The variable replicate would then be added to the output data.

PG
chintanpatel
Calcite | Level 5

suppose this is original data=

A B C D E

4 5 8 10 9

3 5 2  6  8

8 9 3  5  7

9 3 5  6  2

 

in out put data should be in this way. so later stages I can impute them.

A B C D E

4 5    10 9

3     2 6  8

8 9  3 5  7

   3  5     2

the data should be complete and there should be blank where we deleted values.

my original data is way bigger then this. please any help will be appreciated.

Shmuel
Garnet | Level 18

The term "observation" in sas means, ususally, a row in the dataset.

Your output has same number of observation (rows) as input.

You did not deleted observations but randomaly assign missing to variables.

 

Should the percentage be of observations (rows) or of values ? which I'm not sure is the same.

 

 

chintanpatel
Calcite | Level 5

I want to delete random data values not the observations. and percentage should be depend on data values.if we think of 20%, out of 500 observations in every variable it should delete 100 data values. and out put should contain 500 observations in which deleted values should be blank or missing.

Shmuel
Garnet | Level 18

Here is a possible method-code but I dont know what is the preffered randon functio

in order to force the desired percantage:

 

%let n=5; /* number of observations-values per row */
%let p=0.05; /* desired percentage */
data want;
 set test(drop=F) end=eof;
     array obs {&n} _numeric_;
     retain nmis 0;

     do i=1 to &n;
        if round(ranuni(100),0.01) = &p
        then do; nmis+1; call missing(obs(i)); end;
     end;
if eof then put nmis= " is the number of missing values created "; DROP i nmis; run;
Reeza
Super User

@art297 You can use call Missing (of _all_) for both numeric and character. 

 

@Shmuel's code modified

 

%let n=5; /* number of observations-values per row */
%let p=0.05; /* desired percentage */
data want;
 set test(drop=F) end=eof;
     retain nmis 0;


        if round(ranuni(100),0.01) = &p
        then do; 
                nmis+1; 
               call missing(of _all_);
        end;
       
        if eof then put nmis= " is the number of missing values created ";
     DROP i nmis;
run;

Or from PGStats solution:

 

%let p=0.05;

proc sort data=sashelp.class out=class;
    by sex;
run;

proc surveyselect data=class out=selected samprate=&p outall;
    strata sex;
run;

data missing;
    set selected;

    if selected=1 then
        call missing (of _all);
run;
Shmuel
Garnet | Level 18

@Reeza, using call missing(of _ALL_) - should it not enter missing values to all variables in the observation ?

 

Looking at wanted result of example, that is not the desired:

in out put data should be in this way. so later stages I can impute them.
A B C  D E
4 5   10 9
3 2 6  8
8 9 3  5 7
  3 5    2
art297
Opal | Level 21

@Reeza: I may be missing something, but @Shmuel and I suggested code for deleting the values in a random selection of 20% of the cells, not deleting all values for 20% of the cases.  As such, I don't think call missing(of _all_) would be applicable.

 

Reeza
Super User

@Shmuel and @art297 You're correct, my mistake! 

 

makojoyce
Calcite | Level 5
is it possible to delete a single observation? not the whole row? thanks!
Shmuel
Garnet | Level 18

It depends on what you mean by " delete a single observation" -

up to know you asked to enter missing value to "observations" randomly.

Do you mean to skip the missing value, still having same number of values per row (output observation) ?

If positive then:

%let n=5; /* number of observations-values per row */
%let p=0.05; /* desired percentage */
data want;
 set test(drop=F) end=eof;
     array obs {&n} _numeric_;
     retain nmis 0;

     do i=1 to &n;
        if round(ranuni(100),0.01) = &p
        then do; 
               nmis+1; 
               call missing(obs(i)); 
               i = i - 1;   /* line added - leave space for next value */
        end;
     end;    
     if eof then put nmis= " is the number of missing values created ";
     DROP i nmis;
run;
art297
Opal | Level 21

@Shmuel types faster than I do but, if you need to delete both character and numeric (which I presume you do as you mentioned that the results should either be missing or blank), the method can be expanded to include both:

 

data want;
  set have;
  array nums(*) _numeric_;
  array chars(*) _character_;
  do _n_=1 to dim(nums);
    if rand("Uniform") lt .2 then call missing(nums(_n_));
  end;
  do _n_=1 to dim(chars);
    if rand("Uniform") lt .2 then call missing(chars(_n_));
  end;
run;

Art, CEO, AnalystFinder.com

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 17 replies
  • 3266 views
  • 2 likes
  • 7 in conversation