Hello,
I am new to sas programming. I am doing data analysis. I want to delete observations(not collumn or raw) and then in output i need full data set in which it should show deleted value as missing so i can impute those values for later use.I want to delete observations randomly based on percentage. can anyone help me with code ?
Here is a possible method-code but I dont know what is the preffered randon functio
in order to force the desired percantage:
%let n=5; /* number of observations-values per row */
%let p=0.05; /* desired percentage */
data want;
set test(drop=F) end=eof;
array obs {&n} _numeric_;
retain nmis 0;
do i=1 to &n;
if round(ranuni(100),0.01) = &p
then do; nmis+1; call missing(obs(i)); end;
end;
if eof then put nmis= " is the number of missing values created ";
DROP i nmis;
run;
1. CALL MISSING() can be used to set multiple variables to missing.
2. If you need an exact sample use PROC SURVEYSELECT and merge back with original dataset to identify selected records. Or use the following if approximate is ok.
if rand('Bernoulli', 0.5) =1 then call missing(of _all_);
It is better to post an example to expalin what you want .
Here is an example based on sashelp.class of how to select a random sample of 20% from each sex group for deletion
/* Sort by subgroups, if needed */
proc sort data=sashelp.class out=class; by sex; run;
/* Select sample in each subgroup */
proc surveyselect data=class out=classSample
samprate=0.2 outall;
strata sex; /* subgroups */
run;
/* Set variables to missing for selected obs */
data classSample;
set classSample;
if selected then call missing(weight, height);
drop selected selectionProb samplingWeight;
run;
note: you could generate 100 random samples by adding option reps=100 to the proc surveyselect statement. The variable replicate would then be added to the output data.
suppose this is original data=
A B C D E
4 5 8 10 9
3 5 2 6 8
8 9 3 5 7
9 3 5 6 2
in out put data should be in this way. so later stages I can impute them.
A B C D E
4 5 10 9
3 2 6 8
8 9 3 5 7
3 5 2
the data should be complete and there should be blank where we deleted values.
my original data is way bigger then this. please any help will be appreciated.
The term "observation" in sas means, ususally, a row in the dataset.
Your output has same number of observation (rows) as input.
You did not deleted observations but randomaly assign missing to variables.
Should the percentage be of observations (rows) or of values ? which I'm not sure is the same.
I want to delete random data values not the observations. and percentage should be depend on data values.if we think of 20%, out of 500 observations in every variable it should delete 100 data values. and out put should contain 500 observations in which deleted values should be blank or missing.
Here is a possible method-code but I dont know what is the preffered randon functio
in order to force the desired percantage:
%let n=5; /* number of observations-values per row */
%let p=0.05; /* desired percentage */
data want;
set test(drop=F) end=eof;
array obs {&n} _numeric_;
retain nmis 0;
do i=1 to &n;
if round(ranuni(100),0.01) = &p
then do; nmis+1; call missing(obs(i)); end;
end;
if eof then put nmis= " is the number of missing values created ";
DROP i nmis;
run;
@art297 You can use call Missing (of _all_) for both numeric and character.
@Shmuel's code modified
%let n=5; /* number of observations-values per row */
%let p=0.05; /* desired percentage */
data want;
set test(drop=F) end=eof;
retain nmis 0;
if round(ranuni(100),0.01) = &p
then do;
nmis+1;
call missing(of _all_);
end;
if eof then put nmis= " is the number of missing values created ";
DROP i nmis;
run;
Or from PGStats solution:
%let p=0.05;
proc sort data=sashelp.class out=class;
by sex;
run;
proc surveyselect data=class out=selected samprate=&p outall;
strata sex;
run;
data missing;
set selected;
if selected=1 then
call missing (of _all);
run;
@Reeza, using call missing(of _ALL_) - should it not enter missing values to all variables in the observation ?
Looking at wanted result of example, that is not the desired:
in out put data should be in this way. so later stages I can impute them.
A B C D E
4 5 10 9
3 2 6 8
8 9 3 5 7
3 5 2
It depends on what you mean by " delete a single observation" -
up to know you asked to enter missing value to "observations" randomly.
Do you mean to skip the missing value, still having same number of values per row (output observation) ?
If positive then:
%let n=5; /* number of observations-values per row */
%let p=0.05; /* desired percentage */
data want;
set test(drop=F) end=eof;
array obs {&n} _numeric_;
retain nmis 0;
do i=1 to &n;
if round(ranuni(100),0.01) = &p
then do;
nmis+1;
call missing(obs(i));
i = i - 1; /* line added - leave space for next value */
end;
end;
if eof then put nmis= " is the number of missing values created ";
DROP i nmis;
run;
@Shmuel types faster than I do but, if you need to delete both character and numeric (which I presume you do as you mentioned that the results should either be missing or blank), the method can be expanded to include both:
data want; set have; array nums(*) _numeric_; array chars(*) _character_; do _n_=1 to dim(nums); if rand("Uniform") lt .2 then call missing(nums(_n_)); end; do _n_=1 to dim(chars); if rand("Uniform") lt .2 then call missing(chars(_n_)); end; run;
Art, CEO, AnalystFinder.com
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.