Hello all,
I'm trying to randomly delete three observations for each group, and add the result as a new column. Does anyone know how to randomly delete three observations by group?
The dataset I have is like the following:
ID year prep
1 2000 550
1 2001 600
1 2002 580
1 2003 550
1 2004 570
... ... ...
1 2018 550
2 2000 600
2 2001 600
2 2002 580
2 2003 550
2 2004 570
... ... ...
and I'm trying to get the new dataset as:
ID year prep new
1 2000 550 550
1 2001 600 .
1 2002 580 580
1 2003 550 .
1 2004 570 570
1 2005 530 530
1 2006 545 .
... ... ... ...
1 2018 550 550
2 2000 600 600
2 2001 600 .
2 2002 580 .
2 2003 550 550
2 2004 570 570
... ... ... ...
Thank you all!
Best,
Hua
I recommend using PROC SURVEYSELECT. Seed values changes with time so that when ever you run the random changes.
data have;
prep=500;
do id=1 to 3;
do year=2000 to 2008;
prep+50;
output;
end;
end;
run;
proc surveyselect data=have
method=srs n=3
seed=%SYSFUNC(round(%Sysfunc(time())))
out=test (Keep=id year prep) noprint ;
strata id ;
run;
proc sql;
create table want as
select a.*,case when b.prep is null then a.prep else . end as new_prep
from have a
left join test b on (a.id=b.id and a.year=b.year);
quit;
Some idea here:
data have;
input ID year prep;
cards;
1 2000 550
1 2001 600
1 2002 580
1 2003 550
1 2004 570
1 2018 550
2 2000 600
2 2001 600
2 2002 580
2 2003 550
2 2004 570
;
data want;
do _n_=1 by 1 until(last.id);
set have;
by id;
end;
array t(3) _temporary_;
do _d=1 to 3;
t(_d)=rand('integer',1,_n_);
end;
do _n_=1 by 1 until(last.id);
set have;
by id;
if _n_ not in t then output;
end;
drop _:;
run;
@hua This version is a better one :
data want;
array t(3) _temporary_;
call missing(of t(*), __d);
do _n_=1 by 1 until(last.id);
set have;
by id;
end;
do while(n(of t(*))<dim(t));
_d=rand('integer',1,_n_);
if _d not in t then do;__d+1; t(__d)=_d;end;
end;
do _n_=1 by 1 until(last.id);
set have;
by id;
if _n_ not in t then output;
end;
drop _:;
run;
I recommend using PROC SURVEYSELECT. Seed values changes with time so that when ever you run the random changes.
data have;
prep=500;
do id=1 to 3;
do year=2000 to 2008;
prep+50;
output;
end;
end;
run;
proc surveyselect data=have
method=srs n=3
seed=%SYSFUNC(round(%Sysfunc(time())))
out=test (Keep=id year prep) noprint ;
strata id ;
run;
proc sql;
create table want as
select a.*,case when b.prep is null then a.prep else . end as new_prep
from have a
left join test b on (a.id=b.id and a.year=b.year);
quit;
That can be over come by simple assignment of
call streaminit(123);
data want;
call streaminit(123);
array t(3) _temporary_;
call missing(of t(*), __d);
do _n_=1 by 1 until(last.id);
set have;
by id;
end;
do while(n(of t(*))<dim(t));
_d=rand('integer',1,_n_);
if _d not in t then do;__d+1; t(__d)=_d;end;
end;
do _n_=1 by 1 until(last.id);
set have;
by id;
if _n_ not in t then output;
end;
drop _:;
run;
I bet this this much faster if the requirement is "randomly delete three observations by group"
Hmm Hang on I think I m missing something, There should not be a need merge back. I admit I haven't got the clarity of final output you want. Can you clarify, let me tweak the existing one plz
I think this change will do
if _n_ in t then call missing(new);
else new=prep;
data want;
call streaminit(123);
array t(3) _temporary_;
call missing(of t(*), __d);
do _n_=1 by 1 until(last.id);
set have;
by id;
end;
do while(n(of t(*))<dim(t));
_d=rand('integer',1,_n_);
if _d not in t then do;__d+1; t(__d)=_d;end;
end;
do _n_=1 by 1 until(last.id);
set have;
by id;
if _n_ in t then call missing(new);
else new=prep;
output;
end;
drop _:;
run;
Got it. Sure proc survey select is convenient and i tested that being much slower. I have modified mine for my satisfaction after all.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.