Hello all,
I'm trying to randomly delete three observations for each group, and add the result as a new column. Does anyone know how to randomly delete three observations by group?
The dataset I have is like the following:
ID year prep
1 2000 550
1 2001 600
1 2002 580
1 2003 550
1 2004 570
... ... ...
1 2018 550
2 2000 600
2 2001 600
2 2002 580
2 2003 550
2 2004 570
... ... ...
and I'm trying to get the new dataset as:
ID year prep new
1 2000 550 550
1 2001 600 .
1 2002 580 580
1 2003 550 .
1 2004 570 570
1 2005 530 530
1 2006 545 .
... ... ... ...
1 2018 550 550
2 2000 600 600
2 2001 600 .
2 2002 580 .
2 2003 550 550
2 2004 570 570
... ... ... ...
Thank you all!
Best,
Hua
I recommend using PROC SURVEYSELECT. Seed values changes with time so that when ever you run the random changes.
data have;
prep=500;
do id=1 to 3;
do year=2000 to 2008;
prep+50;
output;
end;
end;
run;
proc surveyselect data=have
method=srs n=3
seed=%SYSFUNC(round(%Sysfunc(time())))
out=test (Keep=id year prep) noprint ;
strata id ;
run;
proc sql;
create table want as
select a.*,case when b.prep is null then a.prep else . end as new_prep
from have a
left join test b on (a.id=b.id and a.year=b.year);
quit;
Some idea here:
data have;
input ID year prep;
cards;
1 2000 550
1 2001 600
1 2002 580
1 2003 550
1 2004 570
1 2018 550
2 2000 600
2 2001 600
2 2002 580
2 2003 550
2 2004 570
;
data want;
do _n_=1 by 1 until(last.id);
set have;
by id;
end;
array t(3) _temporary_;
do _d=1 to 3;
t(_d)=rand('integer',1,_n_);
end;
do _n_=1 by 1 until(last.id);
set have;
by id;
if _n_ not in t then output;
end;
drop _:;
run;
@hua This version is a better one :
data want;
array t(3) _temporary_;
call missing(of t(*), __d);
do _n_=1 by 1 until(last.id);
set have;
by id;
end;
do while(n(of t(*))<dim(t));
_d=rand('integer',1,_n_);
if _d not in t then do;__d+1; t(__d)=_d;end;
end;
do _n_=1 by 1 until(last.id);
set have;
by id;
if _n_ not in t then output;
end;
drop _:;
run;
I recommend using PROC SURVEYSELECT. Seed values changes with time so that when ever you run the random changes.
data have;
prep=500;
do id=1 to 3;
do year=2000 to 2008;
prep+50;
output;
end;
end;
run;
proc surveyselect data=have
method=srs n=3
seed=%SYSFUNC(round(%Sysfunc(time())))
out=test (Keep=id year prep) noprint ;
strata id ;
run;
proc sql;
create table want as
select a.*,case when b.prep is null then a.prep else . end as new_prep
from have a
left join test b on (a.id=b.id and a.year=b.year);
quit;
That can be over come by simple assignment of
call streaminit(123);
data want;
call streaminit(123);
array t(3) _temporary_;
call missing(of t(*), __d);
do _n_=1 by 1 until(last.id);
set have;
by id;
end;
do while(n(of t(*))<dim(t));
_d=rand('integer',1,_n_);
if _d not in t then do;__d+1; t(__d)=_d;end;
end;
do _n_=1 by 1 until(last.id);
set have;
by id;
if _n_ not in t then output;
end;
drop _:;
run;
I bet this this much faster if the requirement is "randomly delete three observations by group"
Hmm Hang on I think I m missing something, There should not be a need merge back. I admit I haven't got the clarity of final output you want. Can you clarify, let me tweak the existing one plz
I think this change will do
if _n_ in t then call missing(new);
else new=prep;
data want;
call streaminit(123);
array t(3) _temporary_;
call missing(of t(*), __d);
do _n_=1 by 1 until(last.id);
set have;
by id;
end;
do while(n(of t(*))<dim(t));
_d=rand('integer',1,_n_);
if _d not in t then do;__d+1; t(__d)=_d;end;
end;
do _n_=1 by 1 until(last.id);
set have;
by id;
if _n_ in t then call missing(new);
else new=prep;
output;
end;
drop _:;
run;
Got it. Sure proc survey select is convenient and i tested that being much slower. I have modified mine for my satisfaction after all.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.