BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
hua
Obsidian | Level 7 hua
Obsidian | Level 7

Hello all,

 

I'm trying to randomly delete three observations for each group, and add the result as a new column. Does anyone know how to randomly delete three observations by group?

The dataset I have is like the following:

ID     year     prep

1      2000       550

1      2001       600

1      2002       580

1      2003       550

1      2004       570

...       ...           ...

1      2018       550

2      2000       600

2      2001       600

2      2002       580

2      2003       550

2      2004       570

...       ...           ...

 

and I'm trying to get the new dataset as:

ID     year       prep     new

1      2000       550      550

1      2001       600         .

1      2002       580      580

1      2003       550        .

1      2004       570      570

1      2005       530      530

1      2006       545        .

...       ...           ...        ...

1      2018       550       550

2      2000       600       600

2      2001       600          .

2      2002       580          .

2      2003       550       550

2      2004       570       570

...       ...           ...         ...

 

Thank you all!

 

Best,

Hua

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
SuryaKiran
Meteorite | Level 14

I recommend using PROC SURVEYSELECT. Seed values changes with time so that when ever you run the random changes.

 

data have;
prep=500;
do id=1 to 3;
	do year=2000 to 2008;
	prep+50;
	output;
	end;
end;
run;

proc surveyselect data=have
	method=srs n=3 
	seed=%SYSFUNC(round(%Sysfunc(time())))  
	out=test (Keep=id year prep) noprint ;
strata id ;
run;

proc sql;
create table want as 
select a.*,case when b.prep is null then a.prep else . end as new_prep
	from have a
	left join test b on (a.id=b.id and a.year=b.year);
quit;
Thanks,
Suryakiran

View solution in original post

10 REPLIES 10
Reeza
Super User
1. Add a random number
2. sort by random number
3. Set to missing for the first 3 (or reassign as needed).
4. Resort to desired order

I don't know of another quick way, except possibly using PROC SURVEYSELECT and select N-3 from each group and then merging that back in.
hua
Obsidian | Level 7 hua
Obsidian | Level 7
Thanks! It's a good way to do it!
novinosrin
Tourmaline | Level 20

Some idea here:

 

data have;
input ID     year     prep;
cards;
1      2000       550
1      2001       600
1      2002       580
1      2003       550
1      2004       570
1      2018       550
2      2000       600
2      2001       600
2      2002       580
2      2003       550
2      2004       570
;
data want;
do _n_=1 by 1 until(last.id);
set have;
by id;
end;
array t(3) _temporary_;
do _d=1 to 3;
t(_d)=rand('integer',1,_n_);
end;
do _n_=1 by 1 until(last.id);
set have;
by id;
if _n_ not in t then output;
end;
drop _:;
run;


novinosrin
Tourmaline | Level 20

@hua  This version is  a better one :

 


data want;
array t(3) _temporary_;
call missing(of t(*), __d);
do _n_=1 by 1 until(last.id);
set have;
by id;
end;
do while(n(of t(*))<dim(t));
_d=rand('integer',1,_n_);
if _d not in t then do;__d+1; t(__d)=_d;end;
end;
do _n_=1 by 1 until(last.id);
set have;
by id;
if _n_ not in t then output;
end;
drop _:;
run;
SuryaKiran
Meteorite | Level 14

I recommend using PROC SURVEYSELECT. Seed values changes with time so that when ever you run the random changes.

 

data have;
prep=500;
do id=1 to 3;
	do year=2000 to 2008;
	prep+50;
	output;
	end;
end;
run;

proc surveyselect data=have
	method=srs n=3 
	seed=%SYSFUNC(round(%Sysfunc(time())))  
	out=test (Keep=id year prep) noprint ;
strata id ;
run;

proc sql;
create table want as 
select a.*,case when b.prep is null then a.prep else . end as new_prep
	from have a
	left join test b on (a.id=b.id and a.year=b.year);
quit;
Thanks,
Suryakiran
novinosrin
Tourmaline | Level 20

That can be over come by simple assignment of 

call streaminit(123);

 

 

data want;
call streaminit(123);
array t(3) _temporary_;
call missing(of t(*), __d);
do _n_=1 by 1 until(last.id);
set have;
by id;
end;
do while(n(of t(*))<dim(t));
_d=rand('integer',1,_n_);
if _d not in t then do;__d+1; t(__d)=_d;end;
end;
do _n_=1 by 1 until(last.id);
set have;
by id;
if _n_ not in t then output;
end;
drop _:;
run;

I bet this this much faster if the requirement is "randomly delete three observations by group"

hua
Obsidian | Level 7 hua
Obsidian | Level 7
Yes, it works for randomly deleting 3 obs, all I need to do next is to merge the result to the original dataset. Thank you!
novinosrin
Tourmaline | Level 20

Hmm Hang on I think I m missing something, There should not be a need merge back. I admit I haven't got the clarity of final output you want. Can you clarify, let me tweak the existing one plz

 

I think this change will do

if _n_ in t then call missing(new);
else new=prep;

data want;
call streaminit(123);
array t(3) _temporary_;
call missing(of t(*), __d);
do _n_=1 by 1 until(last.id);
set have;
by id;
end;
do while(n(of t(*))<dim(t));
_d=rand('integer',1,_n_);
if _d not in t then do;__d+1; t(__d)=_d;end;
end;
do _n_=1 by 1 until(last.id);
set have;
by id;
if _n_  in t then call missing(new);
else new=prep;
output;
end;
drop _:;
run;
hua
Obsidian | Level 7 hua
Obsidian | Level 7
The randomly deleting part words well. The final output I want is to add a new column, which is the result after randomly deleting 3 obs, to the original dataset. SuryaKiran's code works well, which randomly selected 3 obs and then merge them as missing value to the original one.
novinosrin
Tourmaline | Level 20

Got it. Sure proc survey select is convenient and i tested that being much slower. I have modified mine for my satisfaction after all. 

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 10 replies
  • 2882 views
  • 1 like
  • 4 in conversation