BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
hua
Obsidian | Level 7 hua
Obsidian | Level 7

Hello all,

 

I'm trying to randomly delete three observations for each group, and add the result as a new column. Does anyone know how to randomly delete three observations by group?

The dataset I have is like the following:

ID     year     prep

1      2000       550

1      2001       600

1      2002       580

1      2003       550

1      2004       570

...       ...           ...

1      2018       550

2      2000       600

2      2001       600

2      2002       580

2      2003       550

2      2004       570

...       ...           ...

 

and I'm trying to get the new dataset as:

ID     year       prep     new

1      2000       550      550

1      2001       600         .

1      2002       580      580

1      2003       550        .

1      2004       570      570

1      2005       530      530

1      2006       545        .

...       ...           ...        ...

1      2018       550       550

2      2000       600       600

2      2001       600          .

2      2002       580          .

2      2003       550       550

2      2004       570       570

...       ...           ...         ...

 

Thank you all!

 

Best,

Hua

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
SuryaKiran
Meteorite | Level 14

I recommend using PROC SURVEYSELECT. Seed values changes with time so that when ever you run the random changes.

 

data have;
prep=500;
do id=1 to 3;
	do year=2000 to 2008;
	prep+50;
	output;
	end;
end;
run;

proc surveyselect data=have
	method=srs n=3 
	seed=%SYSFUNC(round(%Sysfunc(time())))  
	out=test (Keep=id year prep) noprint ;
strata id ;
run;

proc sql;
create table want as 
select a.*,case when b.prep is null then a.prep else . end as new_prep
	from have a
	left join test b on (a.id=b.id and a.year=b.year);
quit;
Thanks,
Suryakiran

View solution in original post

10 REPLIES 10
Reeza
Super User
1. Add a random number
2. sort by random number
3. Set to missing for the first 3 (or reassign as needed).
4. Resort to desired order

I don't know of another quick way, except possibly using PROC SURVEYSELECT and select N-3 from each group and then merging that back in.
hua
Obsidian | Level 7 hua
Obsidian | Level 7
Thanks! It's a good way to do it!
novinosrin
Tourmaline | Level 20

Some idea here:

 

data have;
input ID     year     prep;
cards;
1      2000       550
1      2001       600
1      2002       580
1      2003       550
1      2004       570
1      2018       550
2      2000       600
2      2001       600
2      2002       580
2      2003       550
2      2004       570
;
data want;
do _n_=1 by 1 until(last.id);
set have;
by id;
end;
array t(3) _temporary_;
do _d=1 to 3;
t(_d)=rand('integer',1,_n_);
end;
do _n_=1 by 1 until(last.id);
set have;
by id;
if _n_ not in t then output;
end;
drop _:;
run;


novinosrin
Tourmaline | Level 20

@hua  This version is  a better one :

 


data want;
array t(3) _temporary_;
call missing(of t(*), __d);
do _n_=1 by 1 until(last.id);
set have;
by id;
end;
do while(n(of t(*))<dim(t));
_d=rand('integer',1,_n_);
if _d not in t then do;__d+1; t(__d)=_d;end;
end;
do _n_=1 by 1 until(last.id);
set have;
by id;
if _n_ not in t then output;
end;
drop _:;
run;
SuryaKiran
Meteorite | Level 14

I recommend using PROC SURVEYSELECT. Seed values changes with time so that when ever you run the random changes.

 

data have;
prep=500;
do id=1 to 3;
	do year=2000 to 2008;
	prep+50;
	output;
	end;
end;
run;

proc surveyselect data=have
	method=srs n=3 
	seed=%SYSFUNC(round(%Sysfunc(time())))  
	out=test (Keep=id year prep) noprint ;
strata id ;
run;

proc sql;
create table want as 
select a.*,case when b.prep is null then a.prep else . end as new_prep
	from have a
	left join test b on (a.id=b.id and a.year=b.year);
quit;
Thanks,
Suryakiran
novinosrin
Tourmaline | Level 20

That can be over come by simple assignment of 

call streaminit(123);

 

 

data want;
call streaminit(123);
array t(3) _temporary_;
call missing(of t(*), __d);
do _n_=1 by 1 until(last.id);
set have;
by id;
end;
do while(n(of t(*))<dim(t));
_d=rand('integer',1,_n_);
if _d not in t then do;__d+1; t(__d)=_d;end;
end;
do _n_=1 by 1 until(last.id);
set have;
by id;
if _n_ not in t then output;
end;
drop _:;
run;

I bet this this much faster if the requirement is "randomly delete three observations by group"

hua
Obsidian | Level 7 hua
Obsidian | Level 7
Yes, it works for randomly deleting 3 obs, all I need to do next is to merge the result to the original dataset. Thank you!
novinosrin
Tourmaline | Level 20

Hmm Hang on I think I m missing something, There should not be a need merge back. I admit I haven't got the clarity of final output you want. Can you clarify, let me tweak the existing one plz

 

I think this change will do

if _n_ in t then call missing(new);
else new=prep;

data want;
call streaminit(123);
array t(3) _temporary_;
call missing(of t(*), __d);
do _n_=1 by 1 until(last.id);
set have;
by id;
end;
do while(n(of t(*))<dim(t));
_d=rand('integer',1,_n_);
if _d not in t then do;__d+1; t(__d)=_d;end;
end;
do _n_=1 by 1 until(last.id);
set have;
by id;
if _n_  in t then call missing(new);
else new=prep;
output;
end;
drop _:;
run;
hua
Obsidian | Level 7 hua
Obsidian | Level 7
The randomly deleting part words well. The final output I want is to add a new column, which is the result after randomly deleting 3 obs, to the original dataset. SuryaKiran's code works well, which randomly selected 3 obs and then merge them as missing value to the original one.
novinosrin
Tourmaline | Level 20

Got it. Sure proc survey select is convenient and i tested that being much slower. I have modified mine for my satisfaction after all. 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 10 replies
  • 3753 views
  • 1 like
  • 4 in conversation