How do I split a dataset 80/20 by a common id? That is, I want the 80%-20% split of the data to be made on the basis of id.
id score forum
12 89 98
12 87 67
13 56 87
13 45 98
14 78 98
15 23 87
16 54 23
Hi,
Given the example you provided, what do you want the end result to look like?
Thanks.
Presuming that you want all of the records associated with 80% of unique IDs to be identified:
data have;
input id score forum;
cards;
12 89 98
12 87 67
13 56 87
13 45 98
14 78 98
15 23 87
16 54 23
;
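proc sql;
  /* One row per distinct id, with a placeholder for its random draw */
  create table ids as select distinct id, 0 as id_rand_val from work.have order by id;
  /* Replace the placeholder with a uniform(0,1) draw for each id */
  update ids set id_rand_val=rand('uniform');
  /* Ids with a draw <= 0.8 go to Group1; every record inherits its id's group */
  create table want as
  select
    t1.*,
    case when t2.id_rand_val <= .8 then 'Group1' else 'Group2' end as ID_Group
  from
    work.have t1
    inner join work.ids t2
    on t1.id=t2.id;
quit;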
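One caveat on the query above: each id gets its own independent uniform draw, so the share of ids landing in Group1 is 80% only on average, and with a handful of ids it can be quite far off. If you need a fixed share of ids, a rough sketch (untested here) is to sort the ids by their random value and cut at 80% of the count. The data set names ids_rand and ids_grp, the seed value, and the rounding rule are my own choices for illustration, and it assumes work.have exists as created above.

```sas
/* Sketch: put a fixed (rounded) 80% of distinct ids into Group1. */
proc sql;
  create table ids as select distinct id from work.have;
quit;

data ids_rand;
  set ids;
  call streaminit(2);              /* arbitrary seed, only for repeatability */
  id_rand_val = rand('uniform');   /* one random draw per id */
run;

/* Sorting by the random value puts the ids in random order */
proc sort data=ids_rand;
  by id_rand_val;
run;

data ids_grp(drop=n_ids);
  set ids_rand nobs=n_ids;         /* n_ids = number of distinct ids */
  length ID_Group $6;
  if _n_ <= round(0.8 * n_ids) then ID_Group = 'Group1';
  else ID_Group = 'Group2';
run;

/* Attach the group to every record of the original data */
proc sql;
  create table want as
  select t1.*, t2.ID_Group
  from work.have t1
       inner join work.ids_grp t2
       on t1.id = t2.id;
quit;
```

Note that this fixes the share of ids, not the share of records: if some ids have many more rows than others, the record-level split can still differ from 80/20.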
With OUTALL, every record is kept and flagged: SELECTED=1 is the RATE= sample, in this case the 20%. SELECTED=0 is therefore the 1-rate part, the remaining 80%.
data have;
input id $ score forum;
cards;
12 89 98
12 87 67
13 56 87
13 45 98
14 78 98
15 23 87
16 54 23
;;;;
run;
proc surveyselect seed=2 rate=.2 outall;
SAMPLINGUNIT id;
run;
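If you want two physical data sets rather than a flag, something along these lines should do it. This is only a sketch: the names sampled, train and test are placeholders I made up, and it assumes HAVE exists as created in the data step above. The SURVEYSELECT call is repeated with explicit DATA= and OUT= so the second step knows which data set to read.

```sas
/* Same selection as above, but with named input and output data sets */
proc surveyselect data=have out=sampled seed=2 rate=.2 outall;
  samplingunit id;                 /* sample whole ids, not single rows */
run;

/* Route each record by the Selected flag that OUTALL adds */
data test train;
  set sampled;
  if Selected = 1 then output test;   /* the RATE= part (20% of ids) */
  else output train;                  /* the remaining ids */
run;

/* Optional check: each id should appear under one Selected value only */
proc freq data=sampled;
  tables id*Selected / norow nocol nopercent;
run;
```

Because the sampling unit is id, all rows of a given id go to the same data set.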