How can I split a dataset 80/20 by a common id? That is, I want to split the data by id, so that 80% and 20% of the data are separated on the basis of id.
id score forum
12 89 98
12 87 67
13 56 87
13 45 98
14 78 98
15 23 87
16 54 23
Hi,
Given the example you provided, what do you want the end result to look like?
Thanks.
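Presuming that you want all of the records associated with roughly 80% of the unique IDs to be identified (each ID gets its own random draw, so the split is 80/20 only on average, and with just a few IDs it can be quite uneven):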
data have;
  input id score forum;
  cards;
12 89 98
12 87 67
13 56 87
13 45 98
14 78 98
15 23 87
16 54 23
;
run;

proc sql;
  /* one row per id, with a placeholder for its random value */
  create table ids as
    select distinct id, 0 as id_rand_val
    from work.have
    order by id;
  /* draw one uniform(0,1) value per id */
  update ids set id_rand_val = rand('uniform');
  /* every record inherits the group of its id */
  create table want as
    select
      t1.*,
      case when t2.id_rand_val <= .8 then 'Group1' else 'Group2' end as ID_Group
    from
      work.have t1
      inner join work.ids t2
      on t1.id = t2.id;
quit;
SELECTED=1 marks the RATE= sample, in this case the 20%. Therefore SELECTED=0 marks the remaining 1 - rate part, here the 80%.
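The code this reply describes is not shown in the thread. The SELECTED flag and RATE= option suggest PROC SURVEYSELECT with the OUTALL option, so here is a minimal sketch under that assumption. It reuses the work.have dataset from the earlier reply; the output dataset names (flagged, part20, part80) and the seed value are placeholders chosen for illustration, and the SAMPLINGUNIT statement needs a reasonably recent SAS/STAT release.

```sas
/* Assumption: work.have exists as created in the earlier reply. */
/* Sample whole ids rather than single rows, at a 20% rate.      */
proc surveyselect data=work.have out=work.flagged
                  method=srs rate=0.2 outall seed=12345;
   samplingunit id;   /* all rows of an id are selected together */
run;

/* OUTALL keeps every row and adds Selected (1 = sampled, 0 = not). */
data work.part20 work.part80;
   set work.flagged;
   if selected = 1 then output work.part20;
   else output work.part80;
run;
```

Unlike the random-draw approach above, this selects a fixed share of the ids, so the 20% part does not vary in size from run to run (only which ids land in it does, unless the seed is fixed).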