Splitting Datasets in Two diff ways

Contributor LRN
Contributor
Posts: 57

Hi Everyone,

I have two datasets and 6 groups. The first dataset should be divided among the first three groups. The second should be divided among all six groups, allowing for whatever the first three groups already hold, so that the final 6 datasets all have the same number of records. How can I do that?

Thank you so much

Trusted Advisor
Posts: 1,909

Re: Splitting Datasets in Two diff ways

As much as I would like to help, I don't think this is clear enough for me to offer any suggestions.

Perhaps you could create a small example, or describe things in much more detail.

Contributor LRN
Contributor
Posts: 57

Re: Splitting Datasets in Two diff ways

Posted in reply to PaigeMiller

Hi,

For example

1. I have two datasets. Dataset 1 has 75 records. Dataset 2 has 375 records.

2. In the end I need 6 datasets (ds1, ds2, ds3, ds4, ds5, ds6).

3. Dataset 1 needs to be split into ds1, ds2, ds3.

4. Dataset 2 needs to be split into ds1, ds2, ds3, ds4, ds5, ds6 (but ds1, ds2, ds3 already hold data from the dataset 1 split).

    After the first split, ds1, ds2, ds3 each have 25 records. During the dataset 2 split, 50 records need to be appended to each of ds1, ds2, ds3, and ds4, ds5, ds6 should be created with 75 records each, so that all 6 datasets end up with an equal number of records.

I hope I explained in an understandable way. Thanks

Trusted Advisor
Posts: 1,909

Re: Splitting Datasets in Two diff ways

How would you choose the records that go into each output dataset? Randomly, or otherwise?

Contributor LRN
Contributor
Posts: 57

Re: Splitting Datasets in Two diff ways

Posted in reply to PaigeMiller

Yes. Randomly.

Trusted Advisor
Posts: 1,909

Re: Splitting Datasets in Two diff ways

1. Split first dataset into three

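/* Attach a random sort key to every record */
data data1;
     set data1;
     r=ranuni(0);
run;

/* Sorting by the random key shuffles the records */
proc sort data=data1;
     by r;
run;

/* First 25 shuffled records go to ds1, next 25 to ds2, the rest to ds3 */
data ds1 ds2 ds3;
     set data1;
     if _n_<=25 then output ds1;
     else if _n_<=50 then output ds2;
     else output ds3;
run;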

2. Repeat the process with dataset2, modifying split numbers as appropriate; create ds1a ds2a ds3a ds4 ds5 ds6


3. Append ds1a to ds1; ds2a to ds2; ds3a to ds3
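
Steps 2 and 3 could be sketched roughly as follows. This is only an illustration under a few assumptions: the second dataset is called data2 (that name does not appear in the thread), it has the same variables as data1, and it holds the 375 rows implied by the sizes described earlier (3 x 50 + 3 x 75).

```sas
/* Shuffle data2 the same way data1 was shuffled.
   data2 is an assumed name for the second dataset. */
data data2;
    set data2;
    r = ranuni(0);
run;

proc sort data=data2;
    by r;
run;

/* Cutoffs are running totals of the sizes 50, 50, 50, 75, 75, 75. */
data ds1a ds2a ds3a ds4 ds5 ds6;
    set data2;
    if      _n_ <=  50 then output ds1a;
    else if _n_ <= 100 then output ds2a;
    else if _n_ <= 150 then output ds3a;
    else if _n_ <= 225 then output ds4;
    else if _n_ <= 300 then output ds5;
    else                    output ds6;
run;

/* Step 3: append the extra rows to the datasets built from data1. */
proc append base=ds1 data=ds1a; run;
proc append base=ds2 data=ds2a; run;
proc append base=ds3 data=ds3a; run;
```

PROC APPEND expects the two datasets to have compatible variables; if data1 and data2 differ in structure, the append step would need adjusting.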

Contributor LRN
Contributor
Posts: 57

Re: Splitting Datasets in Two diff ways

Posted in reply to PaigeMiller

Actually, the record counts I gave were just an example; they are not fixed numbers. Suppose I have 628 records in the first dataset and 1103 in the second. How can I modify this?

Trusted Advisor
Posts: 1,909

Re: Splitting Datasets in Two diff ways

I think you might want to use a macro variable in this case

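/* Attach a random sort key and save the record count in &numobs1 */
data data1;
     set data1 end=eof;
     r=ranuni(0);
     if eof then call symputx('numobs1',_n_);
run;

/* Sorting by the random key shuffles the records */
proc sort data=data1;
     by r;
run;

/* Cut the shuffled records into thirds; any remainder lands in ds3 */
data ds1 ds2 ds3;
     set data1;
     if _n_<=(&numobs1 / 3) then output ds1;
     else if _n_<=(2 * &numobs1 / 3) then output ds2;
     else output ds3;
run;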

and similarly for data set 2
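
For data set 2 the cutoffs are less obvious, because ds1-ds3 already hold rows and the totals may not divide evenly. One possible sketch is below. It assumes the data1 code above has already run (so &numobs1 exists), that the second dataset is named data2, and that data2 is large enough for ds4-ds6 to catch up with ds1-ds3. The names numobs2, cut1-cut5, have, base and extra are made up for this illustration.

```sas
/* Shuffle data2 and store its row count in &numobs2. */
data data2;
    set data2 end=eof;
    r = ranuni(0);
    if eof then call symputx('numobs2', _n_);
run;

proc sort data=data2;
    by r;
run;

/* Compute running cutoffs into the shuffled data2. */
data _null_;
    array have{6} _temporary_ (0 0 0 0 0 0);
    n1 = &numobs1;
    n2 = &numobs2;

    /* Rows ds1-ds3 already received under the _n_ <= n1/3 rule. */
    have{1} = floor(n1 / 3);
    have{2} = floor(2 * n1 / 3) - have{1};
    have{3} = n1 - have{1} - have{2};

    base  = floor((n1 + n2) / 6);   /* smallest final size           */
    extra = mod(n1 + n2, 6);        /* outputs that get one more row */

    cum = 0;
    do g = 1 to 5;
        /* The last 'extra' outputs are given base + 1 rows. */
        target = base + (g > 6 - extra);
        cum = cum + target - have{g};
        call symputx(cats('cut', g), cum);
    end;
run;

/* Split data2; whatever is left after &cut5 goes to ds6. */
data ds1a ds2a ds3a ds4 ds5 ds6;
    set data2;
    if      _n_ <= &cut1 then output ds1a;
    else if _n_ <= &cut2 then output ds2a;
    else if _n_ <= &cut3 then output ds3a;
    else if _n_ <= &cut4 then output ds4;
    else if _n_ <= &cut5 then output ds5;
    else                      output ds6;
run;
```

With 628 and 1103 records, ds1-ds3 start with 209, 209 and 210 rows, so ds1a, ds2a and ds3a receive 79, 79 and 78 rows and ds4-ds6 receive 289 each. After ds1a-ds3a are appended to ds1-ds3 as in step 3 of the earlier reply, the final sizes are 288, 288, 288, 289, 289, 289. A perfectly equal split is not possible here because 1731 is not divisible by 6.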

Contributor LRN
Contributor
Posts: 57

Re: Splitting Datasets in Two diff ways

Thank you so much. I got it.
