Solved: Re: split data into different sizes

statz · Posted 11-06-2015 04:05 PM

Hi,

I am struggling to create a macro that splits a data set into pieces of different sizes. For example, I have 84 observations in my current data set and want to split it into 4 data sets of sizes (10, 20, 30, 24).

Suppose I have the following:

data new;
do i =1 to 84;
output;
end;
run;

How do I get the following data sets?

dataset1: i=1,2,3,4,5,6,7,8,9,10

dataset2: i=11,12,13,...,30

dataset3: i=31,32,33,...,60

dataset4: i=61,62,63,...,84

Then, for each data set, I want to output the mean, standard deviation, and a histogram.

I'm thinking of creating a macro, since the number of data sets and their sizes will change with each original data set.

Thanks.

PGStats · Posted 11-06-2015 05:19 PM

To follow @Astounding's suggestion, you should do something like this:

data new;
do i =1 to 84;
    output;
    end;
run;

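%macro mySplit(dsn,sizes);
/* Copy the input data set into split, numbering the segments in the variable set */
data split;
do s = &sizes.;          /* s takes each segment size in turn */
    set+1;               /* sum statement: increments the segment counter named set */
    do j = 1 to s;
        set &dsn;        /* read the next observation; the step stops at end of input */
        output;
        end;
    end;
drop s j;
run;

/* Mean, standard deviation, and histogram of i for each segment */
ods graphics / imagename="&dsn._graph";
proc univariate data=split;
by set;
var i;
histogram;
output out=out_&dsn. mean=mi std=stdi;
run;
%mend mySplit;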

%mySplit(new,%str(10,20,30,24));

PG


Astounding · Posted 11-06-2015 04:19 PM

Most likely, the best advice would be this: Don't do it! Instead of splitting up the data, just add a new variable to your existing data set. The new variable could be "1" for the first 10 observations, "2" for the next 20, "3" for the next 30, etc.

You can always process with a BY statement later to get statistics for each group, or possibly with a WHERE statement to select just a single group.

You'll save a lot of headaches trying to come up with data set names and tracking which is which.

Good luck.
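
A minimal sketch of that idea, using the sizes from the question (10, 20, 30, 24). The variable name grp and the data set name grouped are made up for illustration, and the cutoffs 10, 30, 60 are just the running totals of the sizes:

```sas
/* Tag each observation with a group number instead of splitting. */
/* grp and grouped are hypothetical names, not from the question. */
data grouped;
    set new;
    /* _n_ is the observation number here, since there is a single SET */
    if      _n_ <= 10 then grp = 1;   /* first 10 observations */
    else if _n_ <= 30 then grp = 2;   /* next 20 */
    else if _n_ <= 60 then grp = 3;   /* next 30 */
    else                   grp = 4;   /* remaining 24 */
run;

/* Statistics for every group at once with a BY statement */
proc means data=grouped mean std;
    by grp;
    var i;
run;

/* Or select a single group with a WHERE statement */
proc means data=grouped mean std;
    where grp = 2;
    var i;
run;
```

The BY statement works without a PROC SORT here only because the groups are assigned in observation order, so grouped is already ordered by grp.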

statz · Posted 11-09-2015 08:22 AM

Thank you, PGStats! This is perfect! 🙂

LinusH · Posted 11-07-2015 02:24 AM

Kind of like @Astounding said: don't do this.
A data set with 84 variables is unlikely to be normalized. If you normalize, you get a robust structure that seldom needs to change. You also minimize the maintenance of variable-specific code and avoid the need for macro coding.

Data never sleeps

statz · Posted 11-09-2015 08:26 AM

Thanks. My main goal is to check whether the data in each segment is normally distributed. In reality, the number of observations will vary: if it is small, I might have only one or two segments, and if it is large, I may have many. Thanks!
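
For the normality check itself, one possible sketch reuses the split data set and the segment variable set created by the mySplit macro posted earlier in the thread. It assumes that macro has already been run; the NORMAL options are an addition, not part of the accepted solution:

```sas
/* Assumes data set split, with segment variable set and analysis variable i, */
/* already exists (for example, created by the mySplit macro).                */
proc univariate data=split normal;   /* NORMAL requests tests for normality */
    by set;                          /* one set of results per segment */
    var i;
    histogram i / normal;            /* overlay a fitted normal curve */
run;
```

Because everything is driven by the BY variable, the same step should work unchanged whether the data ends up in one segment or many.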
