split data into different sizes

Contributor
Posts: 33

Hi, 

 

I am struggling to create a macro that splits a data set into pieces of different sizes. For example, if I have 84 observations in my current data, I want to split them into 4 datasets with sizes (10, 20, 30, 24).

 

For example, I have the following

 

data new;
do i =1 to 84;
output;
end;
run;

 

How do I get the following datasets?

dataset1: i = 1, 2, ..., 10
dataset2: i = 11, 12, ..., 30
dataset3: i = 31, 32, ..., 60
dataset4: i = 61, 62, ..., 84

 

Then, for each dataset, I want to output the following: mean, standard deviation, and a histogram.

I'm thinking of creating a macro, since the number of datasets to create and their sizes will change for every different original data set.

 

Thanks.

Accepted Solutions
Solution
‎11-09-2015 08:22 AM
Respected Advisor
Posts: 4,640

Re: split data into different sizes

To follow @Astounding's suggestion, you should do something like this:

 

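/* Example data: 84 observations with i = 1, ..., 84 */
data new;
    do i = 1 to 84;
        output;
    end;
run;

%macro mySplit(dsn,sizes);
    /* Tag each observation with its segment number in the variable SET */
    data split;
        do s = &sizes.;       /* one pass per segment size */
            set + 1;          /* sum statement: SET is a variable here (the segment counter) */
            do j = 1 to s;
                set &dsn;     /* read the next observation; the step stops when &dsn runs out */
                output;
            end;
        end;
        drop s j;
    run;

    /* Histogram, mean, and standard deviation for each segment */
    ods graphics / imagename="&dsn._graph";
    proc univariate data=split;
        by set;
        var i;
        histogram;
        output out=out_&dsn. mean=mi std=stdi;
    run;
%mend mySplit;

%mySplit(new,%str(10,20,30,24));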
PG


All Replies
Super User
Posts: 5,069

Re: split data into different sizes

Most likely, the best advice would be this:  Don't do it!  Instead of splitting up the data, just add a new variable to your existing data set.  The new variable could be "1" for the first 10 observations, "2" for the next 20, "3" for the next 30, etc.

 

You can always process with a BY statement later to get statistics for each group, or possibly with a WHERE statement to select just a single group.

 

You'll save a lot of headaches trying to come up with data set names and tracking which is which.

 

Good luck.
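
For illustration, here is a minimal sketch of the "add a variable instead of splitting" idea for the 84-observation example from the question. The data set name grouped and the variable name group are hypothetical, and the cutoffs 10, 30, and 60 are simply the cumulative sizes (10, 10+20, 10+20+30) hard-coded for this one case.

```sas
/* Example data from the question: 84 observations, i = 1, ..., 84 */
data new;
    do i = 1 to 84;
        output;
    end;
run;

/* Add a group variable instead of creating separate data sets.
   GROUPED and GROUP are hypothetical names chosen for this sketch.
   _N_ equals the observation number here because there is one SET
   statement and one observation is read per iteration. */
data grouped;
    set new;
    if      _n_ <= 10 then group = 1;   /* first 10 observations */
    else if _n_ <= 30 then group = 2;   /* next 20 */
    else if _n_ <= 60 then group = 3;   /* next 30 */
    else                   group = 4;   /* remaining 24 */
run;

/* Statistics for every group at once (data is already ordered by group) */
proc means data=grouped n mean std;
    by group;
    var i;
run;

/* Or select just a single group with a WHERE statement */
proc means data=grouped n mean std;
    where group = 2;
    var i;
run;
```

Hard-coded cutoffs only suit a one-off case; when the sizes change from one data set to the next, they would need to be passed in as a parameter instead.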

Contributor
Posts: 33

Re: split data into different sizes

Thank you PG Stats! This is perfect!

Super User
Posts: 5,254

Re: split data into different sizes

Kinda like @Astounding said: don't do this.
A data set with 84 variables is unlikely to be normalized. If you normalize, you get a robust structure that seldom needs to change. You also minimize the maintenance of variable-specific code and avoid the need for macro coding.
Data never sleeps
Contributor
Posts: 33

Re: split data into different sizes

Thanks. My main goal is to check whether the data in each segment is normally distributed. In reality, the number of observations will vary: if it is small, I might have only one or two segments, and if it is large, I may have many segments. Thanks!
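
Since the stated goal is a normality check per segment, one possible sketch is to request the NORMAL option in PROC UNIVARIATE and capture its TestsForNormality table with ODS OUTPUT. This assumes a data set shaped like the split data set from the accepted solution (segment number in set, analysis variable i); the output data set name normtests is hypothetical.

```sas
/* Rebuild the example: 84 observations tagged with a segment number.
   This repeats the data step from the accepted solution so the
   sketch runs on its own. */
data new;
    do i = 1 to 84;
        output;
    end;
run;

data split;
    do s = 10, 20, 30, 24;
        set + 1;              /* SET is the segment counter variable */
        do j = 1 to s;
            set new;
            output;
        end;
    end;
    drop s j;
run;

/* Capture the normality tests for each segment in a data set.
   NORMTESTS is a hypothetical name chosen for this sketch. */
ods output TestsForNormality=normtests;

proc univariate data=split normal;
    by set;
    var i;
    histogram i / normal;                   /* overlay a fitted normal curve */
    qqplot i / normal(mu=est sigma=est);    /* Q-Q plot against the fitted normal */
run;

/* One row per test per segment */
proc print data=normtests noobs;
run;
```

With segments as small as 10 observations, formal tests have little power to detect non-normality, so the histogram and Q-Q plot are probably worth reading alongside the p-values.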
