brianfpegg
Calcite | Level 5

 

I have a very large time series data set that won't process all at once. I'm trying to split it into 4 groups and process each group separately. However, I need each CUSTOMER to be in only one of the groups. CUSTOMER is a character variable. I tried to do this with the PROC SURVEYSELECT code below, but it gives an error; it seems that I can't use GROUPS= and CLUSTER together. I also tried PROC RANK, but I can't use it to group the character variable CUSTOMER.

 

 

PROC SURVEYSELECT DATA=TIME_SERIES_INPUT OUT=TIME_SERIES_OUTPUT GROUPS=4 SEED=20180908 outall;
CLUSTER CUSTOMER;
RUN;

 

This code gives the following error (CLUSTER is an alias for the SAMPLINGUNIT statement, which is why the message names SAMPLINGUNIT):

ERROR: A SAMPLINGUNIT statement may not be specified with the GROUPS= option.

 

Thanks!

 

Accepted Solution
PaigeMiller
Diamond | Level 26

I can't say that I understand your entire problem, but it ought to be easy to split data sets into groups such that each customer is in only one group. (And it's not clear to me where the clustering comes in). Here's one way to do this:

 

UNTESTED CODE

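/* One row per customer. ORDER BY guarantees the sort order that the MERGE below relies on. */
proc sql;
    create table customer_data_set as
        select distinct customer from have
        order by customer;
quit;

/* Assign customers to groups 0-3 in round-robin order. */
data customer_data_set;
    set customer_data_set;
    group=mod(_n_,4);
run;

proc sort data=have;
    by customer;
run;

/* Merge the group back onto the full data: every record of a customer inherits that customer's group. */
data want1 want2 want3 want4;
    merge have customer_data_set;
    by customer;
    if group=0 then output want1;
    else if group=1 then output want2;
    else if group=2 then output want3;
    else if group=3 then output want4;
run;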
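
If sorting and merging a table of this size is the bottleneck, one possible variant is to derive the group directly from a hash of CUSTOMER, which needs only a single pass over the data. The following is an untested sketch, not part of the approach above: it assumes the MD5 function is available (SAS 9 or later), reuses the `have` and `want1`-`want4` names, and introduces a hypothetical temporary variable `_digest`. Group sizes will be only approximately equal, because the assignment depends on the hash values rather than on a round-robin count.

```sas
/* Sketch: one pass over HAVE, no sort and no merge.                      */
/* Identical CUSTOMER values always hash to the same group, so all of a   */
/* customer's records land in the same output data set.                   */
data want1 want2 want3 want4;
    set have;
    length _digest $32;
    /* MD5 returns a 16-byte binary digest; $HEX32. renders it as 32 hex characters. */
    _digest = put(md5(strip(customer)), $hex32.);
    /* The first 7 hex digits give a non-negative integer below 2**28; reduce it to 0-3. */
    group = mod(input(substr(_digest, 1, 7), hex7.), 4);
    drop _digest;
    if group=0 then output want1;
    else if group=1 then output want2;
    else if group=2 then output want3;
    else if group=3 then output want4;
run;
```

The assignment is deterministic, so rerunning the step on the same data reproduces the same split without a seed.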

 

--
Paige Miller

brianfpegg
Calcite | Level 5

The problem is there are several records for each customer. The primary key for the table is CUSTOMER and TIME. When you select distinct, you aren't merging back with the total data set. I could add a merge statement to do this, but I was looking for a more efficient solution given the size of this data.

 

Thanks for your help!

PaigeMiller
Diamond | Level 26

All records for each customer are kept together in this method.
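
One way to confirm this after the split is to count the customers that appear in more than one output data set. This is an untested sketch; the view name `all_groups`, the variable `grp`, and the column name `customers_in_multiple_groups` are made up for illustration.

```sas
/* Stack only the CUSTOMER column of the four outputs and tag each row with its source. */
data all_groups / view=all_groups;
    set want1(in=a keep=customer)
        want2(in=b keep=customer)
        want3(in=c keep=customer)
        want4(in=d keep=customer);
    grp = a + 2*b + 3*c + 4*d;   /* 1-4, depending on which data set the row came from */
run;

/* Count customers whose records are spread over more than one group. */
proc sql;
    select count(*) as customers_in_multiple_groups
    from (select customer
          from all_groups
          group by customer
          having count(distinct grp) > 1);
quit;
```

A count of 0 would indicate that no customer was split across groups.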

--
Paige Miller
brianfpegg
Calcite | Level 5

Ah, I didn't see the merge in the last step. Thanks!
