BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
thockada
Calcite | Level 5

If I have a dataset with a variable called "Color"

 

From this dataset, I would like to create separate datasets based on color

(but I might not always know how many different colors are in the dataset) - meaning, I don't want to have to hardcode a list of all possible colors rather I want to sort the file by color, then as the color changes, I want to create a new dataset and name it the new the new /next value of the variable.

 

So if I had a dataset with records containing the "color"  Red. Blue. Green.  - I would want to output 3 datasets

 

DSN1 = Color_Blue

DSN2 = Color_Green

DSN3 = Color_Red

 

If there were more colors, I want to output a dataset for each based on their "Color"

DSN.n - Color_<var-value>

 

Thanks

 

 

1 ACCEPTED SOLUTION
7 REPLIES 7
collinelliot
Barite | Level 11

You could write a macro where you cycle through a unique list of the colors in the data set (untested).

 

proc sql;
create table color_list as
select distinct color
from your_data;
quit;

%macro output_color_sets;

proc sql;
select color into :color1-:color99 from color_list;
quit;

%do i = 1 %to &sqlObs;


data color_&&color&i;
set your_data;
where color = "&&color&i";
run;

%end;


%mend output_color_sets;
%output_color_sets

 

 

 

This might do what you want without a macro:

 

http://analytics.ncsu.edu/sesug/2007/SD04.pdf

thockada
Calcite | Level 5

Awesome, thanks for the quick response - I will try these out.

 

One twist - what if I wanted to create a single excel  spreadsheet of the same with each color as a different tab/sheet?

thockada
Calcite | Level 5

Awesome, exactly what I needed.  Thank you

Patrick
Opal | Level 21

@thockada

Or as a variant code which can output tables to any destination:

data have;
  length color $20;
  do color='Blue', 'Red', 'Green';
    do i=1 to 10;
      output;
    end;
  end;
  stop;
run;

proc sort data=have;
  by color;
run;


libname mywb xlsx 'c:\temp\demo_wb.xlsx';
data _null_;
  if _n_=1 then 
    do;
      dcl hash h1(dataset:'have(obs=0)', multidata:'y');
      h1.defineKey('color');
      h1.defineData(all:'y');
      h1.defineDone();
    end;

  do until(last.color);
    set have;
    by color;
    h1.add();
  end;

  h1.output(dataset:cats('mywb.want_',color));
  h1.clear();
run;
libname mywb clear;
mkeintz
PROC Star

All the solutions you've been provided so far require reading the dataset twice.  The solution below needs to read it only once, followed by renaming dummy dataset names to names based on the values encountered. The example use the sashelp.stocks data set, and it doesn't need a sorted data set, nor does it need to have enough memory to hold data temporarily in hash objects:

 

 

data d1 d2 d3 d4 d5 d6 d7 d8;
  length st_list $500  proc_ds $3000;
  retain st_list ' ' 
         proc_ds 'proc datasets library=work nolist; change ';

  set sashelp.stocks  end=eos;

  d=findw(st_list,trim(stock),' ','E');
  if d=0 then do;
    d=countw(st_list)+1;
    st_list=catx(' ',st_list,stock);
    proc_ds=catx(' ',proc_ds,cats('d',d,'=',stock));
  end;

  select (d);
    when (1) output d1;
    when (2) output d2;
    when (3) output d3;
    when (4) output d4;
    when (5) output d5;
    when (6) output d6;
    when (7) output d7;
    when (8) output d8;
  end;

  if eos then do;
    proc_ds=cats(proc_ds,'; delete d0');
    if countw(st_list)<8 then do d=countw(st_list)+1 to 8;
      proc_ds=catx(' ',proc_ds,cats('d',d));
    end;
    proc_ds=catx(' ',proc_ds,';quit;');
    call execute(trim(proc_ds));
  end;
  drop d st_list proc_ds;
run;

 

  1. This program works by creating datasets D1 through D8 (although D4-D8 have 0 observations in this example).
  2. In a subsequent proc datasets (see the call execute(trim(proc_ds)) statement), D1 through D3 are renamed to ibm, intel, and Microsoft.   Data sets d4-d8 are deleted.
  3. Important:  Make sure the DATA statement declares enough data sets to accommodate the largest expected number of unique values (of STOCK) in the data set.

    But you can protect yourself against this by having a "miscellaneous" dataset to capture all  observations that are not output to D1-D8.  That (relatively small) miscellaneous data set can then be submitted to the same processing as the original 

 

The benefits of this approach are

  1. you read the dataset only once
  2. you don't need to know in advance what values of stock will be found (as per your original request)
  3. the dataset does not need to be sorted by stock
  4. you don't need to maintain a hash object, which means
    1. memory is conserved
    2. output observations are automatically kept in the same order as incoming observations.

 

 

bb

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Kurt_Bremser
Super User

Still another method, using call execute to create a dynamic data step:

proc sort data=have (keep=color) out=lookup nodupkey;
by color;
run;

data _null_;
call execute('data');
do until (eof1);
  set lookup end=eof1;
  call execute(' ' !! trim(color));
end;
call execute('; set have; select (color); ');
do until (eof2);
  set lookup end=eof2;
  call execute('when ("' !! trim(color) !! '") output ' !! trim(color) !! '; ');
end;
call execute('run;');
run;

This will require two passes through your dataset, but be completely flexible; the only limit is the number of distinct color values, which could break the data statement created by the call execute if it exceeds 32767 characters. Or if it exceeds the system limit for concurrently open filehandles.

A requirement is of course that the values in color can be used as valid SAS names.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1157 views
  • 0 likes
  • 6 in conversation