Create output datasets on-the-fly by a variable (Where I may not know the different values)

Accepted Solution Solved
Reply
New Contributor
Posts: 3
Accepted Solution

Create output datasets on-the-fly by a variable (Where I may not know the different values)

If I have a dataset with a variable called "Color"

 

From this dataset, I would like to create separate datasets based on color

(but I might not always know how many different colors are in the dataset) - meaning, I don't want to have to hardcode a list of all possible colors rather I want to sort the file by color, then as the color changes, I want to create a new dataset and name it the new the new /next value of the variable.

 

So if I had a dataset with records containing the "color"  Red. Blue. Green.  - I would want to output 3 datasets

 

DSN1 = Color_Blue

DSN2 = Color_Green

DSN3 = Color_Red

 

If there were more colors, I want to output a dataset for each based on their "Color"

DSN.n - Color_<var-value>

 

Thanks

 

 


Accepted Solutions
Solution
‎10-10-2017 01:09 PM
Super User
Posts: 21,546

Re: Create output datasets on-the-fly by a variable (Where I may not know the different values)


All Replies
PROC Star
Posts: 311

Re: Create output datasets on-the-fly by a variable (Where I may not know the different values)

You could write a macro where you cycle through a unique list of the colors in the data set (untested).

 

proc sql;
create table color_list as
select distinct color
from your_data;
quit;

%macro output_color_sets;

proc sql;
select color into :color1-:color99 from color_list;
quit;

%do i = 1 %to &sqlObs;


data color_&&color&i;
set your_data;
where color = "&&color&i";
run;

%end;


%mend output_color_sets;
%output_color_sets

 

 

 

This might do what you want without a macro:

 

http://analytics.ncsu.edu/sesug/2007/SD04.pdf

New Contributor
Posts: 3

Re: Create output datasets on-the-fly by a variable (Where I may not know the different values)

Posted in reply to collinelliot

Awesome, thanks for the quick response - I will try these out.

 

One twist - what if I wanted to create a single excel  spreadsheet of the same with each color as a different tab/sheet?

Solution
‎10-10-2017 01:09 PM
Super User
Posts: 21,546

Re: Create output datasets on-the-fly by a variable (Where I may not know the different values)

BY group processing within ODS EXCELXP or ODS EXCEL. 

https://support.sas.com/rnd/base/ods/odsmarkup/excelxp_demo.html#data

New Contributor
Posts: 3

Re: Create output datasets on-the-fly by a variable (Where I may not know the different values)

Awesome, exactly what I needed.  Thank you

Respected Advisor
Posts: 4,284

Re: Create output datasets on-the-fly by a variable (Where I may not know the different values)

@thockada

Or as a variant code which can output tables to any destination:

data have;
  length color $20;
  do color='Blue', 'Red', 'Green';
    do i=1 to 10;
      output;
    end;
  end;
  stop;
run;

proc sort data=have;
  by color;
run;


libname mywb xlsx 'c:\temp\demo_wb.xlsx';
data _null_;
  if _n_=1 then 
    do;
      dcl hash h1(dataset:'have(obs=0)', multidata:'y');
      h1.defineKey('color');
      h1.defineData(all:'y');
      h1.defineDone();
    end;

  do until(last.color);
    set have;
    by color;
    h1.add();
  end;

  h1.output(dataset:cats('mywb.want_',color));
  h1.clear();
run;
libname mywb clear;
Trusted Advisor
Posts: 1,149

Re: Create output datasets on-the-fly by a variable (Where I may not know the different values)

All the solutions you've been provided so far require reading the dataset twice.  The solution below needs to read it only once, followed by renaming dummy dataset names to names based on the values encountered. The example use the sashelp.stocks data set, and it doesn't need a sorted data set, nor does it need to have enough memory to hold data temporarily in hash objects:

 

 

data d1 d2 d3 d4 d5 d6 d7 d8;
  length st_list $500  proc_ds $3000;
  retain st_list ' ' 
         proc_ds 'proc datasets library=work nolist; change ';

  set sashelp.stocks  end=eos;

  d=findw(st_list,trim(stock),' ','E');
  if d=0 then do;
    d=countw(st_list)+1;
    st_list=catx(' ',st_list,stock);
    proc_ds=catx(' ',proc_ds,cats('d',d,'=',stock));
  end;

  select (d);
    when (1) output d1;
    when (2) output d2;
    when (3) output d3;
    when (4) output d4;
    when (5) output d5;
    when (6) output d6;
    when (7) output d7;
    when (8) output d8;
  end;

  if eos then do;
    proc_ds=cats(proc_ds,'; delete d0');
    if countw(st_list)<8 then do d=countw(st_list)+1 to 8;
      proc_ds=catx(' ',proc_ds,cats('d',d));
    end;
    proc_ds=catx(' ',proc_ds,';quit;');
    call execute(trim(proc_ds));
  end;
  drop d st_list proc_ds;
run;

 

  1. This program works by creating datasets D1 through D8 (although D4-D8 have 0 observations in this example).
  2. In a subsequent proc datasets (see the call execute(trim(proc_ds)) statement), D1 through D3 are renamed to ibm, intel, and Microsoft.   Data sets d4-d8 are deleted.
  3. Important:  Make sure the DATA statement declares enough data sets to accommodate the largest expected number of unique values (of STOCK) in the data set.

    But you can protect yourself against this by having a "miscellaneous" dataset to capture all  observations that are not output to D1-D8.  That (relatively small) miscellaneous data set can then be submitted to the same processing as the original 

 

The benefits of this approach are

  1. you read the dataset only once
  2. you don't need to know in advance what values of stock will be found (as per your original request)
  3. the dataset does not need to be sorted by stock
  4. you don't need to maintain a hash object, which means
    1. memory is conserved
    2. output observations are automatically kept in the same order as incoming observations.

 

 

bb

Super User
Posts: 8,590

Re: Create output datasets on-the-fly by a variable (Where I may not know the different values)

[ Edited ]

Still another method, using call execute to create a dynamic data step:

proc sort data=have (keep=color) out=lookup nodupkey;
by color;
run;

data _null_;
call execute('data');
do until (eof1);
  set lookup end=eof1;
  call execute(' ' !! trim(color));
end;
call execute('; set have; select (color); ');
do until (eof2);
  set lookup end=eof2;
  call execute('when ("' !! trim(color) !! '") output ' !! trim(color) !! '; ');
end;
call execute('run;');
run;

This will require two passes through your dataset, but be completely flexible; the only limit is the number of distinct color values, which could break the data statement created by the call execute if it exceeds 32767 characters. Or if it exceeds the system limit for concurrently open filehandles.

A requirement is of course that the values in color can be used as valid SAS names.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 158 views
  • 0 likes
  • 6 in conversation