How to create sub-datasets based on a class variable?

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 75
Accepted Solution

How to create sub-datasets based on a class variable?

Suppose an original data has a class variable

data temp;

input CLASS VAR;

datalines;

1 1

1 2

1 .

2 3

2 6

;

run;

How can I create sub-datasets based on the CLASS variable in a rapid way? The above data should produce two sub-datasets that look like:

- Sub-dataset1:

CLASS VAR

1 1

1 2

1 .

- Sub-dataset2:

CLASS VAR

2 3

2 6


Accepted Solutions
Solution
‎08-12-2013 09:08 AM
Trusted Advisor
Posts: 1,614

Re: How to create sub-datasets based on a class variable?

Okay, I see solutions that use MACROs, that use CALL EXECUTE, that use hash(), but I don't see a simple data step solution, such as:

data subdataset1 subdataset2;

input CLASS VAR;

if class=1 then output subdataset1;

else if class=2 then output subdataset2;

datalines;

1 1

1 2

1 .

2 3

2 6

;

run;

What am I missing? This works perfectly. Why resort to complicated SAS structures for what is obviously a beginner problem, when the most basic SAS structure (a data set) works fine?

View solution in original post


All Replies
Respected Advisor
Posts: 3,124

Re: How to create sub-datasets based on a class variable?

One of a dynamic options could be Hash(), if your data is sorted by class or at least clustered by class:

data temp;

input CLASS VAR;

datalines;

1 1

1 2

1 .

2 3

2 6

;

run;

data _null_;

  if _n_=1 then do;

        declare hash h();

h.definekey('_n_');

h.definedata('class','var');

      h.definedone();

  end;

  set temp;

    by class notsorted;

rc=h.replace();

      if last.class then do;

rc=h.output(dataset:cats('class_',class));

rc=h.clear();

       end;

run;


Haikuo

Frequent Contributor
Posts: 130

Re: How to create sub-datasets based on a class variable?

I find this code very helpful. by the way, suppose the data set "temp"  has one more variable named year, how to make the final datasets ordered by year? eg., the final datasets class_1 and class_2 are ordered by year. thanks!

Super Contributor
Posts: 297

Re: How to create sub-datasets based on a class variable?

You could also use:

DATA _NULL_;

  SET HAVE END=EOF;

  BY CLASS;

  IF FIRST.CLASS;

  CALL EXECUTE(COMPBL("DATA CLASS_" || COMPRESS(CLASS) || ";

                     SET HAVE (WHERE = (CLASS =" || CLASS || "));

  RUN;"));

RUN;

Valued Guide
Posts: 3,208

Re: How to create sub-datasets based on a class variable?

@Scott, Did I miss something?  Wa there a change in Call execute...

http://www.sascommunity.org/wiki/Call_Execute_in_the_Data_Step

---->-- ja karman --<-----
Super Contributor
Posts: 297

Re: How to create sub-datasets based on a class variable?

@Jaap,

I am not sure what you are querying.  If you run the previously attached code you will see the following in the log:

14         GOPTIONS ACCESSIBLE;

15         DATA WANT;

16         SET HAVE END=EOF;

17         BY CLASS;

18         IF FIRST.CLASS;

19         CALL EXECUTE(COMPBL("DATA CLASS_" || COMPRESS(CLASS) || ";

20                              SET HAVE (WHERE = (CLASS =" || CLASS || "));

21         RUN;"));

22         RUN;

NOTE: Numeric values have been converted to character values at the places given by:

      (Line)Smiley SadColumn).

      19:48   20:55  

NOTE: There were 50 observations read from the data set WORK.HAVE.

NOTE: The data set WORK.WANT has 10 observations and 501 variables.

NOTE: DATA statement used (Total process time):

      real time           0.00 seconds

      cpu time            0.00 seconds

     

NOTE: CALL EXECUTE generated line.

1         + DATA CLASS_1;    SET HAVE (WHERE = (CLASS = 1));        RUN;

NOTE: There were 5 observations read from the data set WORK.HAVE.

      WHERE class="1";

NOTE: The data set WORK.CLASS_1 has 5 observations and 501 variables.

NOTE: DATA statement used (Total process time):

      real time           0.00 seconds

      cpu time            0.00 seconds

2         + DATA CLASS_2;    SET HAVE (WHERE = (CLASS = 2));        RUN;

NOTE: There were 5 observations read from the data set WORK.HAVE.

      WHERE class="2";

NOTE: The data set WORK.CLASS_2 has 5 observations and 501 variables.

This suggests that each datastep created via CALL EXECUTE is executed following the completion of the _NULL_ datastep, which is what this article is pointing out.

Valued Guide
Posts: 3,208

Re: How to create sub-datasets based on a class variable?

@ Scott, Ok I see your approach.   

It is generating possible a lot of new data-step-s each selecting one part and creating one datastep each.

I was thinking of a lot of SAS-datasets being created while reading, in one pass!,  the input (performance reasons).

This feature of splitting data in sas-data step is rather unique. 

That is requiring:

- a list of the to be created datasets (defnition part)

- a list of the output statements in a selection

---->-- ja karman --<-----
Regular Contributor
Posts: 195

Re: How to create sub-datasets based on a class variable?

Here is my try by using macro...

%macro test;

   proc sql noprint;

      select distinct class into :list separated by "?"

      from have;

      select count(distinct class) into :tot

      from have;

   quit;

   %do i = 1 %to &tot.;

      data data&i.;

        set have;

        if class = %scan(&list.,&i.,"?");

      run;

   %end;

%mend;

%test

-Urvish

Solution
‎08-12-2013 09:08 AM
Trusted Advisor
Posts: 1,614

Re: How to create sub-datasets based on a class variable?

Okay, I see solutions that use MACROs, that use CALL EXECUTE, that use hash(), but I don't see a simple data step solution, such as:

data subdataset1 subdataset2;

input CLASS VAR;

if class=1 then output subdataset1;

else if class=2 then output subdataset2;

datalines;

1 1

1 2

1 .

2 3

2 6

;

run;

What am I missing? This works perfectly. Why resort to complicated SAS structures for what is obviously a beginner problem, when the most basic SAS structure (a data set) works fine?

Valued Guide
Posts: 3,208

Re: How to create sub-datasets based on a class variable?

Paigemiller, You are missing nothing, perhaps some minor requirements.
- Are the subclasses known? If No than is needed some addtional logic.

- Is the question about quick and easy coding or about a well performing behavior on bigger datasets.

- ... (minor)

Your approach is the most easy to understand that will deliver reliable and the best performance.

Indeed why should it made more complicated as needed. KISS.     

---->-- ja karman --<-----
Super User
Posts: 5,256

Re: How to create sub-datasets based on a class variable?

It's not very dynamic, is it...?

A version of Urvish macro solution which should be slightly more efficient (fewer passes through the data):

%macro test;

proc sql noprint;

  select distinct catx('_','T',class),
    class
  into :table_list separated by ' ', :list separated by ' '
         from temp;

        select count(distinct class) into :tot
        from temp;
quit;

data &table_list.;
set temp;
  select(class);
   %do i = 1 %to &tot.;
   when(%scan(&list.,&i.)) output %scan(&table_list.,&i);
  %end;
   otherwise;
  end;
run;

%mend;

%test

Data never sleeps
Trusted Advisor
Posts: 1,614

Re: How to create sub-datasets based on a class variable?

It's not very dynamic, is it...?

At no point did the original question ask for a dynamic solution.

A version of Urvish macro solution which should be slightly more efficient (fewer passes through the data):

I'm not grasping why you say this. My solution requires one pass through the data. Your solution requires n passes through the data, if there are n different values of CLASS.

I agree that all of the above solutions work. Since it appears to me that a beginner has asked a relative simple question, I cannot see the benefit of a complex answer when a simple answer suffices. I think that's called Occam's toothbrush.

Super User
Posts: 5,256

Re: How to create sub-datasets based on a class variable?

We often urge to have questions simplified so that they could be easily understood. The real life problem often has much more complex data. Therefore I suggested a more generic solution.

I did mean that my macro was slightly more efficient than Urvish (maybe I should put that in a separate post, sorry if I seemed rude). My code has 3 passes of the data, since Urvish had 2 + no distinct values of CLASS.

Razor :smileyconfused:

Data never sleeps
Respected Advisor
Posts: 3,124

Re: How to create sub-datasets based on a class variable?

,

In deed your approach is in general more efficient comparing to , one caveat though,   'table_list' has be shorter then 65k. Smiley Wink

Haikuo

Super User
Posts: 5,256

Re: How to create sub-datasets based on a class variable?

Aggreed, its not without limitation.  Let me relabel it Fairly generic. .. Smiley Wink

Data never sleeps
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 23 replies
  • 565 views
  • 3 likes
  • 9 in conversation