BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
NonSleeper
Quartz | Level 8

Suppose an original data has a class variable

data temp;

input CLASS VAR;

datalines;

1 1

1 2

1 .

2 3

2 6

;

run;

How can I create sub-datasets based on the CLASS variable in a rapid way? The above data should produce two sub-datasets that look like:

- Sub-dataset1:

CLASS VAR

1 1

1 2

1 .

- Sub-dataset2:

CLASS VAR

2 3

2 6

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

Okay, I see solutions that use MACROs, that use CALL EXECUTE, that use hash(), but I don't see a simple data step solution, such as:

data subdataset1 subdataset2;

input CLASS VAR;

if class=1 then output subdataset1;

else if class=2 then output subdataset2;

datalines;

1 1

1 2

1 .

2 3

2 6

;

run;

What am I missing? This works perfectly. Why resort to complicated SAS structures for what is obviously a beginner problem, when the most basic SAS structure (a data set) works fine?

--
Paige Miller

View solution in original post

23 REPLIES 23
Haikuo
Onyx | Level 15

One of a dynamic options could be Hash(), if your data is sorted by class or at least clustered by class:

data temp;

input CLASS VAR;

datalines;

1 1

1 2

1 .

2 3

2 6

;

run;

data _null_;

  if _n_=1 then do;

        declare hash h();

h.definekey('_n_');

h.definedata('class','var');

      h.definedone();

  end;

  set temp;

    by class notsorted;

rc=h.replace();

      if last.class then do;

rc=h.output(dataset:cats('class_',class));

rc=h.clear();

       end;

run;


Haikuo

Jonate_H
Quartz | Level 8

I find this code very helpful. by the way, suppose the data set "temp"  has one more variable named year, how to make the final datasets ordered by year? eg., the final datasets class_1 and class_2 are ordered by year. thanks!

Scott_Mitchell
Quartz | Level 8

You could also use:

DATA _NULL_;

  SET HAVE END=EOF;

  BY CLASS;

  IF FIRST.CLASS;

  CALL EXECUTE(COMPBL("DATA CLASS_" || COMPRESS(CLASS) || ";

                     SET HAVE (WHERE = (CLASS =" || CLASS || "));

  RUN;"));

RUN;

jakarman
Barite | Level 11

@Scott, Did I miss something?  Wa there a change in Call execute...

http://www.sascommunity.org/wiki/Call_Execute_in_the_Data_Step

---->-- ja karman --<-----
Scott_Mitchell
Quartz | Level 8

@Jaap,

I am not sure what you are querying.  If you run the previously attached code you will see the following in the log:

14         GOPTIONS ACCESSIBLE;

15         DATA WANT;

16         SET HAVE END=EOF;

17         BY CLASS;

18         IF FIRST.CLASS;

19         CALL EXECUTE(COMPBL("DATA CLASS_" || COMPRESS(CLASS) || ";

20                              SET HAVE (WHERE = (CLASS =" || CLASS || "));

21         RUN;"));

22         RUN;

NOTE: Numeric values have been converted to character values at the places given by:

      (Line):(Column).

      19:48   20:55  

NOTE: There were 50 observations read from the data set WORK.HAVE.

NOTE: The data set WORK.WANT has 10 observations and 501 variables.

NOTE: DATA statement used (Total process time):

      real time           0.00 seconds

      cpu time            0.00 seconds

     

NOTE: CALL EXECUTE generated line.

1         + DATA CLASS_1;    SET HAVE (WHERE = (CLASS = 1));        RUN;

NOTE: There were 5 observations read from the data set WORK.HAVE.

      WHERE class="1";

NOTE: The data set WORK.CLASS_1 has 5 observations and 501 variables.

NOTE: DATA statement used (Total process time):

      real time           0.00 seconds

      cpu time            0.00 seconds

2         + DATA CLASS_2;    SET HAVE (WHERE = (CLASS = 2));        RUN;

NOTE: There were 5 observations read from the data set WORK.HAVE.

      WHERE class="2";

NOTE: The data set WORK.CLASS_2 has 5 observations and 501 variables.

This suggests that each datastep created via CALL EXECUTE is executed following the completion of the _NULL_ datastep, which is what this article is pointing out.

jakarman
Barite | Level 11

@ Scott, Ok I see your approach.   

It is generating possible a lot of new data-step-s each selecting one part and creating one datastep each.

I was thinking of a lot of SAS-datasets being created while reading, in one pass!,  the input (performance reasons).

This feature of splitting data in sas-data step is rather unique. 

That is requiring:

- a list of the to be created datasets (defnition part)

- a list of the output statements in a selection

---->-- ja karman --<-----
UrvishShah
Fluorite | Level 6

Here is my try by using macro...

%macro test;

   proc sql noprint;

      select distinct class into :list separated by "?"

      from have;

      select count(distinct class) into :tot

      from have;

   quit;

   %do i = 1 %to &tot.;

      data data&i.;

        set have;

        if class = %scan(&list.,&i.,"?");

      run;

   %end;

%mend;

%test

-Urvish

PaigeMiller
Diamond | Level 26

Okay, I see solutions that use MACROs, that use CALL EXECUTE, that use hash(), but I don't see a simple data step solution, such as:

data subdataset1 subdataset2;

input CLASS VAR;

if class=1 then output subdataset1;

else if class=2 then output subdataset2;

datalines;

1 1

1 2

1 .

2 3

2 6

;

run;

What am I missing? This works perfectly. Why resort to complicated SAS structures for what is obviously a beginner problem, when the most basic SAS structure (a data set) works fine?

--
Paige Miller
jakarman
Barite | Level 11

Paigemiller, You are missing nothing, perhaps some minor requirements.
- Are the subclasses known? If No than is needed some addtional logic.

- Is the question about quick and easy coding or about a well performing behavior on bigger datasets.

- ... (minor)

Your approach is the most easy to understand that will deliver reliable and the best performance.

Indeed why should it made more complicated as needed. KISS.     

---->-- ja karman --<-----
LinusH
Tourmaline | Level 20

It's not very dynamic, is it...?

A version of Urvish macro solution which should be slightly more efficient (fewer passes through the data):

%macro test;

proc sql noprint;

  select distinct catx('_','T',class),
    class
  into :table_list separated by ' ', :list separated by ' '
         from temp;

        select count(distinct class) into :tot
        from temp;
quit;

data &table_list.;
set temp;
  select(class);
   %do i = 1 %to &tot.;
   when(%scan(&list.,&i.)) output %scan(&table_list.,&i);
  %end;
   otherwise;
  end;
run;

%mend;

%test

Data never sleeps
PaigeMiller
Diamond | Level 26

It's not very dynamic, is it...?

At no point did the original question ask for a dynamic solution.

A version of Urvish macro solution which should be slightly more efficient (fewer passes through the data):

I'm not grasping why you say this. My solution requires one pass through the data. Your solution requires n passes through the data, if there are n different values of CLASS.

I agree that all of the above solutions work. Since it appears to me that a beginner has asked a relative simple question, I cannot see the benefit of a complex answer when a simple answer suffices. I think that's called Occam's toothbrush.

--
Paige Miller
LinusH
Tourmaline | Level 20

We often urge to have questions simplified so that they could be easily understood. The real life problem often has much more complex data. Therefore I suggested a more generic solution.

I did mean that my macro was slightly more efficient than Urvish (maybe I should put that in a separate post, sorry if I seemed rude). My code has 3 passes of the data, since Urvish had 2 + no distinct values of CLASS.

Razor :smileyconfused:

Data never sleeps
Haikuo
Onyx | Level 15

,

In deed your approach is in general more efficient comparing to , one caveat though,   'table_list' has be shorter then 65k. Smiley Wink

Haikuo

LinusH
Tourmaline | Level 20

Aggreed, its not without limitation.  Let me relabel it Fairly generic. .. Smiley Wink

Data never sleeps

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 23 replies
  • 1710 views
  • 3 likes
  • 9 in conversation