DATA Step, Macro, Functions and more

Splitting the dataset using macros

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 5
Accepted Solution

Splitting the dataset using macros

obs     store   product                            date                                                   buysell  units                  price

1110000119/02/2013b1000$1,000.29
2110000120/03/2013s1000$2,500.20
3110000217/03/2014b1000$2,000.20
4110000318/05/2014b2000$3,000.20
5210000414/05/2013b1000$3,000.29
6210000420/06/2013s1000$3,500.20
7210000517/03/2014b1000$2,000.20
8210000618/05/2014b2000$3,000.20
9210000720/09/2013s1000$3,500.20

how can I split this data set into different number of datasets according to the store and name that datasets as work_i where i is the number of store?

This is my coding

%macro prod(dsn);

proc sql;

select distinct max(store) into:x from &dsn;

quit;

%do i=1 %to &x;

data mypro.work&i;

set &dsn;

%where store='i';

run;

%end;

%mend prod;


Accepted Solutions
Solution
‎06-23-2015 04:57 PM
SAS Employee
Posts: 340

Re: Splitting the dataset using macros

Posted in reply to johnsaini

Hi,

Your original code is almost good (if store numbers are 1.. N consecutive integers).

Change this:

%where store='i';

to:

where store="&i"; /*if store is a string*/

or:

where store=&i;  /*if store is a number*/


You don't need the distinct keyword in the select statement (but it does not hurt in this case).


And I agree with previous posts: don't do it, unless you have a very good reason for it.

Why do you want to do this? Maybe we can offer a better solution then splitting the data set.

View solution in original post


All Replies
PROC Star
Posts: 1,324

Re: Splitting the dataset using macros

Posted in reply to johnsaini

Seems like a learning exercise.  As a first step, I would suggest try writing two DATA steps that you would use to make mypro.Work1 and mypro.Work2.   Then once you have those DATA steps, try updating the macro to generate those two steps.  Remember, the macro language is mostly just a SAS code generator, so when learning macro language, it sometimes helps to type the SAS code yourself, and get it working, before using the macro language to generate code.

Occasional Contributor
Posts: 5

Re: Splitting the dataset using macros

in this case we know that the maximum value of stores are 2, but how about if the value of store is a large number, then?

Regular Contributor
Posts: 227

Re: Splitting the dataset using macros

Posted in reply to johnsaini

1. Don't do this. use by-processing instead

2. YeahBut for the learning --- making mistakes --- experience

read up on the history of the problem.

Category:Making subsets - sasCommunity

Frequent Contributor
Posts: 130

Re: Splitting the dataset using macros

Posted in reply to johnsaini

I would do some research into the call symputx macro function combined with the cats function on the auto SAS variable _N_.  This will loop through a data set with the values equal to a specific variable (in this case store is what you'd want).  I provided some code below, assuming that the store variable is numeric.  If it's character then there needs to be double quotations around the &&store&i=>"&&store&i".  I would recommend researching and trying to get an understanding of what the code is actually doing so you can manipulate and modify it moving forward as you continue programming.

Hope this helps!

data test;

input store product date buysell $ units price;

informat date ddmmyy10. price dollar20.2;

format date ddmmyy10. price dollar20.2;

datalines;

1 100001 19/02/2013 b 1000 $1,000.29

1 100001 20/03/2013 s 1000 $2,500.20

1 100002 17/03/2014 b 1000 $2,000.20

1 100003 18/05/2014 b 2000 $3,000.20

2 100004 14/05/2013 b 1000 $3,000.29

2 100004 20/06/2013 s 1000 $3,500.20

2 100005 17/03/2014 b 1000 $2,000.20

2 100006 18/05/2014 b 2000 $3,000.20

2 100007 20/09/2013 s 1000 $3,500.20

;

run;

/***** REMOVE DUPLICATE RECORDS OF STORE *****/

proc sort data=test nodupkey

out=test_unique_store;

by store;

run;

%macro prod(dsn);

data _null_;

set test_unique_store end=last; /***** USE UNIQUE STORE DATA SET *****/

call symputx(cats("store",_n_),store);

If last then call symputx("n",_n_);

run;

%do i=1 %to &n; /***** &N IS THE VALUE OF THE NUMBER OF UNIQUE STORES (IN THIS EXAMPLE 1 AND 2) *****/

data mypro.work&i;

set &dsn;

where store=&&store&i;

run; %end;

%mend;

%prod(test)

Solution
‎06-23-2015 04:57 PM
SAS Employee
Posts: 340

Re: Splitting the dataset using macros

Posted in reply to johnsaini

Hi,

Your original code is almost good (if store numbers are 1.. N consecutive integers).

Change this:

%where store='i';

to:

where store="&i"; /*if store is a string*/

or:

where store=&i;  /*if store is a number*/


You don't need the distinct keyword in the select statement (but it does not hurt in this case).


And I agree with previous posts: don't do it, unless you have a very good reason for it.

Why do you want to do this? Maybe we can offer a better solution then splitting the data set.

Occasional Contributor
Posts: 5

Re: Splitting the dataset using macros

Posted in reply to gergely_batho

Thank you very much its working now

Regular Contributor
Posts: 227

Re: Splitting the dataset using macros

Posted in reply to johnsaini

ok, so perhaps store-id is not a single-digit integer in sequential order with none skipped.

http://www.sascommunity.org/wiki/List_Processing_Basics_Creating_and_Using_Lists_of_Macro_Variables

1. move the processing statements into a separate macro so you can test it

2. you're learning sql so make the list with sql select distinct store into :list_stores separated by ' ';

3. the loop reads an item from the list

and calls the processing macro

%macro do_this(store=);

data mypro.work&store;

set &dsn(where store=&store);

run;

%mend;

%macro prod(dsn);

proc sql;

select distinct store into :list_stores from &dsn;

quit;

%do i=1 %to &sqlobs;

%let item = %scan(&list_stores,&i);

%put echo &=item;

%put do_this(store=&item);

%end;

%mend prod;

Contributor
Posts: 20

Re: Splitting the dataset using macros

Posted in reply to Ron_Fehd_macro_maven

You can put *all* the store IDs in a macro variable, then %SCAN that list, pulling the ID's off one-at-a-time:

(This assumes the IDs are all integers)

data one;
attrib  date informat=ddmmyy10. format=date9.
  buysell informat=$1.
  price informat=dollar9.2 format=dollar9.2
;
input  store product date   buysell   units price   ;
cards;
1 100001 19/02/2013 b 1000 $1,000.29
1 100001 20/03/2013 s 1000 $2,500.20
1 100002 17/03/2014 b 1000 $2,000.20
1 100003 18/05/2014 b 2000 $3,000.20
2 100004 14/05/2013 b 1000 $3,000.29
2 100004 20/06/2013 s 1000 $3,500.20
2 100005 17/03/2014 b 1000 $2,000.20
2 100006 18/05/2014 b 2000 $3,000.20
2 100007 20/09/2013 s 1000 $3,500.20
9 123456 01/01/2001 b 9999 $1,234.56
39 33333 03/03/2003 x 3333 $3,333.33
run;
proc print; run;

options mprint mlogic symbolgen ;
%macro prod2(dsn);
proc sql;
select distinct store into :xall separated by ' ' from &dsn;
quit;
%put   &xall= ;

%let I = 1;
%do %while (%scan(&xall,&i) ne );
%let ID = %scan(&xall,&i); %put &ID= ;
data work.work_&id;
set &dsn(where=(store=%eval(&id)));
run;
%let I = %eval(&I + 1);
%end;

%mend prod2;

%prod2(work.one)

Super User
Super User
Posts: 7,988

Re: Splitting the dataset using macros

Posted in reply to Tommywhosc

Yes, I see this approach a lot.  I personally don't like the macro language involved with macro variable lists, and there are limitations to the length.  You can modify your code slightly to get the same effect without the limitations and macro code however:

proc sql;

     create table LOOP as select distinct STORE from DATASET;

quit;

data _null_;

     set loop;

     call execute('data want'||strip(store)||'; set have (where=(store="'||strip(store)||'")); run;');

run;

PROC Star
Posts: 1,324

Re: Splitting the dataset using macros

Just to add another non-macro alternative (not that I think the macro language is a bad choice for this problem), I'm amazed by the magic of what can be done with the hash object.  Below hash-splitting is based on example from a Paul Dorfman paper, http://www.lexjansen.com/nesug/nesug09/hw/HW04.pdf

219  *data _null_ step that outputs multiple datatsets ;
220  data _null_  ;
221    if _n_=1 then do ;
222      declare hash h() ;
223      h.definekey ("_n_") ;
224      h.definedata ("store","product","date","buysell","units","price") ;
225      h.definedone () ;
226    end;
227    do _n_ = 1 by 1 until (last.store) ;
228      set have ;
229      by store ;
230      h.add() ;
231    end ;
232    h.output (dataset: cats("Store_",store) ) ;
233    h.clear() ;
234  run ;

NOTE: The data set WORK.STORE_1 has 4 observations and 6 variables.
NOTE: The data set WORK.STORE_2 has 5 observations and 6 variables.
NOTE: There were 9 observations read from the data set WORK.HAVE.

Contributor
Posts: 65

Re: Splitting the dataset using macros

Posted in reply to johnsaini

Hi John,

You can try the below code also.

Let us assume the input dataset as "storeData"

%Maco splittingDatasets(dsn=);

proc sql;

select count(distinct store) into :storeCnt from &dsn;

%let storeCount = %cmpres(&storeCnt);

select distinct store into :store1 - :store&storeCount from &dsn;

quit;

%do i = 1 %to &storeCount;

data store&i;

set &dsn;

where store = &&store&i;

run;

%end;

%mend;

%splittingDatasets(dsn = storeData )

Super User
Posts: 5,516

Re: Splitting the dataset using macros

Posted in reply to johnsaini

One more comment ...

You should realize that even if you succeed at this task, you are accomplishing it in an inefficient way.  This code takes longer to run:

data store1;

   set &dsn;

   where store='1';

run;

data store2;

   set &dsn;

   where store='2';

run;

This code would run faster:

data store1 store2;

   set &dsn;

   if store='1' then output store1;

   else if store='2' then output store2;

run;

Macro language could generate the faster-running program, as well as the slower-running program.  It's perhaps a bit more complex to generate the faster-running program, so if this is really just a learning exercise it may be too much too soon.  And if your data sets are small, speed would be a secondary consideration anyway.

Good luck.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 12 replies
  • 496 views
  • 14 likes
  • 9 in conversation