BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
johnsaini
Calcite | Level 5

obs     store   product                            date                                                   buysell  units                  price

1110000119/02/2013b1000$1,000.29
2110000120/03/2013s1000$2,500.20
3110000217/03/2014b1000$2,000.20
4110000318/05/2014b2000$3,000.20
5210000414/05/2013b1000$3,000.29
6210000420/06/2013s1000$3,500.20
7210000517/03/2014b1000$2,000.20
8210000618/05/2014b2000$3,000.20
9210000720/09/2013s1000$3,500.20

how can I split this data set into different number of datasets according to the store and name that datasets as work_i where i is the number of store?

This is my coding

%macro prod(dsn);

proc sql;

select distinct max(store) into:x from &dsn;

quit;

%do i=1 %to &x;

data mypro.work&i;

set &dsn;

%where store='i';

run;

%end;

%mend prod;

1 ACCEPTED SOLUTION

Accepted Solutions
gergely_batho
SAS Employee

Hi,

Your original code is almost good (if store numbers are 1.. N consecutive integers).

Change this:

%where store='i';

to:

where store="&i"; /*if store is a string*/

or:

where store=&i;  /*if store is a number*/


You don't need the distinct keyword in the select statement (but it does not hurt in this case).


And I agree with previous posts: don't do it, unless you have a very good reason for it.

Why do you want to do this? Maybe we can offer a better solution then splitting the data set.

View solution in original post

12 REPLIES 12
Quentin
Super User

Seems like a learning exercise.  As a first step, I would suggest try writing two DATA steps that you would use to make mypro.Work1 and mypro.Work2.   Then once you have those DATA steps, try updating the macro to generate those two steps.  Remember, the macro language is mostly just a SAS code generator, so when learning macro language, it sometimes helps to type the SAS code yourself, and get it working, before using the macro language to generate code.

johnsaini
Calcite | Level 5

in this case we know that the maximum value of stores are 2, but how about if the value of store is a large number, then?

Ron_MacroMaven
Lapis Lazuli | Level 10

1. Don't do this. use by-processing instead

2. YeahBut for the learning --- making mistakes --- experience

read up on the history of the problem.

Category:Making subsets - sasCommunity

dcruik
Lapis Lazuli | Level 10

I would do some research into the call symputx macro function combined with the cats function on the auto SAS variable _N_.  This will loop through a data set with the values equal to a specific variable (in this case store is what you'd want).  I provided some code below, assuming that the store variable is numeric.  If it's character then there needs to be double quotations around the &&store&i=>"&&store&i".  I would recommend researching and trying to get an understanding of what the code is actually doing so you can manipulate and modify it moving forward as you continue programming.

Hope this helps!

data test;

input store product date buysell $ units price;

informat date ddmmyy10. price dollar20.2;

format date ddmmyy10. price dollar20.2;

datalines;

1 100001 19/02/2013 b 1000 $1,000.29

1 100001 20/03/2013 s 1000 $2,500.20

1 100002 17/03/2014 b 1000 $2,000.20

1 100003 18/05/2014 b 2000 $3,000.20

2 100004 14/05/2013 b 1000 $3,000.29

2 100004 20/06/2013 s 1000 $3,500.20

2 100005 17/03/2014 b 1000 $2,000.20

2 100006 18/05/2014 b 2000 $3,000.20

2 100007 20/09/2013 s 1000 $3,500.20

;

run;

/***** REMOVE DUPLICATE RECORDS OF STORE *****/

proc sort data=test nodupkey

out=test_unique_store;

by store;

run;

%macro prod(dsn);

data _null_;

set test_unique_store end=last; /***** USE UNIQUE STORE DATA SET *****/

call symputx(cats("store",_n_),store);

If last then call symputx("n",_n_);

run;

%do i=1 %to &n; /***** &N IS THE VALUE OF THE NUMBER OF UNIQUE STORES (IN THIS EXAMPLE 1 AND 2) *****/

data mypro.work&i;

set &dsn;

where store=&&store&i;

run; %end;

%mend;

%prod(test)

gergely_batho
SAS Employee

Hi,

Your original code is almost good (if store numbers are 1.. N consecutive integers).

Change this:

%where store='i';

to:

where store="&i"; /*if store is a string*/

or:

where store=&i;  /*if store is a number*/


You don't need the distinct keyword in the select statement (but it does not hurt in this case).


And I agree with previous posts: don't do it, unless you have a very good reason for it.

Why do you want to do this? Maybe we can offer a better solution then splitting the data set.

johnsaini
Calcite | Level 5

Thank you very much its working now

Ron_MacroMaven
Lapis Lazuli | Level 10

ok, so perhaps store-id is not a single-digit integer in sequential order with none skipped.

http://www.sascommunity.org/wiki/List_Processing_Basics_Creating_and_Using_Lists_of_Macro_Variables

1. move the processing statements into a separate macro so you can test it

2. you're learning sql so make the list with sql select distinct store into :list_stores separated by ' ';

3. the loop reads an item from the list

and calls the processing macro

%macro do_this(store=);

data mypro.work&store;

set &dsn(where store=&store);

run;

%mend;

%macro prod(dsn);

proc sql;

select distinct store into :list_stores from &dsn;

quit;

%do i=1 %to &sqlobs;

%let item = %scan(&list_stores,&i);

%put echo &=item;

%put do_this(store=&item);

%end;

%mend prod;

Tommywhosc
Obsidian | Level 7

You can put *all* the store IDs in a macro variable, then %SCAN that list, pulling the ID's off one-at-a-time:

(This assumes the IDs are all integers)

data one;
attrib  date informat=ddmmyy10. format=date9.
  buysell informat=$1.
  price informat=dollar9.2 format=dollar9.2
;
input  store product date   buysell   units price   ;
cards;
1 100001 19/02/2013 b 1000 $1,000.29
1 100001 20/03/2013 s 1000 $2,500.20
1 100002 17/03/2014 b 1000 $2,000.20
1 100003 18/05/2014 b 2000 $3,000.20
2 100004 14/05/2013 b 1000 $3,000.29
2 100004 20/06/2013 s 1000 $3,500.20
2 100005 17/03/2014 b 1000 $2,000.20
2 100006 18/05/2014 b 2000 $3,000.20
2 100007 20/09/2013 s 1000 $3,500.20
9 123456 01/01/2001 b 9999 $1,234.56
39 33333 03/03/2003 x 3333 $3,333.33
run;
proc print; run;

options mprint mlogic symbolgen ;
%macro prod2(dsn);
proc sql;
select distinct store into :xall separated by ' ' from &dsn;
quit;
%put   &xall= ;

%let I = 1;
%do %while (%scan(&xall,&i) ne );
%let ID = %scan(&xall,&i); %put &ID= ;
data work.work_&id;
set &dsn(where=(store=%eval(&id)));
run;
%let I = %eval(&I + 1);
%end;

%mend prod2;

%prod2(work.one)

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Yes, I see this approach a lot.  I personally don't like the macro language involved with macro variable lists, and there are limitations to the length.  You can modify your code slightly to get the same effect without the limitations and macro code however:

proc sql;

     create table LOOP as select distinct STORE from DATASET;

quit;

data _null_;

     set loop;

     call execute('data want'||strip(store)||'; set have (where=(store="'||strip(store)||'")); run;');

run;

Quentin
Super User

Just to add another non-macro alternative (not that I think the macro language is a bad choice for this problem), I'm amazed by the magic of what can be done with the hash object.  Below hash-splitting is based on example from a Paul Dorfman paper, http://www.lexjansen.com/nesug/nesug09/hw/HW04.pdf

219  *data _null_ step that outputs multiple datatsets ;
220  data _null_  ;
221    if _n_=1 then do ;
222      declare hash h() ;
223      h.definekey ("_n_") ;
224      h.definedata ("store","product","date","buysell","units","price") ;
225      h.definedone () ;
226    end;
227    do _n_ = 1 by 1 until (last.store) ;
228      set have ;
229      by store ;
230      h.add() ;
231    end ;
232    h.output (dataset: cats("Store_",store) ) ;
233    h.clear() ;
234  run ;

NOTE: The data set WORK.STORE_1 has 4 observations and 6 variables.
NOTE: The data set WORK.STORE_2 has 5 observations and 6 variables.
NOTE: There were 9 observations read from the data set WORK.HAVE.

MadhuKorni
Quartz | Level 8

Hi John,

You can try the below code also.

Let us assume the input dataset as "storeData"

%Maco splittingDatasets(dsn=);

proc sql;

select count(distinct store) into :storeCnt from &dsn;

%let storeCount = %cmpres(&storeCnt);

select distinct store into :store1 - :store&storeCount from &dsn;

quit;

%do i = 1 %to &storeCount;

data store&i;

set &dsn;

where store = &&store&i;

run;

%end;

%mend;

%splittingDatasets(dsn = storeData )

Astounding
PROC Star

One more comment ...

You should realize that even if you succeed at this task, you are accomplishing it in an inefficient way.  This code takes longer to run:

data store1;

   set &dsn;

   where store='1';

run;

data store2;

   set &dsn;

   where store='2';

run;

This code would run faster:

data store1 store2;

   set &dsn;

   if store='1' then output store1;

   else if store='2' then output store2;

run;

Macro language could generate the faster-running program, as well as the slower-running program.  It's perhaps a bit more complex to generate the faster-running program, so if this is really just a learning exercise it may be too much too soon.  And if your data sets are small, speed would be a secondary consideration anyway.

Good luck.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 5336 views
  • 14 likes
  • 9 in conversation