obs store product date buysell units price
1 | 1 | 100001 | 19/02/2013 | b | 1000 | $1,000.29 |
---|---|---|---|---|---|---|
2 | 1 | 100001 | 20/03/2013 | s | 1000 | $2,500.20 |
3 | 1 | 100002 | 17/03/2014 | b | 1000 | $2,000.20 |
4 | 1 | 100003 | 18/05/2014 | b | 2000 | $3,000.20 |
5 | 2 | 100004 | 14/05/2013 | b | 1000 | $3,000.29 |
6 | 2 | 100004 | 20/06/2013 | s | 1000 | $3,500.20 |
7 | 2 | 100005 | 17/03/2014 | b | 1000 | $2,000.20 |
8 | 2 | 100006 | 18/05/2014 | b | 2000 | $3,000.20 |
9 | 2 | 100007 | 20/09/2013 | s | 1000 | $3,500.20 |
how can I split this data set into different number of datasets according to the store and name that datasets as work_i where i is the number of store?
This is my coding
%macro prod(dsn);
proc sql;
select distinct max(store) into:x from &dsn;
quit;
%do i=1 %to &x;
data mypro.work&i;
set &dsn;
%where store='i';
run;
%end;
%mend prod;
Hi,
Your original code is almost good (if store numbers are 1.. N consecutive integers).
Change this:
%where store='i';
to:
where store="&i"; /*if store is a string*/
or:
where store=&i; /*if store is a number*/
You don't need the distinct keyword in the select statement (but it does not hurt in this case).
And I agree with previous posts: don't do it, unless you have a very good reason for it.
Why do you want to do this? Maybe we can offer a better solution then splitting the data set.
Seems like a learning exercise. As a first step, I would suggest try writing two DATA steps that you would use to make mypro.Work1 and mypro.Work2. Then once you have those DATA steps, try updating the macro to generate those two steps. Remember, the macro language is mostly just a SAS code generator, so when learning macro language, it sometimes helps to type the SAS code yourself, and get it working, before using the macro language to generate code.
in this case we know that the maximum value of stores are 2, but how about if the value of store is a large number, then?
1. Don't do this. use by-processing instead
2. YeahBut for the learning --- making mistakes --- experience
read up on the history of the problem.
I would do some research into the call symputx macro function combined with the cats function on the auto SAS variable _N_. This will loop through a data set with the values equal to a specific variable (in this case store is what you'd want). I provided some code below, assuming that the store variable is numeric. If it's character then there needs to be double quotations around the &&store&i=>"&&store&i". I would recommend researching and trying to get an understanding of what the code is actually doing so you can manipulate and modify it moving forward as you continue programming.
Hope this helps!
data test;
input store product date buysell $ units price;
informat date ddmmyy10. price dollar20.2;
format date ddmmyy10. price dollar20.2;
datalines;
1 100001 19/02/2013 b 1000 $1,000.29
1 100001 20/03/2013 s 1000 $2,500.20
1 100002 17/03/2014 b 1000 $2,000.20
1 100003 18/05/2014 b 2000 $3,000.20
2 100004 14/05/2013 b 1000 $3,000.29
2 100004 20/06/2013 s 1000 $3,500.20
2 100005 17/03/2014 b 1000 $2,000.20
2 100006 18/05/2014 b 2000 $3,000.20
2 100007 20/09/2013 s 1000 $3,500.20
;
run;
/***** REMOVE DUPLICATE RECORDS OF STORE *****/
proc sort data=test nodupkey
out=test_unique_store;
by store;
run;
%macro prod(dsn);
data _null_;
set test_unique_store end=last; /***** USE UNIQUE STORE DATA SET *****/
call symputx(cats("store",_n_),store);
If last then call symputx("n",_n_);
run;
%do i=1 %to &n; /***** &N IS THE VALUE OF THE NUMBER OF UNIQUE STORES (IN THIS EXAMPLE 1 AND 2) *****/
data mypro.work&i;
set &dsn;
where store=&&store&i;
run; %end;
%mend;
%prod(test)
Hi,
Your original code is almost good (if store numbers are 1.. N consecutive integers).
Change this:
%where store='i';
to:
where store="&i"; /*if store is a string*/
or:
where store=&i; /*if store is a number*/
You don't need the distinct keyword in the select statement (but it does not hurt in this case).
And I agree with previous posts: don't do it, unless you have a very good reason for it.
Why do you want to do this? Maybe we can offer a better solution then splitting the data set.
Thank you very much its working now
ok, so perhaps store-id is not a single-digit integer in sequential order with none skipped.
http://www.sascommunity.org/wiki/List_Processing_Basics_Creating_and_Using_Lists_of_Macro_Variables
1. move the processing statements into a separate macro so you can test it
2. you're learning sql so make the list with sql select distinct store into :list_stores separated by ' ';
3. the loop reads an item from the list
and calls the processing macro
%macro do_this(store=);
data mypro.work&store;
set &dsn(where store=&store);
run;
%mend;
%macro prod(dsn);
proc sql;
select distinct store into :list_stores from &dsn;
quit;
%do i=1 %to &sqlobs;
%let item = %scan(&list_stores,&i);
%put echo &=item;
%put do_this(store=&item);
%end;
%mend prod;
You can put *all* the store IDs in a macro variable, then %SCAN that list, pulling the ID's off one-at-a-time:
(This assumes the IDs are all integers)
data one;
attrib date informat=ddmmyy10. format=date9.
buysell informat=$1.
price informat=dollar9.2 format=dollar9.2
;
input store product date buysell units price ;
cards;
1 100001 19/02/2013 b 1000 $1,000.29
1 100001 20/03/2013 s 1000 $2,500.20
1 100002 17/03/2014 b 1000 $2,000.20
1 100003 18/05/2014 b 2000 $3,000.20
2 100004 14/05/2013 b 1000 $3,000.29
2 100004 20/06/2013 s 1000 $3,500.20
2 100005 17/03/2014 b 1000 $2,000.20
2 100006 18/05/2014 b 2000 $3,000.20
2 100007 20/09/2013 s 1000 $3,500.20
9 123456 01/01/2001 b 9999 $1,234.56
39 33333 03/03/2003 x 3333 $3,333.33
run;
proc print; run;
options mprint mlogic symbolgen ;
%macro prod2(dsn);
proc sql;
select distinct store into :xall separated by ' ' from &dsn;
quit;
%put &xall= ;
%let I = 1;
%do %while (%scan(&xall,&i) ne );
%let ID = %scan(&xall,&i); %put &ID= ;
data work.work_&id;
set &dsn(where=(store=%eval(&id)));
run;
%let I = %eval(&I + 1);
%end;
%mend prod2;
%prod2(work.one)
Yes, I see this approach a lot. I personally don't like the macro language involved with macro variable lists, and there are limitations to the length. You can modify your code slightly to get the same effect without the limitations and macro code however:
proc sql;
create table LOOP as select distinct STORE from DATASET;
quit;
data _null_;
set loop;
call execute('data want'||strip(store)||'; set have (where=(store="'||strip(store)||'")); run;');
run;
Just to add another non-macro alternative (not that I think the macro language is a bad choice for this problem), I'm amazed by the magic of what can be done with the hash object. Below hash-splitting is based on example from a Paul Dorfman paper, http://www.lexjansen.com/nesug/nesug09/hw/HW04.pdf
219 *data _null_ step that outputs multiple datatsets ; 220 data _null_ ; 221 if _n_=1 then do ; 222 declare hash h() ; 223 h.definekey ("_n_") ; 224 h.definedata ("store","product","date","buysell","units","price") ; 225 h.definedone () ; 226 end; 227 do _n_ = 1 by 1 until (last.store) ; 228 set have ; 229 by store ; 230 h.add() ; 231 end ; 232 h.output (dataset: cats("Store_",store) ) ; 233 h.clear() ; 234 run ; NOTE: The data set WORK.STORE_1 has 4 observations and 6 variables. NOTE: The data set WORK.STORE_2 has 5 observations and 6 variables. NOTE: There were 9 observations read from the data set WORK.HAVE.
Hi John,
You can try the below code also.
Let us assume the input dataset as "storeData"
%Maco splittingDatasets(dsn=);
proc sql;
select count(distinct store) into :storeCnt from &dsn;
%let storeCount = %cmpres(&storeCnt);
select distinct store into :store1 - :store&storeCount from &dsn;
quit;
%do i = 1 %to &storeCount;
data store&i;
set &dsn;
where store = &&store&i;
run;
%end;
%mend;
%splittingDatasets(dsn = storeData )
One more comment ...
You should realize that even if you succeed at this task, you are accomplishing it in an inefficient way. This code takes longer to run:
data store1;
set &dsn;
where store='1';
run;
data store2;
set &dsn;
where store='2';
run;
This code would run faster:
data store1 store2;
set &dsn;
if store='1' then output store1;
else if store='2' then output store2;
run;
Macro language could generate the faster-running program, as well as the slower-running program. It's perhaps a bit more complex to generate the faster-running program, so if this is really just a learning exercise it may be too much too soon. And if your data sets are small, speed would be a secondary consideration anyway.
Good luck.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.