Hi,
I am wanting to use a macro to create datasets based on a variable within the input dataset. In this case the variable is year. I am not sure how to fix the code so it won't produce the errors. Something else that is confusing to me is the code seems to work for iterations 1 and 2 but then breaks on iteraton 3.
A few key bits of context. I had to create the example data as the real data is something I cannot post, but my actually data has over 6,000 records. The variable "Total" is the number of records that "Place" has sent in for a given year. What I am trying to do is create a program that will generate a report of the top 10 to 20 Places by number of records sent in per year. A person will be able to type in a %let macro variable to have their report start at whatever year they choose, and this macro variable will limit the years included in the dataset called "test" for the example below. The datasets created by the macro split below would then be sorted by "total" and the top 10 or so places and their totals per year would be put in the final/desired report. I hope that makes sense.
I am using SAS 9.4
Here is the code for the data:
data WORK.TEST;
infile datalines dsd truncover;
input Total:BEST. Place:$7. year:BEST.;
format Total BEST. year BEST.;
label Total="Total" Place="Place" year="year";
datalines;
1 doc a 2010
3 doc a 2011
3 doc a 2011
1 doc a 2012
1 doc a 2014
2 doc a 2011
1 doc b 2012
1 doc b 2013
2 doc b 2010
2 doc b 2010
1 doc b 2014
2 doc c 2011
2 doc c 2011
1 doc g 2013
1 doc g 2011
1 hop b 2013
1 hosp a 2011
1 hosp a 2013
1 hosp a 2012
2 hosp b 2013
1 hosp b 2010
2 hosp b 2013
2 hosp f 2013
2 hosp f 2013
;;;;
Here is the copy of my log file with the error messages
221 %macro split (data=,var=);
222 proc sort data=&data(keep=&var) out=years nodupkey;
223 by &var;
224 run;
225
226 data _null_;
227 set years end=last;
228 call symputx('year'||left(_n_),&var);
229 if last then call symputx('num',_n_);
230 run;
231
232 data
233 %do i=1 %to #
234 &&year&i
235 %end;
236 ;
237 set &data;
238 select(&var);
239 %do i=1 %to #
240 when("&&year&i") output &&year&i;
241 %end;
242 otherwise;
243 end;
244 run;
245 %put #
246 %mend split;
247
248 %split(data=test, var=year)
MLOGIC(SPLIT): Beginning execution.
MLOGIC(SPLIT): Parameter DATA has value test
MLOGIC(SPLIT): Parameter VAR has value year
SYMBOLGEN: Macro variable DATA resolves to test
SYMBOLGEN: Macro variable VAR resolves to year
MPRINT(SPLIT): proc sort data=test(keep=year) out=years nodupkey;
SYMBOLGEN: Macro variable VAR resolves to year
MPRINT(SPLIT): by year;
MPRINT(SPLIT): run;
NOTE: Compression was disabled for data set WORK.YEARS because compression overhead would increase the size of the data set.
NOTE: There were 24 observations read from the data set WORK.TEST.
NOTE: 19 observations with duplicate key values were deleted.
NOTE: The data set WORK.YEARS has 5 observations and 1 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
MPRINT(SPLIT): data _null_;
MPRINT(SPLIT): set years end=last;
SYMBOLGEN: Macro variable VAR resolves to year
MPRINT(SPLIT): call symputx('year'||left(_n_),year);
MPRINT(SPLIT): if last then call symputx('num',_n_);
MPRINT(SPLIT): run;
NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column).
813:143
NOTE: There were 5 observations read from the data set WORK.YEARS.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
SYMBOLGEN: Macro variable NUM resolves to 5
MLOGIC(SPLIT): %DO loop beginning; index variable I; start value is 1; stop value is 5; by value is 1.
SYMBOLGEN: && resolves to &.
SYMBOLGEN: Macro variable I resolves to 1
SYMBOLGEN: Macro variable YEAR1 resolves to 2010
MLOGIC(SPLIT): %DO loop index variable I is now 2; loop will iterate again.
SYMBOLGEN: && resolves to &.
SYMBOLGEN: Macro variable I resolves to 2
SYMBOLGEN: Macro variable YEAR2 resolves to 2011
MLOGIC(SPLIT): %DO loop index variable I is now 3; loop will iterate again.
SYMBOLGEN: && resolves to &.
SYMBOLGEN: Macro variable I resolves to 3
SYMBOLGEN: Macro variable YEAR3 resolves to 2012
NOTE 138-205: Line generated by the macro variable "YEAR1".
1 2010
----
22
ERROR 22-322: Syntax error, expecting one of the following: a name, a quoted string, /, ;, _DATA_, _LAST_, _NULL_.
MLOGIC(SPLIT): %DO loop index variable I is now 4; loop will iterate again.
SYMBOLGEN: && resolves to &.
SYMBOLGEN: Macro variable I resolves to 4
SYMBOLGEN: Macro variable YEAR4 resolves to 2013
MLOGIC(SPLIT): %DO loop index variable I is now 5; loop will iterate again.
SYMBOLGEN: && resolves to &.
SYMBOLGEN: Macro variable I resolves to 5
SYMBOLGEN: Macro variable YEAR5 resolves to 2014
NOTE: Line generated by the macro variable "YEAR1".
1 2010
----
76
ERROR 76-322: Syntax error, statement will be ignored.
MLOGIC(SPLIT): %DO loop index variable I is now 6; loop will not iterate again.
MPRINT(SPLIT): data 2010 2011 2012 2013 2014 ;
SYMBOLGEN: Macro variable DATA resolves to test
MPRINT(SPLIT): set test;
SYMBOLGEN: Macro variable VAR resolves to year
MPRINT(SPLIT): select(year);
SYMBOLGEN: Macro variable NUM resolves to 5
MLOGIC(SPLIT): %DO loop beginning; index variable I; start value is 1; stop value is 5; by value is 1.
SYMBOLGEN: && resolves to &.
SYMBOLGEN: Macro variable I resolves to 1
SYMBOLGEN: Macro variable YEAR1 resolves to 2010
SYMBOLGEN: && resolves to &.
SYMBOLGEN: Macro variable I resolves to 1
SYMBOLGEN: Macro variable YEAR1 resolves to 2010
NOTE: Line generated by the macro variable "YEAR1".
1 2010
----
22
200
MPRINT(SPLIT): when("2010") output 2010;
MLOGIC(SPLIT): %DO loop index variable I is now 2; loop will iterate again.
SYMBOLGEN: && resolves to &.
SYMBOLGEN: Macro variable I resolves to 2
SYMBOLGEN: Macro variable YEAR2 resolves to 2011
SYMBOLGEN: && resolves to &.
SYMBOLGEN: Macro variable I resolves to 2
SYMBOLGEN: Macro variable YEAR2 resolves to 2011
NOTE: Line generated by the macro variable "YEAR2".
1 2011
----
22
200
MPRINT(SPLIT): when("2011") output 2011;
MLOGIC(SPLIT): %DO loop index variable I is now 3; loop will iterate again.
SYMBOLGEN: && resolves to &.
SYMBOLGEN: Macro variable I resolves to 3
SYMBOLGEN: Macro variable YEAR3 resolves to 2012
SYMBOLGEN: && resolves to &.
SYMBOLGEN: Macro variable I resolves to 3
SYMBOLGEN: Macro variable YEAR3 resolves to 2012
NOTE: Line generated by the macro variable "YEAR3".
1 2012
----
22
200
MPRINT(SPLIT): when("2012") output 2012;
MLOGIC(SPLIT): %DO loop index variable I is now 4; loop will iterate again.
SYMBOLGEN: && resolves to &.
SYMBOLGEN: Macro variable I resolves to 4
SYMBOLGEN: Macro variable YEAR4 resolves to 2013
SYMBOLGEN: && resolves to &.
SYMBOLGEN: Macro variable I resolves to 4
SYMBOLGEN: Macro variable YEAR4 resolves to 2013
NOTE: Line generated by the macro variable "YEAR4".
1 2013
----
22
200
MPRINT(SPLIT): when("2013") output 2013;
MLOGIC(SPLIT): %DO loop index variable I is now 5; loop will iterate again.
SYMBOLGEN: && resolves to &.
SYMBOLGEN: Macro variable I resolves to 5
SYMBOLGEN: Macro variable YEAR5 resolves to 2014
SYMBOLGEN: && resolves to &.
SYMBOLGEN: Macro variable I resolves to 5
SYMBOLGEN: Macro variable YEAR5 resolves to 2014
NOTE: Line generated by the macro variable "YEAR5".
1 2014
----
22
200
MPRINT(SPLIT): when("2014") output 2014;
MLOGIC(SPLIT): %DO loop index variable I is now 6; loop will not iterate again.
MPRINT(SPLIT): otherwise;
MPRINT(SPLIT): end;
MPRINT(SPLIT): run;
ERROR 22-322: Syntax error, expecting one of the following: a name, a quoted string, ;, RC, _DATA_, _LAST_, _NULL_.
ERROR 200-322: The symbol is not recognized and will be ignored.
NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
1:1 1:1 1:1 1:1 1:1
NOTE: The SAS System stopped processing this step because of errors.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
MLOGIC(SPLIT): %PUT &num
SYMBOLGEN: Macro variable NUM resolves to 5
5
MLOGIC(SPLIT): Ending execution.
And finally a copy of the code:
%macro split (data=,var=);
proc sort data=&data(keep=&var) out=years nodupkey;
by &var;
run;
data _null_;
set years end=last;
call symputx('year'||left(_n_),&var);
if last then call symputx('num',_n_);
run;
data
%do i=1 %to #
&&year&i
%end;
;
set &data;
select(&var);
%do i=1 %to #
when("&&year&i") output &&year&i;
%end;
otherwise;
end;
run;
%put #
%mend split;
%split(data=test, var=year)
Please try the below code, i made few changes to the macro if you run the code it will help.
data WORK.TEST;
input Total Place$ year;
format Total BEST. year BEST.;
datalines;
1 doca 2010
3 doca 2011
3 doca 2011
1 doca 2012
1 doca 2014
2 doca 2011
1 docb 2012
1 docb 2013
2 docb 2010
2 docb 2010
1 docb 2014
2 docc 2011
2 docc 2011
1 docg 2013
1 docg 2011
1 hopb 2013
1 hospa 2011
1 hospa 2013
1 hospa 2012
2 hospb 2013
1 hospb 2010
2 hospb 2013
2 hospf 2013
2 hospf 2013
;;;;
%macro split (data=,var=);
proc sort data=&data(keep=&var) out=years nodupkey;
by &var;
run;
data _null_;
set years end=last;
call symputx('year'||left(_n_),&var);
if last then call symputx('num',_n_);
run;
data
%do i=1 %to #
_&&year&i
%end;
;
set &data;
%do i=1 %to #
if year in (&&year&i) then output _&&year&i;
%end;
run;
%put #
%mend split;
%split(data=test, var=year)
I know you are new to SAS and new to the SAS communities, so forgive me if I sound like lecturing, but there are a number of basic mistakes here, and I'm hoping to point you in the right direction and help you avoid these problems in the future by pointing these out.
First, you want to create valid working SAS code for one or two iterations of this problem without any macros and without any macro variables. Had you done that, you would see that data set names cannot be the year numbers because SAS data set names must begin with a letter or underscore, they cannot begin with a digit; this is not valid working SAS code. But that's what your macro creates, and that is the cause of the error. So please, from now on, follow this advice, creating working code without macros for one or two iterations; and avoid this type of problem.
Next, it is rarely a good idea to split data sets this way, as there are other structures in SAS, such as the BY statement, and the WHERE= option, to allow you to access the years individually without going through all of this to split the data set up. In the rare case where you data set is huge (and 6000 records is not what I call huge, I'm thinking 60 million records) and you might run out of memory or disk space, then perhaps this is a reasonable approach. So please don't do this splitting of data into yearly data sets without good justification. In fact, you might describe what you plan to do with this data, and we could probably give you very good advice on how to do whatever you need in a better way than splitting data into yearly data sets.
Lastly, when providing us with code and especially when you are providing the log file, PLEASE first click on the {i} icon and then paste the log into the window that appears. This preserves the formatting in your log, and makes it readable to us across the internet just as it is to you on your computer.
Hi,
Thank you for the advice. It is helpful. I see your point about not wanting to split dataset. I'm still learning the details about how macros work and how to apply them. So, your point about the dataset names created in the macro is another good one. Thanks.
The ultimate goal of the program is to provide a report that is generated by the user. The user picks the beginning year of the report, and the report would list the entities (called "Place" in my example) and the number of records sent in for each given year (called "Total" in my example) of the top 10-20 entities per year. Because the entities and the totals each year changes, the report would have to list a new group of top entities and their totals for each year. Given that splitting data is unwise, what is your suggestion?
The user picks the beginning year of the report, and the report would list the entities (called "Place" in my example) and the number of records sent in for each given year (called "Total" in my example) of the top 10-20 entities per year. Because the entities and the totals each year changes, the report would have to list a new group of top entities and their totals for each year. Given that splitting data is unwise, what is your suggestion?
So the user supplies YEAR which you use in a WHERE clause. The rest of the code should be the same, no matter what YEAR they picked.
Say you wanted the top 20 PLACE values within each YEAR. (so the which 20 could be different for each year).
proc freq data=have ;
where year >= &year ;
tables year*place / noprint out=counts;
run;
data top20 ;
set counts;
by year ;
if first.year then rank=0;
rank+1;
if rank <= 20;
run;
proc print ;
var year place rank count ;
run;
Thank you Tom for the helpful code.
You should always start with working code to generalize.
You do need to be using the BY group procesing and the SAS documentation actually has some examples on getting the top 10 records.
This is probably way more complicated than you need, so I'd recommend using @Tom solution instead.
If you need to build macros, again, start from working code.
Then convert using this approach:
https://github.com/statgeek/SAS-Tutorials/blob/master/Turning%20a%20program%20into%20a%20macro.md
Thank you Reeza, I have not heard of proc rank before. The info and the link are helpful.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.