Hello,
I need to peform a do loop over a string variable (disease name and region) and use proc sql, data step, and proc reg within that loop. The following shows what I need:
DO LOOP OVER EACH DISEASE AND REGION; proc sql; create table a as select week, year, count from table
where region="®ion" and disease="&disease" quit; data datastep; /*do some stuff*/ run; proc reg data=statprocess; model avg=x0; output out=trig p=yhat r=e stdp=sd; run; END;
I would be appriciate if someone help me.
Thank you for sharing your knowledge
Samira
Wrap your code in a macro:
%macro analysis(region,disease);
/* your code */
%mend;
Then get a list of distinct value pairs from the source dataset:
proc sql;
create table control as
select distinct region, disease
from table;
quit;
Then use that to repeatedly execute the macro:
data _null_;
set control;
call execute('%analysis('!!strip(region)!!','!!strip(disease)!!')');
run;
Why not use BY groups instead?
Wrap your code in a macro:
%macro analysis(region,disease);
/* your code */
%mend;
Then get a list of distinct value pairs from the source dataset:
proc sql;
create table control as
select distinct region, disease
from table;
quit;
Then use that to repeatedly execute the macro:
data _null_;
set control;
call execute('%analysis('!!strip(region)!!','!!strip(disease)!!')');
run;
Thank you very much for your response.
I modified my code based on your comment. Here is the code:
%macro analysis(dise, regi);
proc sql;
create table withState as
select week, year, sum(count) from data.datasource20012017filtered
where disease="&dise"
group by week, year
union
select * from data.mtx3;
quit;
proc sql;
create table withoutState as
select week, year, count from data.datasource20012017filtered
where region="®i" and disease="&dise"
union
select * from data.mtx3;
quit;
data main_query;
if "®i"="STATE" then
set withState;
else set withoutState;
run;
proc sql;
create table sumup_count as
select distinct week, year , sum(count) as count
from main_query
group by week, year;
quit;
proc sort data=sumup_count;
by week year;
run;
*using the following proc transpose, we are gonna make the matrix;
proc transpose data=sumup_count out=want (DROP=_LABEL_ AND _NAME_) prefix=year ;
by week;
id year;
var count;
run;
data the_matrix_query_withnull;
retain week year2001 year2002 year2003 year2004 year2005 year2006 year2007 year2008 year2009 year2010 year2011 year2012 year2013 year2014 year2015 year2016 year2017;
set want;
run;
*the additional below data step is because we handel
the null and zero issue in the recomputation limits part.
in fact we need to put zero instead of null on missing value;
data the_matrix_query;
set the_matrix_query_withnull;
if year2001=" " then year2001=0;
if year2002=" " then year2002=0;
if year2003=" " then year2003=0;
if year2004=" " then year2004=0;
if year2005=" " then year2005=0;
if year2006=" " then year2006=0;
if year2007=" " then year2007=0;
if year2008=" " then year2008=0;
if year2009=" " then year2009=0;
if year2010=" " then year2010=0;
if year2011=" " then year2011=0;
if year2012=" " then year2012=0;
if year2013=" " then year2013=0;
if year2014=" " then year2014=0;
if year2015=" " then year2015=0;
if year2016=" " then year2016=0;
if year2017=" " then year2017=0;
run;
data statprocess;
set the_matrix_query;
avg=(sum(of year2007-year2016))/10;
x0=1;
x1=sin(2*3.14159*week/52);
x2=cos(2*3.14159*week/52);
x3=sin(4*3.14159*week/52);
x4=cos(4*3.14159*week/52);
run;
proc reg data=statprocess;
model avg=x0;
output out=trig p=yhat r=e stdp=sd;
run;
%mend analysis;
%analysis
proc sql;
create table control as
select distinct region, disease from Report8;
quit;
data _null_;
set control;
call execute('%analysis('!!strip(region)!!','!!strip(disease)!!')');
run;
as you can see there is a lot to execute in macro analysis. I ran the code and I got the following error(20):
Re: Macro Error: More positional parameters found than defined.
also the running time its about 5 min which is too much. is there any way to do make this runing time lower?
Thank you for your help in advance.
Samira
When writing a macro, first make sure the code inside works. As written it shouldn't.
This is incorrect:
data main_query;
if "®i"="STATE" then
set withState;
else set withoutState;
run;
If you need conditional logic, then it's %if/%then/%else etc.
In your macro call, via call execute, build the string first and then pass it to call execute. This will help you avoid errors.
You can use PROC STDIZE to set the missing to 0, rather than a data step, if its easier. It's dynamic so it's easier to modify if you have multiple years. Your IF/THEN in that step is also incorrect, because you're checking for missing with a blank - which means a character variable, but then you assign a numeric value instead of character value. Enclose the 0 in quotes. If you don't want PROC STDIZE, use an array instead.
There are mistakes throughout so I go back to my first suggestion - make sure it's working without macro variables first, or hard coded macro variables. Then make it into a macro. Your base code will not be correct as shown.
Other common mistakes I see after another quick skim - all output tables will have the same name regardless of the run. This means you overwrite previous results if you run the macro more than once.
Also, like I first mentioned, you don't need a macro for this at all. It's likely making it harder to accomplish really. For academic reasons, learning, it may be worth fixing, but learning how to do it without macro's would be more useful in the long run. My 2 cents.
You probably have a region or disease with a comma in it.
@samira wrote:
Thanks KurtBremser for your nice answer. I just have another question. if in each iteration I want to store the result into a dataset (incrementally). how can I gather all the result in one dataset?
You have this output statement:
output out=trig /* further options */;
Expand it with your macro parameters:
output out=trig_®ion._&disease. /* further options */;
Now, to put everything into one dataset:
data _null_;
set control end=done;
if _n_ = 1
then do;
call execute('data all_results; set');
end;
call execute('trig_'!!strip(region)!!'_'!!strip(disease)!!' ');
if done
then do;
call execute(';run;');
end;
run;
Thanks a lot for the inforemation. I used this code to get the value of each iteration. I have tried to understand your code and utilize it in my code but I got some error which does not make sense to me. here is the first error:
NOTE: CALL EXECUTE generated line.
1 + data all_results; set
NOTE: Line generated by the CALL EXECUTE routine.
2 + trig_CENTRAL_ADULT RESPIRATORY DISTRESS SYNDROME (ARDS)
____
22
ERROR 22-7: Invalid option name ARDS.
3 + trig_CENTRAL_ANAPLASMA PHAGOCYTOPHILUM
4 + trig_CENTRAL_ANIMAL BITES
5 + trig_CENTRAL_BOTULISM INFANT
6 + trig_CENTRAL_BRUCELLOSIS
7 + trig_CENTRAL_CAMPYLOBACTERIOSIS
8 + trig_CENTRAL_CHIKUNGUNYA
9 + trig_CENTRAL_COCCIDIOIDOMYCOSIS
NOTE: Line generated by the CALL EXECUTE routine.
10 + trig_CENTRAL_CREUTZFELDT-JAKOB DISEASE (CJD)
___
22
ERROR: Missing numeric suffix on a numbered data set list (WORK.trig_CENTRAL_CREUTZFELDT-WORK.JAKOB).
ERROR 22-7: Invalid option name CJD.
11 + trig_CENTRAL_CRYPTOSPORIDIOSIS
12 + trig_CENTRAL_CYCLOSPORIASIS
13 + trig_CENTRAL_DENGUE FEVER
14 + trig_CENTRAL_E. COLI O157 H7
ERROR: Libref 'trig_CENTRAL_E' exceeds 8 characters.
15 + trig_CENTRAL_EHRLICHIA CHAFFEENSIS
16 + trig_CENTRAL_EHRLICHIA EWINGII
17 + trig_CENTRAL_EHRLICHIOSIS ANAPLASMOSIS UNDETERMINED
18 + trig_CENTRAL_GIARDIASIS
NOTE: Line generated by the CALL EXECUTE routine.
19 + trig_CENTRAL_HAEMOPHILUS INFLUENZAE, INVASIVE
_
22
200
ERROR 22-322: Syntax error, expecting one of the following: a name, a quoted string, (, -, :, ;, CUROBS, END, INDSNAME, KEY,
KEYRESET, KEYS, NOBS, OPEN, POINT, _DATA_, _LAST_, _NULL_.
ERROR 200-322: The symbol is not recognized and will be ignored.
and the following is my code which is edited based on your comments:
%macro analysis(dise, regi);
proc sql;
create table withState as
select week, year, sum(count) from data.datasource20012017filtered
where disease="&dise"
group by week, year
union
select * from data.mtx3;
quit;
proc sql;
create table withoutState as
select week, year, count from data.datasource20012017filtered
where region="®i" and disease="&dise"
union
select * from data.mtx3;
quit;
proc sql;
create table sumup_count as
select distinct week, year , sum(count) as count
from withoutState
group by week, year;
quit;
proc sort data=sumup_count;
by week year;
run;
*using the following proc transpose, we are gonna make the matrix;
proc transpose data=sumup_count out=want (DROP=_LABEL_ AND _NAME_) prefix=year ;
by week;
id year;
var count;
run;
data the_matrix_query_withnull;
retain week year2001 year2002 year2003 year2004 year2005 year2006 year2007 year2008 year2009 year2010 year2011 year2012 year2013 year2014 year2015 year2016 year2017;
set want;
run;
*the additional below data step is because we handel
the null and zero issue in the recomputation limits part.
in fact we need to put zero instead of null on missing value;
data the_matrix_query;
set the_matrix_query_withnull;
if year2001=" " then year2001=0;
if year2002=" " then year2002=0;
if year2003=" " then year2003=0;
if year2004=" " then year2004=0;
if year2005=" " then year2005=0;
if year2006=" " then year2006=0;
if year2007=" " then year2007=0;
if year2008=" " then year2008=0;
if year2009=" " then year2009=0;
if year2010=" " then year2010=0;
if year2011=" " then year2011=0;
if year2012=" " then year2012=0;
if year2013=" " then year2013=0;
if year2014=" " then year2014=0;
if year2015=" " then year2015=0;
if year2016=" " then year2016=0;
if year2017=" " then year2017=0;
run;
data statprocess;
set the_matrix_query;
avg=(sum(of year2007-year2016))/10;
x0=1;
x1=sin(2*3.14159*week/52);
x2=cos(2*3.14159*week/52);
x3=sin(4*3.14159*week/52);
x4=cos(4*3.14159*week/52);
run;
/* %if ( ®ressionModel = 0 ) %then
%do;*/
* Zero order model u = a0 + a1*t;
proc reg data=statprocess;
model avg=x0;
output out=trig_®i._&dise. p=yhat r=e stdp=sd;
run;
%mend analysis;
%analysis
proc sql;
create table control as
select distinct region, disease from Report8;
quit;
data _null_;
set control end=done;
if _n_ = 1
then do;
call execute('data all_results; set');
end;
call execute('trig_'!!strip(region)!!'_'!!strip(disease)!!' ');
if done
then do;
call execute(';run;');
end;
run;
appriciate all your helps in advance.
Samira
You need to get rid of blanks in your region and disease variables. Replace them with underlines, using the translate() function.
Dear KurtBremser,
Over te last couple of days I was struggling to fix my code and get result based on your comments. your solution is intresting to me and I am sure this is the best way to do loop over list of disease name and regions. However I have still problem which I coudnt found out the reason over last 2 days. The problem is I couldnt get each loop's output and save into one data set. the below code gave me only one row. Please see my comments in the code.
the following is my code after editing.
data compres;
set data.datasource20012017filtered;
disease=tranwrd(disease, ",", " ");
disease=tranwrd(disease, "<", " ");
disease=tranwrd(disease, "(", " ");
disease=tranwrd(disease, ")", " ");
disease=tranwrd(disease, ".", " ");
disease=tranwrd(disease, "-", " ");
disease=compress(disease);
disease=substr(disease,1,min(length(disease),16));
run;
proc sql;
create table control as
select distinct "EASTERN" as region, disease from Report8
union select distinct "CENTRAL" as region, disease from Report8
union select distinct "SOUTHWEST" as region, disease from Report8
union select distinct "SOUTHEAST" as region, disease from Report8
union select distinct "NORTHWEST" as region, disease from Report8;
quit;
data compres2;
set control;
disease=tranwrd(disease, ",", " ");
disease=tranwrd(disease, "<", " ");
disease=tranwrd(disease, "(", " ");
disease=tranwrd(disease, ")", " ");
disease=tranwrd(disease, ".", " ");
disease=tranwrd(disease, "-", " ");
disease=compress(disease);
disease=substr(disease,1,min(length(disease),16));
run;
%macro analysis(regi,dise,counter);
%put ®i;
%put &dise;
%put &counter;
proc sql;
create table withoutState as
select week, year, count from compres
where region="®i" and disease="&dise"
union
select * from data.mtx3;
quit;
proc sql;
create table sumup_count as
select distinct week, year , sum(count) as count
from withoutState
group by week, year;
quit;
proc sort data=sumup_count;
by week year;
run;
*using the following proc transpose, we are gonna make the matrix;
proc transpose data=sumup_count out=want (DROP=_LABEL_ AND _NAME_) prefix=year ;
by week;
id year;
var count;
run;
data the_matrix_query_withnull;
retain week year2001 year2002 year2003 year2004 year2005 year2006 year2007 year2008 year2009 year2010 year2011 year2012 year2013 year2014 year2015 year2016 year2017;
set want;
run;
*the additional below data step is because we handel
the null and zero issue in the recomputation limits part.
in fact we need to put zero instead of null on missing value;
data the_matrix_query;
set the_matrix_query_withnull;
if year2001=" " then year2001=0;
if year2002=" " then year2002=0;
if year2003=" " then year2003=0;
if year2004=" " then year2004=0;
if year2005=" " then year2005=0;
if year2006=" " then year2006=0;
if year2007=" " then year2007=0;
if year2008=" " then year2008=0;
if year2009=" " then year2009=0;
if year2010=" " then year2010=0;
if year2011=" " then year2011=0;
if year2012=" " then year2012=0;
if year2013=" " then year2013=0;
if year2014=" " then year2014=0;
if year2015=" " then year2015=0;
if year2016=" " then year2016=0;
if year2017=" " then year2017=0;
run;
data statprocess;
set the_matrix_query;
avg=(sum(of year2007-year2016))/10;
x0=1;
x1=sin(2*3.14159*week/52);
x2=cos(2*3.14159*week/52);
x3=sin(4*3.14159*week/52);
x4=cos(4*3.14159*week/52);
run;
proc reg data=statprocess;
model avg=x0;
output out=trig_®i._&dise. p=yhat r=e stdp=sd;
run;
data _null_;
*set compres2 ; ---> here I removed setting onother data set because it goes through each row for each call execute analysis (there is a call execure outside to run the whole macro multiple times)
%put &counter;
if &counter= 1
then do;
call execute('data all_results; set ');
end;
call execute('trig_'!!strip("®i")!!'_'!!strip("&dise")!!'');
if &counter=73
then do;
%put "samira"; --> here I found out that "samira" repeated 73 times that mean something is wrong because 73 is the amount of rows that I have so it should just execute only one time.
call execute(';run;');
end;
run;
%mend;
data _null_;
set compres2;
_n_=1;
counter=_n_;
call execute('%analysis('!!%str(strip(region))!!','!!%str(strip(disease))!!','!!strip(counter)!!');');
run;
proc print data=all_results(where=(week=3)); --> here I just got one row data. I expected to see all the outputs which come from each loop.
run;
I really appriciate all your helps.
Samira
What we need right now is a set of test data against which we can run the code.
Post some, in the form of a data step with datalines.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.