BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
samira
Obsidian | Level 7

Hello,

 

I need to peform a do loop over a string variable (disease name and region) and use proc sql, data step, and proc reg within that loop. The following shows what I need:

 

 

DO LOOP OVER EACH DISEASE AND REGION;

      proc sql;
      create table a  as select week, year, count from table 
where region="&region" and disease="&disease"     quit; data datastep;   /*do some stuff*/ run; proc reg data=statprocess; model avg=x0; output out=trig p=yhat r=e stdp=sd; run; END;

I would be appriciate if someone help me.

 

Thank you for sharing your knowledge

Samira

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Wrap your code in a macro:

%macro analysis(region,disease);

/* your code */

%mend;

Then get a list of distinct value pairs from the source dataset:

proc sql;
create table control as
select distinct region, disease
from table;
quit;

Then use that to repeatedly execute the macro:

data _null_;
set control;
call execute('%analysis('!!strip(region)!!','!!strip(disease)!!')');
run;

View solution in original post

24 REPLIES 24
samira
Obsidian | Level 7
do you mean group by in sql? if so how can I execute all thoses code for each value ?
Kurt_Bremser
Super User

Wrap your code in a macro:

%macro analysis(region,disease);

/* your code */

%mend;

Then get a list of distinct value pairs from the source dataset:

proc sql;
create table control as
select distinct region, disease
from table;
quit;

Then use that to repeatedly execute the macro:

data _null_;
set control;
call execute('%analysis('!!strip(region)!!','!!strip(disease)!!')');
run;
samira
Obsidian | Level 7

Thank you very much for your response. 

 

I modified my code based on your comment. Here is the code: 

%macro analysis(dise, regi);

  proc sql;
      create table withState as
      select week, year, sum(count) from data.datasource20012017filtered
      where disease="&dise"
      group by week, year
      union 
      select * from data.mtx3;
   quit;
 
 proc sql;
      create table withoutState as
      select week, year, count from data.datasource20012017filtered
      where region="&regi" and disease="&dise"
      union 
      select * from data.mtx3;
   quit;

   data main_query;
   if "&regi"="STATE" then
   set withState;
   else set withoutState;
   run;
   
   proc sql;
   create table sumup_count as 
   select distinct week, year , sum(count) as count
   from main_query
   group by week, year;
   quit;
   
   proc sort data=sumup_count;
   by week year; 
   run;
   
  
 *using the following proc transpose, we are gonna make the matrix;
   proc transpose data=sumup_count out=want (DROP=_LABEL_ AND _NAME_) prefix=year ;
   by week;
   id year;
   var count;
   run;

   data the_matrix_query_withnull;
   retain week year2001 year2002 year2003 year2004 year2005 year2006 year2007 year2008 year2009 year2010 year2011 year2012 year2013 year2014 year2015 year2016 year2017;
   set want;
   run;
  
    *the additional below data step is because we handel 
  the null and zero issue in the recomputation limits part. 
  in fact we need to put zero instead of null on missing value;
   data the_matrix_query;
   set the_matrix_query_withnull;
   if year2001=" " then year2001=0;
   if year2002=" " then year2002=0;
   if year2003=" " then year2003=0;
   if year2004=" " then year2004=0;
   if year2005=" " then year2005=0;
   if year2006=" " then year2006=0;
   if year2007=" " then year2007=0;
   if year2008=" " then year2008=0;
   if year2009=" " then year2009=0;
   if year2010=" " then year2010=0;
   if year2011=" " then year2011=0;
   if year2012=" " then year2012=0;
   if year2013=" " then year2013=0;
   if year2014=" " then year2014=0;
   if year2015=" " then year2015=0;
   if year2016=" " then year2016=0;
   if year2017=" " then year2017=0;
   run;

data statprocess;
	set the_matrix_query;
    avg=(sum(of year2007-year2016))/10;
 	x0=1;
 	x1=sin(2*3.14159*week/52);
	x2=cos(2*3.14159*week/52); 
 	x3=sin(4*3.14159*week/52); 
 	x4=cos(4*3.14159*week/52);
run;
   
    
      proc reg data=statprocess;
         model avg=x0;
         output out=trig p=yhat r=e stdp=sd;
      run;

%mend analysis;
%analysis
 
 proc sql;
 create table control as
 select distinct region, disease from Report8;
 quit;
 
 data _null_;
 set control;
 call execute('%analysis('!!strip(region)!!','!!strip(disease)!!')');
 run; 

as you can see there is a lot to execute in macro analysis. I ran the code and I got the following error(20):

 

Re: Macro Error: More positional parameters found than defined.

 

also the running time its about 5 min which is too much. is there any way to do make this runing time lower?

 

Thank you for your help in advance. 

Samira

Reeza
Super User

When writing a macro, first make sure the code inside works. As written it shouldn't.

 

This is incorrect:


   data main_query;
   if "&regi"="STATE" then
   set withState;
   else set withoutState;
   run;

If you need conditional logic, then it's %if/%then/%else etc. 

 

In your macro call, via call execute, build the string first and then pass it to call execute. This will help you avoid errors.

 

You can use PROC STDIZE to set the missing to 0, rather than a data step, if its easier. It's dynamic so it's easier to modify if you have multiple years. Your IF/THEN in that step is also incorrect, because you're checking for missing with a blank - which means a character variable, but then you assign a numeric value instead of character value. Enclose the 0 in quotes. If you don't want PROC STDIZE, use an array instead. 

 

There are mistakes throughout so I go back to my first suggestion - make sure it's working without macro variables first, or hard coded macro variables. Then make it into a macro. Your base code will not be correct as shown.

samira
Obsidian | Level 7
thanks a lot for the time you spent. you are right. the error came from main_query
Reeza
Super User

Other common mistakes I see after another quick skim - all output tables will have the same name regardless of the run. This means you overwrite previous results if you run the macro more than once.

 

Also, like I first mentioned, you don't need a macro for this at all. It's likely making it harder to accomplish really. For academic reasons, learning, it may be worth fixing, but learning how to do it without macro's would be more useful in the long run. My 2 cents. 

 

 

samira
Obsidian | Level 7
Thanks KurtBremser for your nice answer. I just have another question. if in each iteration I want to store the result into a dataset (incrementally). how can I gather all the result in one dataset?
Kurt_Bremser
Super User

@samira wrote:
Thanks KurtBremser for your nice answer. I just have another question. if in each iteration I want to store the result into a dataset (incrementally). how can I gather all the result in one dataset?

You have this output statement:

output out=trig /* further options */;

Expand it with your macro parameters:

output out=trig_&region._&disease. /* further options */;

Now, to put everything into one dataset:

data _null_;
set control end=done;
if _n_ = 1
then do;
  call execute('data all_results; set');
end;
call execute('trig_'!!strip(region)!!'_'!!strip(disease)!!' ');
if done
then do;
  call execute(';run;');
end;
run;
samira
Obsidian | Level 7

Thanks a lot for the inforemation. I used this code to get the value of each iteration. I have tried to understand your code and utilize it in my code but I got some error which does not make sense to me. here is the first error: 

NOTE: CALL EXECUTE generated line.
1 + data all_results; set
NOTE: Line generated by the CALL EXECUTE routine.
2 + trig_CENTRAL_ADULT RESPIRATORY DISTRESS SYNDROME (ARDS)
____
22
ERROR 22-7: Invalid option name ARDS.
 
3 + trig_CENTRAL_ANAPLASMA PHAGOCYTOPHILUM
4 + trig_CENTRAL_ANIMAL BITES
5 + trig_CENTRAL_BOTULISM INFANT
6 + trig_CENTRAL_BRUCELLOSIS
7 + trig_CENTRAL_CAMPYLOBACTERIOSIS
8 + trig_CENTRAL_CHIKUNGUNYA
9 + trig_CENTRAL_COCCIDIOIDOMYCOSIS
NOTE: Line generated by the CALL EXECUTE routine.
10 + trig_CENTRAL_CREUTZFELDT-JAKOB DISEASE (CJD)
___
22
ERROR: Missing numeric suffix on a numbered data set list (WORK.trig_CENTRAL_CREUTZFELDT-WORK.JAKOB).
ERROR 22-7: Invalid option name CJD.
 
11 + trig_CENTRAL_CRYPTOSPORIDIOSIS
12 + trig_CENTRAL_CYCLOSPORIASIS
13 + trig_CENTRAL_DENGUE FEVER
14 + trig_CENTRAL_E. COLI O157 H7
ERROR: Libref 'trig_CENTRAL_E' exceeds 8 characters.
15 + trig_CENTRAL_EHRLICHIA CHAFFEENSIS
16 + trig_CENTRAL_EHRLICHIA EWINGII
17 + trig_CENTRAL_EHRLICHIOSIS ANAPLASMOSIS UNDETERMINED
18 + trig_CENTRAL_GIARDIASIS
NOTE: Line generated by the CALL EXECUTE routine.
19 + trig_CENTRAL_HAEMOPHILUS INFLUENZAE, INVASIVE
_
22
200
ERROR 22-322: Syntax error, expecting one of the following: a name, a quoted string, (, -, :, ;, CUROBS, END, INDSNAME, KEY,
KEYRESET, KEYS, NOBS, OPEN, POINT, _DATA_, _LAST_, _NULL_.
 
ERROR 200-322: The symbol is not recognized and will be ignored. 

and the following is my code which is edited based on your comments: 

 

%macro analysis(dise, regi);
 
  proc sql;
      create table withState as
      select week, year, sum(count) from data.datasource20012017filtered
      where disease="&dise"
      group by week, year
      union 
      select * from data.mtx3;
   quit;
 
 proc sql;
      create table withoutState as
      select week, year, count from data.datasource20012017filtered
      where region="&regi" and disease="&dise"
      union 
      select * from data.mtx3;
   quit;

   proc sql;
   create table sumup_count as 
   select distinct week, year , sum(count) as count
   from withoutState 
   group by week, year;
   quit;
   
   proc sort data=sumup_count;
   by week year; 
   run;
   
  
 *using the following proc transpose, we are gonna make the matrix;
   proc transpose data=sumup_count out=want (DROP=_LABEL_ AND _NAME_) prefix=year ;
   by week;
   id year;
   var count;
   run;

   data the_matrix_query_withnull;
   retain week year2001 year2002 year2003 year2004 year2005 year2006 year2007 year2008 year2009 year2010 year2011 year2012 year2013 year2014 year2015 year2016 year2017;
   set want;
   run;
  
    *the additional below data step is because we handel 
  the null and zero issue in the recomputation limits part. 
  in fact we need to put zero instead of null on missing value;
  data the_matrix_query;
   set the_matrix_query_withnull;
   if year2001=" " then year2001=0;
   if year2002=" " then year2002=0;
   if year2003=" " then year2003=0;
   if year2004=" " then year2004=0;
   if year2005=" " then year2005=0;
   if year2006=" " then year2006=0;
   if year2007=" " then year2007=0;
   if year2008=" " then year2008=0;
   if year2009=" " then year2009=0;
   if year2010=" " then year2010=0;
   if year2011=" " then year2011=0;
   if year2012=" " then year2012=0;
   if year2013=" " then year2013=0;
   if year2014=" " then year2014=0;
   if year2015=" " then year2015=0;
   if year2016=" " then year2016=0;
   if year2017=" " then year2017=0;
   run;

data statprocess;
	set the_matrix_query;
    avg=(sum(of year2007-year2016))/10;
 	x0=1;
 	x1=sin(2*3.14159*week/52);
	x2=cos(2*3.14159*week/52); 
 	x3=sin(4*3.14159*week/52); 
 	x4=cos(4*3.14159*week/52);
run;
   


  /* %if ( &regressionModel = 0 ) %then
   %do;*/
     * Zero order model u = a0 + a1*t;
      
      proc reg data=statprocess;
         model avg=x0;
         output out=trig_&regi._&dise. p=yhat r=e stdp=sd;
         
         run;
         

%mend analysis;
%analysis
 
 proc sql;
 create table control as
 select distinct region, disease from Report8;
 quit;
 
data _null_;
set control end=done;
if _n_ = 1
then do;
  call execute('data all_results; set');
end;
call execute('trig_'!!strip(region)!!'_'!!strip(disease)!!' ');
if done
then do;
  call execute(';run;');
end;
run;

appriciate all your helps in advance. 

Samira

 
samira
Obsidian | Level 7

Dear KurtBremser,

 

Over te last couple of days I was struggling to fix my code and get result based on your comments. your solution is intresting to me and I am sure this is the best way to do loop over list of disease name and regions. However I have still problem which I coudnt found out the reason over last 2 days. The problem is I couldnt get each loop's output and save into one data set. the below code gave me only one row. Please see my comments in the code. 

 

the following is my code after editing.

 

data compres;
set data.datasource20012017filtered;
disease=tranwrd(disease, ",", " ");
disease=tranwrd(disease, "<", " ");
disease=tranwrd(disease, "(", " ");
disease=tranwrd(disease, ")", " ");
disease=tranwrd(disease, ".", " ");
disease=tranwrd(disease, "-", " ");
disease=compress(disease);
disease=substr(disease,1,min(length(disease),16));
run;

proc sql;
 create table control as
 select distinct "EASTERN" as region, disease from Report8
 union select distinct "CENTRAL" as region, disease from Report8
  union select distinct "SOUTHWEST" as region, disease from Report8  
   union select distinct "SOUTHEAST" as region, disease from Report8  
    union select distinct "NORTHWEST" as region, disease from Report8;
 quit;
 
 data compres2;
 set control;
 disease=tranwrd(disease, ",", " ");
 disease=tranwrd(disease, "<", " ");
 disease=tranwrd(disease, "(", " ");
 disease=tranwrd(disease, ")", " ");
 disease=tranwrd(disease, ".", " ");
 disease=tranwrd(disease, "-", " ");
 disease=compress(disease);
 disease=substr(disease,1,min(length(disease),16));
 run;
 
%macro analysis(regi,dise,counter);
%put &regi;
%put &dise;
%put &counter;

 proc sql;
      create table withoutState as
      select week, year, count from compres
      where region="&regi" and disease="&dise"
      union
      select * from data.mtx3;
   quit;
 
   proc sql;
   create table sumup_count as 
   select distinct week, year , sum(count) as count
   from withoutState 
   group by week, year;
   quit;
   
   proc sort data=sumup_count;
   by week year; 
   run;
   
  
 *using the following proc transpose, we are gonna make the matrix;
   proc transpose data=sumup_count out=want (DROP=_LABEL_ AND _NAME_) prefix=year ;
   by week;
   id year;
   var count;
   run;

   data the_matrix_query_withnull;
   retain week year2001 year2002 year2003 year2004 year2005 year2006 year2007 year2008 year2009 year2010 year2011 year2012 year2013 year2014 year2015 year2016 year2017;
   set want;
   run;
  
    *the additional below data step is because we handel 
  the null and zero issue in the recomputation limits part. 
  in fact we need to put zero instead of null on missing value;
  data the_matrix_query;
   set the_matrix_query_withnull;
   if year2001=" " then year2001=0;
   if year2002=" " then year2002=0;
   if year2003=" " then year2003=0;
   if year2004=" " then year2004=0;
   if year2005=" " then year2005=0;
   if year2006=" " then year2006=0;
   if year2007=" " then year2007=0;
   if year2008=" " then year2008=0;
   if year2009=" " then year2009=0;
   if year2010=" " then year2010=0;
   if year2011=" " then year2011=0;
   if year2012=" " then year2012=0;
   if year2013=" " then year2013=0;
   if year2014=" " then year2014=0;
   if year2015=" " then year2015=0;
   if year2016=" " then year2016=0;
   if year2017=" " then year2017=0;
   run;

data statprocess;
	set the_matrix_query;
    avg=(sum(of year2007-year2016))/10;
 	x0=1;
 	x1=sin(2*3.14159*week/52);
	x2=cos(2*3.14159*week/52); 
 	x3=sin(4*3.14159*week/52); 
 	x4=cos(4*3.14159*week/52);
 	
run;

proc reg data=statprocess;
         model avg=x0;
         output out=trig_&regi._&dise. p=yhat r=e stdp=sd;
      run;
     

data _null_;
*set compres2 ; ---> here I removed setting onother data set because it goes through each row for each call execute analysis (there is a call execure outside to run the whole macro multiple times)
 %put &counter; 
if &counter= 1
then do;
  call execute('data all_results; set ');
end;
call execute('trig_'!!strip("&regi")!!'_'!!strip("&dise")!!'');
if &counter=73
then do;
%put "samira"; --> here I found out that "samira" repeated 73 times that mean something is wrong because 73 is the amount of rows that I have so it should just execute only one time.
call execute(';run;');
end;
run;

%mend;

 data _null_;
 set compres2;
 _n_=1;
 counter=_n_;
 call execute('%analysis('!!%str(strip(region))!!','!!%str(strip(disease))!!','!!strip(counter)!!');');
 run; 

proc print data=all_results(where=(week=3)); --> here I just got one row data. I expected to see all the outputs which come from each loop. 
run;

I really appriciate all your helps.

Samira

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 24 replies
  • 7171 views
  • 12 likes
  • 4 in conversation