DATA Step, Macro, Functions and more

Macro Loop for unknown Colums Count

Accepted Solution Solved
Reply
Super Contributor
Posts: 381
Accepted Solution

Macro Loop for unknown Colums Count

[ Edited ]

Hello, I need a Loop to calculate thousands PowerStat value for all "Data" columns in my code. Then i have to put all PowerStat values in new table like below. Couldyou help me , please ? 

 

.ABS.png

 

proc sort data=work.default;
by descending Data;
run;
data work.cumulative;
  set work.default;
  NonDefault=1-Default;
  by data notsorted;
  if first.default then defaultRate=0;
    defaultRate+default;
if first.default then NondefaultRate=0;
    NondefaultRate+Nondefault;
run;
proc sql;
create table work.cumulative2 as 
select default,data,NonDefault,defaultRate,NondefaultRate,sum(default) as TotalDefault,sum(NonDefault) as TotalNonDefault
from work.cumulative;
quit;
data PowerStat;
set work.cumulative2 END=lastN;
ResultD=defaultRate/TotalDefault;
ResultND=NondefaultRate/TotalNonDefault;
Retain Sonuc 0 MaxSonuc 0;
Area=(ResultND+Lag(ResultND))/2*(ResultD-Lag(ResultD));
IF missing(Area)=1 then Area2=0;
Else Area2=Area;
Sonuc=Sonuc+Area2;
if Sonuc>maxSOnuc then maxsonuc=sonuc;
if lastN then do;
PowerStat=(MaxSonuc-0.5)*2;
ABSResult=ABS(PowerStat);
output;
keep maxsonuc powerstat absresult;
end;
run;

 


Accepted Solutions
Solution
‎11-15-2015 09:58 AM
Super User
Super User
Posts: 6,502

Re: Macro Loop for unknown Colums Count

[ Edited ]

So here is a program that calculates your statistic.  Sorry if I messed up your carefully crafted caMeLcAse variable names. The results for these three order variables with this pattern of DEFAULT values is:

 

Data1= 0.61905

Data2= 0.26984
Data3= 0.01587

 

* Sample data with three order variables ;
data defaultCok;
	infile datalines truncover;
	input Default Data1-Data3;
	datalines;
0 4.15 5.15 4.00
1 0.03 5.50 3.50
0 1.53 1.70 3.05
1 0.32 0.47 2.50
1 0.03 0.90 1.70
0 0.61 0.00 3.70
0 0.78 0.80 2.70
1 0.00 2.90 4.10
0 0.70 2.09 0.00
1 0.09 3.00 2.90
1 0.00 4.00 0.00
1 2.81 4.02 0.10
0 0.47 5.00 2.90
1 1.92 4.00 4.50
1 0.01 3.50 3.00
0 0.69 2.10 1.70
;

* Convert to vertical to be able to process by variable ;
data vertical ;
   set defaultCok;
   array d data1-data3 ;
   length varname $32 ;
   do over d ;
     varname = vname(d);
     value = d ;
     output;
   end;
   keep default varname value ;
run;

* Order by descending value within each order variable ;
proc sort data=vertical;
  by varname descending value ;
run;

* Calculate PowerStat ;
data stats ;
  call missing(of totaldefault totalnondefault sonuc maxsonuc defaultrate nondefaultrate);
  do until (last.varname);
     set vertical ;
     by varname ;
     totaldefault = sum(totaldefault,default);
     totalnondefault = sum(totalnondefault,not default);
  end;
  do until (last.varname);
     set vertical ;
     by varname ;
     defaultRate=sum(defaultrate,default);
     nondefaultRate=sum(nondefaultrate,not default);
     ResultD=defaultRate/TotalDefault;
     ResultND=NondefaultRate/TotalNonDefault;
     if first.varname then area=0;
     else Area=(ResultND+LagResultND)/2*(ResultD-LagResultD) ;
     Sonuc=sum(Sonuc,Area);
     maxsonuc = max(maxsonuc,sonuc);
     lagResultD = ResultD ;
     lagResultND = ResultND ;
   end;
   PowerStat=(MaxSonuc-0.5)*2;
   ABSResult=ABS(PowerStat);
   keep varname absresult ;
run;
proc print; run;

 

View solution in original post


All Replies
Super User
Posts: 10,511

Re: Macro Loop for unknown Colums Count

I think you need to tell us something about the data you start with. Hard to tell how to get the end without a start.

Super Contributor
Posts: 381

Re: Macro Loop for unknown Colums Count

Sorry for missing information. Here is my datalines and my code. I have thousands "Data" columns. Thanks  to @Reeza i can calcuate  ABSPowerStat for first Data column but i tried some macros to calculate for all "Data" columns  but i failed. I'm new on macro language. Could you help me ,please ?

ABS.png

Data0005     .......

Data0006      .........

Data0007    ...........


data defaultCok; length default 8. Data 8. Data2 8. Data3 8. ; infile datalines missover; input Default Data Data2 Data3; datalines; 0 4.15 5.15 4.00 1 0.03 5.50 3.50 0 1.53 1.70 3.05 1 0.32 0.47 2.50 1 0.03 0.90 1.70 0 0.61 0.00 3.70 0 0.78 0.80 2.70 1 0.00 2.90 4.10 0 0.70 2.09 0.00 1 0.09 3.00 2.90 1 0.00 4.00 0.00 1 2.81 4.02 0.10 0 0.47 5.00 2.90 1 1.92 4.00 4.50 1 0.01 3.50 3.00 0 0.69 2.10 1.70 ; proc sort data=work.defaultCok; by descending Data; run; data work.cumulative; set work.defaultCok; NonDefault=1-Default; by data notsorted; if first.default then defaultRate=0; defaultRate+default; if first.default then NondefaultRate=0; NondefaultRate+Nondefault; run; /*proc sql; create table yeni as select data,(select SUM(default) from work.default as def2 where def2.data <= def1.data) as defaultRate from work.default def1 order by data desc;*/ proc sql; create table work.cumulative2 as select default,data,NonDefault,defaultRate,NondefaultRate,sum(default) as TotalDefault,sum(NonDefault) as TotalNonDefault from work.cumulative; quit; data PowerStat; set work.cumulative2 END=lastN; ResultD=defaultRate/TotalDefault; ResultND=NondefaultRate/TotalNonDefault; Retain Sonuc 0 MaxSonuc 0; Area=(ResultND+Lag(ResultND))/2*(ResultD-Lag(ResultD)); IF missing(Area)=1 then Area2=0; Else Area2=Area; Sonuc=Sonuc+Area2; if Sonuc>maxSOnuc then maxsonuc=sonuc; if lastN then do; PowerStat=(MaxSonuc-0.5)*2; ABSResult=ABS(PowerStat); output; keep absresult; end; run;

 

 

Super User
Super User
Posts: 6,502

Re: Macro Loop for unknown Colums Count

You have two issue here and to properly fix this you need to address both.

 

The first problem is that your program does not work.  Your data step immediately after the PROC SORT is referencing FIRST.DEFAULT and does not include a BY DEFAULT statement.

Perhaps if you could explain what the code is trying to do it could be re-written to do it in a way that would be easier to extend to multple variables?

 

There are a few ways that immediately come to mind to applying calculations to multiple variables.

1) You could create arrays of an the loop over the array.  This program does not look like it would work for that, but perhaps the code could be re-written so that it could.

2) You could structure the data so that you could use BY variable processing.  In this case you could convert from three columns (DATA, DATA2, DATA3) to three rows with two columns VARNAME and VALUE and then process the VALUE column as you are currently processing the DATA column, but add VARNAME as a BY group.  PROC TRANSPOSE would do that easily.

3) As a last resort create a macro that takes the variable name as in input parameter.

 

To make a macro you need to decide what is that varies between diffferent calls to macro. That is what are the parameters.  I think that you want the name of the variable as the parameter.  So you would call it first with DATA then DATA2 then DATA3 etc.

 

 

Super Contributor
Posts: 381

Re: Macro Loop for unknown Colums Count

Hello Tom.

"The first problem is that your program does not work.  Your data step immediately after the PROC SORT is referencing FIRST.DEFAULT and does not include a BY DEFAULT statement." I think my porgram works and DEFAULT is my constant column and in this step 

 

if first.default then defaultRate=0;
    defaultRate+default;
if first.default then NondefaultRate=0;
    NondefaultRate+Nondefault;

I got column's cumulative values. Did i make mistake ?

 

I am trying do,

1- Take cumulative values of "default" column

2-Convert "default1 column's values

3-Take cumulative values of "Nondefault"column

4- I need sum of "default" and "Nondefault" columns

5-defaultCumulative/sumDefault->ResultD

  NondefaulCumulative/sumNonDefault->ResultND

6 Then i perform the PowerStat formula in the Powerstat Data step

 

Super User
Super User
Posts: 6,502

Re: Macro Loop for unknown Colums Count

If you want to use FIRST.DEFAULT then the variable DEFAULT needs to be in the BY statement. Otherwise you will get a warning from SAS that FIRST.DEFAULT is uniniatilzed and FIRST.DEFAULT will never be true.

 

So what is meaning of the variable DEFAULTRATE that you are trying to calculate?  Are just making a running count of the number of times that DEFAULT=1?  And NONDEFAULTRATE is a running count of number of times that DEFAULT=0?

 

If so then it would clearer and easier to skip the IF/THEN and FIRST. references and just code:

DEFAULTRATE + DEFAULT;

NONDEFAULTRATE + (NOT DEFAULT);

 

What is the PROC SQL step doing? It looks like it is just merging on the sum(), but perhaps there is something else happening there?

 

What is the meaning of POWERSTAT?  What does it mean in words? Is it some type of Power Statistic? If so which one? What formula are you using?  Do you have a reference?  Is it possible that it is a standard statistic that an existing SAS PROC can calculate for you? That  would be a lot easier than writing your own formula.

 

Super User
Posts: 17,842

Re: Macro Loop for unknown Colums Count

Assuming you want to do this over many different datasets:
1. Create a macro that calculates the output and appends it to a master table
%powerstat(datain=, var=, dataout=);
2. Create a dataset that is a list of all datasets that need to be processed with the variable that needs to be executed.
3. Use Call execute to call macro from #1
SAS Employee
Posts: 12

Re: Macro Loop for unknown Colums Count

I'm assuming the vars to be analyzed are Data1-Datax (you had Data Data2.)

 

With that assumption the following code will do what you need, I beieve.

 

data results;
  length ColumnName $32 PowerStat 8 ;
  stop;
run;
%macro iterate(dataCols=1);
%do dataCol = 1 %to &dataCols.;
proc sort data=work.defaultCok(rename=(data&DataCol.=data)) out=work.defaultCok2;
  by descending Data;
run;
data work.cumulative;
  set work.defaultCok2;
  NonDefault=1-Default;
  by data notsorted;
  if first.default then defaultRate=0;
    defaultRate+default;
if first.default then NondefaultRate=0;
    NondefaultRate+Nondefault;
run;

proc sql;
create table work.cumulative2 as 
select default,data,NonDefault,defaultRate,NondefaultRate,sum(default) as TotalDefault,sum(NonDefault) as TotalNonDefault
from work.cumulative;
quit;
data PowerStat;
set work.cumulative2 END=lastN;
ResultD=defaultRate/TotalDefault;
ResultND=NondefaultRate/TotalNonDefault;
Retain Sonuc 0 MaxSonuc 0;
Area=(ResultND+Lag(ResultND))/2*(ResultD-Lag(ResultD));
IF missing(Area)=1 then Area2=0;
Else Area2=Area;
Sonuc=Sonuc+Area2;
if Sonuc>maxSOnuc then maxsonuc=sonuc;
if lastN then do;
PowerStat=(MaxSonuc-0.5)*2;
ABSResult=ABS(PowerStat);
varName="data&datacol.";
output;
keep absresult varName;
end;
run;

data results;
  set results Powerstat(rename=(absresult=Powerstat varName=ColumnName));
run;
%end;
%mend;
%iterate(dataCols=3) * number of DataX columns ;
Super Contributor
Posts: 381

Re: Macro Loop for unknown Colums Count

Thank you Chris. I take the output but also it gives error like below.The other problem is, it takes so long time. Maybe i have to change the code or need to get  more efficent macro. 

 

ERROR: Variable DATA1 not found.

WARNING: The data set WORK.DEFAULTCOK2 may be incomplete. When this step was stopped there were 0 observations and 0 variables.
WARNING: Data set WORK.DEFAULTCOK2 was not replaced because this step was stopped.

 

Thank you.

Solution
‎11-15-2015 09:58 AM
Super User
Super User
Posts: 6,502

Re: Macro Loop for unknown Colums Count

[ Edited ]

So here is a program that calculates your statistic.  Sorry if I messed up your carefully crafted caMeLcAse variable names. The results for these three order variables with this pattern of DEFAULT values is:

 

Data1= 0.61905

Data2= 0.26984
Data3= 0.01587

 

* Sample data with three order variables ;
data defaultCok;
	infile datalines truncover;
	input Default Data1-Data3;
	datalines;
0 4.15 5.15 4.00
1 0.03 5.50 3.50
0 1.53 1.70 3.05
1 0.32 0.47 2.50
1 0.03 0.90 1.70
0 0.61 0.00 3.70
0 0.78 0.80 2.70
1 0.00 2.90 4.10
0 0.70 2.09 0.00
1 0.09 3.00 2.90
1 0.00 4.00 0.00
1 2.81 4.02 0.10
0 0.47 5.00 2.90
1 1.92 4.00 4.50
1 0.01 3.50 3.00
0 0.69 2.10 1.70
;

* Convert to vertical to be able to process by variable ;
data vertical ;
   set defaultCok;
   array d data1-data3 ;
   length varname $32 ;
   do over d ;
     varname = vname(d);
     value = d ;
     output;
   end;
   keep default varname value ;
run;

* Order by descending value within each order variable ;
proc sort data=vertical;
  by varname descending value ;
run;

* Calculate PowerStat ;
data stats ;
  call missing(of totaldefault totalnondefault sonuc maxsonuc defaultrate nondefaultrate);
  do until (last.varname);
     set vertical ;
     by varname ;
     totaldefault = sum(totaldefault,default);
     totalnondefault = sum(totalnondefault,not default);
  end;
  do until (last.varname);
     set vertical ;
     by varname ;
     defaultRate=sum(defaultrate,default);
     nondefaultRate=sum(nondefaultrate,not default);
     ResultD=defaultRate/TotalDefault;
     ResultND=NondefaultRate/TotalNonDefault;
     if first.varname then area=0;
     else Area=(ResultND+LagResultND)/2*(ResultD-LagResultD) ;
     Sonuc=sum(Sonuc,Area);
     maxsonuc = max(maxsonuc,sonuc);
     lagResultD = ResultD ;
     lagResultND = ResultND ;
   end;
   PowerStat=(MaxSonuc-0.5)*2;
   ABSResult=ABS(PowerStat);
   keep varname absresult ;
run;
proc print; run;

 

Super Contributor
Posts: 381

Re: Macro Loop for unknown Colums Count

Thank you. It seems to be working. But i changed the column "Data" (for all of them) as "Rasyo" then i put a simple macro now all AbsResult is gotten same. I can't any warning in the log tab. Have you got any idea ?

 

%let colCnt=4596;
data vertical ;
   set defaultCok;
   array d Q0001-Q&colCnt.;
   length Rasyo $32 ;
   do over d ;
     Rasyo = vname(d);
     value = d ;
     output;
   end;
   keep default Rasyo value ;
run;

 

CanSas.PNG

Super User
Super User
Posts: 6,502

Re: Macro Loop for unknown Colums Count

Hard to say since you did not post any error messages.  If you change the variable name from VARNAME to RASYO in this step make sure to change it in the later steps also.

Are you asking why the caclculated numbers are the same?  Do they produce the same ordering of values of DEFAULT?  From your algorithm the result is totally based on the order of 1's and 0's in the variable DEFAULT to result when they are sorted by the value of the "RASYO" variable.

Super Contributor
Posts: 381

Re: Macro Loop for unknown Colums Count

I just got this warning message in the Log summary;

 

WARNING: The variable default in the DROP, KEEP, or RENAME list has never been referenced.

 

Thank you.

 

Super User
Posts: 9,682

Re: Macro Loop for unknown Colums Count

What formula are you using to calculate PowerStat ? Maybe IML code can make it more succinct , better and faster .
Super Contributor
Posts: 381

Re: Macro Loop for unknown Colums Count

[ Edited ]

I think PowerStat same with GINI but i'm not sure. Also i don't know how can i perform the GINI function on this code .

 

This warning message was occured because i forgot the change name of the variable default.(WARNING: The variable default in the DROP, KEEP, or RENAME list has never been referenced.).

But there is a problem here again i'll check again thanks everyone

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 15 replies
  • 376 views
  • 1 like
  • 6 in conversation