Hello, I need a Loop to calculate thousands PowerStat value for all "Data" columns in my code. Then i have to put all PowerStat values in new table like below. Couldyou help me , please ?
.
proc sort data=work.default;
by descending Data;
run;
data work.cumulative;
set work.default;
NonDefault=1-Default;
by data notsorted;
if first.default then defaultRate=0;
defaultRate+default;
if first.default then NondefaultRate=0;
NondefaultRate+Nondefault;
run;
proc sql;
create table work.cumulative2 as
select default,data,NonDefault,defaultRate,NondefaultRate,sum(default) as TotalDefault,sum(NonDefault) as TotalNonDefault
from work.cumulative;
quit;
data PowerStat;
set work.cumulative2 END=lastN;
ResultD=defaultRate/TotalDefault;
ResultND=NondefaultRate/TotalNonDefault;
Retain Sonuc 0 MaxSonuc 0;
Area=(ResultND+Lag(ResultND))/2*(ResultD-Lag(ResultD));
IF missing(Area)=1 then Area2=0;
Else Area2=Area;
Sonuc=Sonuc+Area2;
if Sonuc>maxSOnuc then maxsonuc=sonuc;
if lastN then do;
PowerStat=(MaxSonuc-0.5)*2;
ABSResult=ABS(PowerStat);
output;
keep maxsonuc powerstat absresult;
end;
run;
So here is a program that calculates your statistic. Sorry if I messed up your carefully crafted caMeLcAse variable names. The results for these three order variables with this pattern of DEFAULT values is:
Data1= 0.61905
Data2= 0.26984
Data3= 0.01587
* Sample data with three order variables ;
data defaultCok;
infile datalines truncover;
input Default Data1-Data3;
datalines;
0 4.15 5.15 4.00
1 0.03 5.50 3.50
0 1.53 1.70 3.05
1 0.32 0.47 2.50
1 0.03 0.90 1.70
0 0.61 0.00 3.70
0 0.78 0.80 2.70
1 0.00 2.90 4.10
0 0.70 2.09 0.00
1 0.09 3.00 2.90
1 0.00 4.00 0.00
1 2.81 4.02 0.10
0 0.47 5.00 2.90
1 1.92 4.00 4.50
1 0.01 3.50 3.00
0 0.69 2.10 1.70
;
* Convert to vertical to be able to process by variable ;
data vertical ;
set defaultCok;
array d data1-data3 ;
length varname $32 ;
do over d ;
varname = vname(d);
value = d ;
output;
end;
keep default varname value ;
run;
* Order by descending value within each order variable ;
proc sort data=vertical;
by varname descending value ;
run;
* Calculate PowerStat ;
data stats ;
call missing(of totaldefault totalnondefault sonuc maxsonuc defaultrate nondefaultrate);
do until (last.varname);
set vertical ;
by varname ;
totaldefault = sum(totaldefault,default);
totalnondefault = sum(totalnondefault,not default);
end;
do until (last.varname);
set vertical ;
by varname ;
defaultRate=sum(defaultrate,default);
nondefaultRate=sum(nondefaultrate,not default);
ResultD=defaultRate/TotalDefault;
ResultND=NondefaultRate/TotalNonDefault;
if first.varname then area=0;
else Area=(ResultND+LagResultND)/2*(ResultD-LagResultD) ;
Sonuc=sum(Sonuc,Area);
maxsonuc = max(maxsonuc,sonuc);
lagResultD = ResultD ;
lagResultND = ResultND ;
end;
PowerStat=(MaxSonuc-0.5)*2;
ABSResult=ABS(PowerStat);
keep varname absresult ;
run;
proc print; run;
I think you need to tell us something about the data you start with. Hard to tell how to get the end without a start.
Sorry for missing information. Here is my datalines and my code. I have thousands "Data" columns. Thanks to @Reeza i can calcuate ABSPowerStat for first Data column but i tried some macros to calculate for all "Data" columns but i failed. I'm new on macro language. Could you help me ,please ?
Data0005 .......
Data0006 .........
Data0007 ...........
data defaultCok;
length default 8. Data 8. Data2 8. Data3 8. ;
infile datalines missover;
input Default Data Data2 Data3;
datalines;
0 4.15 5.15 4.00
1 0.03 5.50 3.50
0 1.53 1.70 3.05
1 0.32 0.47 2.50
1 0.03 0.90 1.70
0 0.61 0.00 3.70
0 0.78 0.80 2.70
1 0.00 2.90 4.10
0 0.70 2.09 0.00
1 0.09 3.00 2.90
1 0.00 4.00 0.00
1 2.81 4.02 0.10
0 0.47 5.00 2.90
1 1.92 4.00 4.50
1 0.01 3.50 3.00
0 0.69 2.10 1.70
;
proc sort data=work.defaultCok;
by descending Data;
run;
data work.cumulative;
set work.defaultCok;
NonDefault=1-Default;
by data notsorted;
if first.default then defaultRate=0;
defaultRate+default;
if first.default then NondefaultRate=0;
NondefaultRate+Nondefault;
run;
/*proc sql;
create table yeni as
select data,(select SUM(default) from work.default as def2 where def2.data <= def1.data) as defaultRate
from work.default def1
order by data desc;*/
proc sql;
create table work.cumulative2 as
select default,data,NonDefault,defaultRate,NondefaultRate,sum(default) as TotalDefault,sum(NonDefault) as TotalNonDefault
from work.cumulative;
quit;
data PowerStat;
set work.cumulative2 END=lastN;
ResultD=defaultRate/TotalDefault;
ResultND=NondefaultRate/TotalNonDefault;
Retain Sonuc 0 MaxSonuc 0;
Area=(ResultND+Lag(ResultND))/2*(ResultD-Lag(ResultD));
IF missing(Area)=1 then Area2=0;
Else Area2=Area;
Sonuc=Sonuc+Area2;
if Sonuc>maxSOnuc then maxsonuc=sonuc;
if lastN then do;
PowerStat=(MaxSonuc-0.5)*2;
ABSResult=ABS(PowerStat);
output;
keep absresult;
end;
run;
You have two issue here and to properly fix this you need to address both.
The first problem is that your program does not work. Your data step immediately after the PROC SORT is referencing FIRST.DEFAULT and does not include a BY DEFAULT statement.
Perhaps if you could explain what the code is trying to do it could be re-written to do it in a way that would be easier to extend to multple variables?
There are a few ways that immediately come to mind to applying calculations to multiple variables.
1) You could create arrays of an the loop over the array. This program does not look like it would work for that, but perhaps the code could be re-written so that it could.
2) You could structure the data so that you could use BY variable processing. In this case you could convert from three columns (DATA, DATA2, DATA3) to three rows with two columns VARNAME and VALUE and then process the VALUE column as you are currently processing the DATA column, but add VARNAME as a BY group. PROC TRANSPOSE would do that easily.
3) As a last resort create a macro that takes the variable name as in input parameter.
To make a macro you need to decide what is that varies between diffferent calls to macro. That is what are the parameters. I think that you want the name of the variable as the parameter. So you would call it first with DATA then DATA2 then DATA3 etc.
Hello Tom.
"The first problem is that your program does not work. Your data step immediately after the PROC SORT is referencing FIRST.DEFAULT and does not include a BY DEFAULT statement." I think my porgram works and DEFAULT is my constant column and in this step
if first.default then defaultRate=0;
defaultRate+default;
if first.default then NondefaultRate=0;
NondefaultRate+Nondefault;
I got column's cumulative values. Did i make mistake ?
I am trying do,
1- Take cumulative values of "default" column
2-Convert "default1 column's values
3-Take cumulative values of "Nondefault"column
4- I need sum of "default" and "Nondefault" columns
5-defaultCumulative/sumDefault->ResultD
NondefaulCumulative/sumNonDefault->ResultND
6 Then i perform the PowerStat formula in the Powerstat Data step
If you want to use FIRST.DEFAULT then the variable DEFAULT needs to be in the BY statement. Otherwise you will get a warning from SAS that FIRST.DEFAULT is uniniatilzed and FIRST.DEFAULT will never be true.
So what is meaning of the variable DEFAULTRATE that you are trying to calculate? Are just making a running count of the number of times that DEFAULT=1? And NONDEFAULTRATE is a running count of number of times that DEFAULT=0?
If so then it would clearer and easier to skip the IF/THEN and FIRST. references and just code:
DEFAULTRATE + DEFAULT;
NONDEFAULTRATE + (NOT DEFAULT);
What is the PROC SQL step doing? It looks like it is just merging on the sum(), but perhaps there is something else happening there?
What is the meaning of POWERSTAT? What does it mean in words? Is it some type of Power Statistic? If so which one? What formula are you using? Do you have a reference? Is it possible that it is a standard statistic that an existing SAS PROC can calculate for you? That would be a lot easier than writing your own formula.
I'm assuming the vars to be analyzed are Data1-Datax (you had Data Data2.)
With that assumption the following code will do what you need, I beieve.
data results;
length ColumnName $32 PowerStat 8 ;
stop;
run;
%macro iterate(dataCols=1);
%do dataCol = 1 %to &dataCols.;
proc sort data=work.defaultCok(rename=(data&DataCol.=data)) out=work.defaultCok2;
by descending Data;
run;
data work.cumulative;
set work.defaultCok2;
NonDefault=1-Default;
by data notsorted;
if first.default then defaultRate=0;
defaultRate+default;
if first.default then NondefaultRate=0;
NondefaultRate+Nondefault;
run;
proc sql;
create table work.cumulative2 as
select default,data,NonDefault,defaultRate,NondefaultRate,sum(default) as TotalDefault,sum(NonDefault) as TotalNonDefault
from work.cumulative;
quit;
data PowerStat;
set work.cumulative2 END=lastN;
ResultD=defaultRate/TotalDefault;
ResultND=NondefaultRate/TotalNonDefault;
Retain Sonuc 0 MaxSonuc 0;
Area=(ResultND+Lag(ResultND))/2*(ResultD-Lag(ResultD));
IF missing(Area)=1 then Area2=0;
Else Area2=Area;
Sonuc=Sonuc+Area2;
if Sonuc>maxSOnuc then maxsonuc=sonuc;
if lastN then do;
PowerStat=(MaxSonuc-0.5)*2;
ABSResult=ABS(PowerStat);
varName="data&datacol.";
output;
keep absresult varName;
end;
run;
data results;
set results Powerstat(rename=(absresult=Powerstat varName=ColumnName));
run;
%end;
%mend;
%iterate(dataCols=3) * number of DataX columns ;
Thank you Chris. I take the output but also it gives error like below.The other problem is, it takes so long time. Maybe i have to change the code or need to get more efficent macro.
ERROR: Variable DATA1 not found.
WARNING: The data set WORK.DEFAULTCOK2 may be incomplete. When this step was stopped there were 0 observations and 0 variables.
WARNING: Data set WORK.DEFAULTCOK2 was not replaced because this step was stopped.
Thank you.
So here is a program that calculates your statistic. Sorry if I messed up your carefully crafted caMeLcAse variable names. The results for these three order variables with this pattern of DEFAULT values is:
Data1= 0.61905
Data2= 0.26984
Data3= 0.01587
* Sample data with three order variables ;
data defaultCok;
infile datalines truncover;
input Default Data1-Data3;
datalines;
0 4.15 5.15 4.00
1 0.03 5.50 3.50
0 1.53 1.70 3.05
1 0.32 0.47 2.50
1 0.03 0.90 1.70
0 0.61 0.00 3.70
0 0.78 0.80 2.70
1 0.00 2.90 4.10
0 0.70 2.09 0.00
1 0.09 3.00 2.90
1 0.00 4.00 0.00
1 2.81 4.02 0.10
0 0.47 5.00 2.90
1 1.92 4.00 4.50
1 0.01 3.50 3.00
0 0.69 2.10 1.70
;
* Convert to vertical to be able to process by variable ;
data vertical ;
set defaultCok;
array d data1-data3 ;
length varname $32 ;
do over d ;
varname = vname(d);
value = d ;
output;
end;
keep default varname value ;
run;
* Order by descending value within each order variable ;
proc sort data=vertical;
by varname descending value ;
run;
* Calculate PowerStat ;
data stats ;
call missing(of totaldefault totalnondefault sonuc maxsonuc defaultrate nondefaultrate);
do until (last.varname);
set vertical ;
by varname ;
totaldefault = sum(totaldefault,default);
totalnondefault = sum(totalnondefault,not default);
end;
do until (last.varname);
set vertical ;
by varname ;
defaultRate=sum(defaultrate,default);
nondefaultRate=sum(nondefaultrate,not default);
ResultD=defaultRate/TotalDefault;
ResultND=NondefaultRate/TotalNonDefault;
if first.varname then area=0;
else Area=(ResultND+LagResultND)/2*(ResultD-LagResultD) ;
Sonuc=sum(Sonuc,Area);
maxsonuc = max(maxsonuc,sonuc);
lagResultD = ResultD ;
lagResultND = ResultND ;
end;
PowerStat=(MaxSonuc-0.5)*2;
ABSResult=ABS(PowerStat);
keep varname absresult ;
run;
proc print; run;
Thank you. It seems to be working. But i changed the column "Data" (for all of them) as "Rasyo" then i put a simple macro now all AbsResult is gotten same. I can't any warning in the log tab. Have you got any idea ?
%let colCnt=4596;
data vertical ;
set defaultCok;
array d Q0001-Q&colCnt.;
length Rasyo $32 ;
do over d ;
Rasyo = vname(d);
value = d ;
output;
end;
keep default Rasyo value ;
run;
Hard to say since you did not post any error messages. If you change the variable name from VARNAME to RASYO in this step make sure to change it in the later steps also.
Are you asking why the caclculated numbers are the same? Do they produce the same ordering of values of DEFAULT? From your algorithm the result is totally based on the order of 1's and 0's in the variable DEFAULT to result when they are sorted by the value of the "RASYO" variable.
I just got this warning message in the Log summary;
WARNING: The variable default in the DROP, KEEP, or RENAME list has never been referenced.
Thank you.
I think PowerStat same with GINI but i'm not sure. Also i don't know how can i perform the GINI function on this code .
This warning message was occured because i forgot the change name of the variable default.(WARNING: The variable default in the DROP, KEEP, or RENAME list has never been referenced.).
But there is a problem here again i'll check again thanks everyone
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.