BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
turcay
Lapis Lazuli | Level 10

Hello, I need a Loop to calculate thousands PowerStat value for all "Data" columns in my code. Then i have to put all PowerStat values in new table like below. Couldyou help me , please ? 

 

.ABS.png

 

proc sort data=work.default;
by descending Data;
run;
data work.cumulative;
  set work.default;
  NonDefault=1-Default;
  by data notsorted;
  if first.default then defaultRate=0;
    defaultRate+default;
if first.default then NondefaultRate=0;
    NondefaultRate+Nondefault;
run;
proc sql;
create table work.cumulative2 as 
select default,data,NonDefault,defaultRate,NondefaultRate,sum(default) as TotalDefault,sum(NonDefault) as TotalNonDefault
from work.cumulative;
quit;
data PowerStat;
set work.cumulative2 END=lastN;
ResultD=defaultRate/TotalDefault;
ResultND=NondefaultRate/TotalNonDefault;
Retain Sonuc 0 MaxSonuc 0;
Area=(ResultND+Lag(ResultND))/2*(ResultD-Lag(ResultD));
IF missing(Area)=1 then Area2=0;
Else Area2=Area;
Sonuc=Sonuc+Area2;
if Sonuc>maxSOnuc then maxsonuc=sonuc;
if lastN then do;
PowerStat=(MaxSonuc-0.5)*2;
ABSResult=ABS(PowerStat);
output;
keep maxsonuc powerstat absresult;
end;
run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

So here is a program that calculates your statistic.  Sorry if I messed up your carefully crafted caMeLcAse variable names. The results for these three order variables with this pattern of DEFAULT values is:

 

Data1= 0.61905

Data2= 0.26984
Data3= 0.01587

 

* Sample data with three order variables ;
data defaultCok;
	infile datalines truncover;
	input Default Data1-Data3;
	datalines;
0 4.15 5.15 4.00
1 0.03 5.50 3.50
0 1.53 1.70 3.05
1 0.32 0.47 2.50
1 0.03 0.90 1.70
0 0.61 0.00 3.70
0 0.78 0.80 2.70
1 0.00 2.90 4.10
0 0.70 2.09 0.00
1 0.09 3.00 2.90
1 0.00 4.00 0.00
1 2.81 4.02 0.10
0 0.47 5.00 2.90
1 1.92 4.00 4.50
1 0.01 3.50 3.00
0 0.69 2.10 1.70
;

* Convert to vertical to be able to process by variable ;
data vertical ;
   set defaultCok;
   array d data1-data3 ;
   length varname $32 ;
   do over d ;
     varname = vname(d);
     value = d ;
     output;
   end;
   keep default varname value ;
run;

* Order by descending value within each order variable ;
proc sort data=vertical;
  by varname descending value ;
run;

* Calculate PowerStat ;
data stats ;
  call missing(of totaldefault totalnondefault sonuc maxsonuc defaultrate nondefaultrate);
  do until (last.varname);
     set vertical ;
     by varname ;
     totaldefault = sum(totaldefault,default);
     totalnondefault = sum(totalnondefault,not default);
  end;
  do until (last.varname);
     set vertical ;
     by varname ;
     defaultRate=sum(defaultrate,default);
     nondefaultRate=sum(nondefaultrate,not default);
     ResultD=defaultRate/TotalDefault;
     ResultND=NondefaultRate/TotalNonDefault;
     if first.varname then area=0;
     else Area=(ResultND+LagResultND)/2*(ResultD-LagResultD) ;
     Sonuc=sum(Sonuc,Area);
     maxsonuc = max(maxsonuc,sonuc);
     lagResultD = ResultD ;
     lagResultND = ResultND ;
   end;
   PowerStat=(MaxSonuc-0.5)*2;
   ABSResult=ABS(PowerStat);
   keep varname absresult ;
run;
proc print; run;

 

View solution in original post

15 REPLIES 15
ballardw
Super User

I think you need to tell us something about the data you start with. Hard to tell how to get the end without a start.

turcay
Lapis Lazuli | Level 10

Sorry for missing information. Here is my datalines and my code. I have thousands "Data" columns. Thanks  to @Reeza i can calcuate  ABSPowerStat for first Data column but i tried some macros to calculate for all "Data" columns  but i failed. I'm new on macro language. Could you help me ,please ?

ABS.png

Data0005     .......

Data0006      .........

Data0007    ...........


data defaultCok; length default 8. Data 8. Data2 8. Data3 8. ; infile datalines missover; input Default Data Data2 Data3; datalines; 0 4.15 5.15 4.00 1 0.03 5.50 3.50 0 1.53 1.70 3.05 1 0.32 0.47 2.50 1 0.03 0.90 1.70 0 0.61 0.00 3.70 0 0.78 0.80 2.70 1 0.00 2.90 4.10 0 0.70 2.09 0.00 1 0.09 3.00 2.90 1 0.00 4.00 0.00 1 2.81 4.02 0.10 0 0.47 5.00 2.90 1 1.92 4.00 4.50 1 0.01 3.50 3.00 0 0.69 2.10 1.70 ; proc sort data=work.defaultCok; by descending Data; run; data work.cumulative; set work.defaultCok; NonDefault=1-Default; by data notsorted; if first.default then defaultRate=0; defaultRate+default; if first.default then NondefaultRate=0; NondefaultRate+Nondefault; run; /*proc sql; create table yeni as select data,(select SUM(default) from work.default as def2 where def2.data <= def1.data) as defaultRate from work.default def1 order by data desc;*/ proc sql; create table work.cumulative2 as select default,data,NonDefault,defaultRate,NondefaultRate,sum(default) as TotalDefault,sum(NonDefault) as TotalNonDefault from work.cumulative; quit; data PowerStat; set work.cumulative2 END=lastN; ResultD=defaultRate/TotalDefault; ResultND=NondefaultRate/TotalNonDefault; Retain Sonuc 0 MaxSonuc 0; Area=(ResultND+Lag(ResultND))/2*(ResultD-Lag(ResultD)); IF missing(Area)=1 then Area2=0; Else Area2=Area; Sonuc=Sonuc+Area2; if Sonuc>maxSOnuc then maxsonuc=sonuc; if lastN then do; PowerStat=(MaxSonuc-0.5)*2; ABSResult=ABS(PowerStat); output; keep absresult; end; run;

 

 

Tom
Super User Tom
Super User

You have two issue here and to properly fix this you need to address both.

 

The first problem is that your program does not work.  Your data step immediately after the PROC SORT is referencing FIRST.DEFAULT and does not include a BY DEFAULT statement.

Perhaps if you could explain what the code is trying to do it could be re-written to do it in a way that would be easier to extend to multple variables?

 

There are a few ways that immediately come to mind to applying calculations to multiple variables.

1) You could create arrays of an the loop over the array.  This program does not look like it would work for that, but perhaps the code could be re-written so that it could.

2) You could structure the data so that you could use BY variable processing.  In this case you could convert from three columns (DATA, DATA2, DATA3) to three rows with two columns VARNAME and VALUE and then process the VALUE column as you are currently processing the DATA column, but add VARNAME as a BY group.  PROC TRANSPOSE would do that easily.

3) As a last resort create a macro that takes the variable name as in input parameter.

 

To make a macro you need to decide what is that varies between diffferent calls to macro. That is what are the parameters.  I think that you want the name of the variable as the parameter.  So you would call it first with DATA then DATA2 then DATA3 etc.

 

 

turcay
Lapis Lazuli | Level 10

Hello Tom.

"The first problem is that your program does not work.  Your data step immediately after the PROC SORT is referencing FIRST.DEFAULT and does not include a BY DEFAULT statement." I think my porgram works and DEFAULT is my constant column and in this step 

 

if first.default then defaultRate=0;
    defaultRate+default;
if first.default then NondefaultRate=0;
    NondefaultRate+Nondefault;

I got column's cumulative values. Did i make mistake ?

 

I am trying do,

1- Take cumulative values of "default" column

2-Convert "default1 column's values

3-Take cumulative values of "Nondefault"column

4- I need sum of "default" and "Nondefault" columns

5-defaultCumulative/sumDefault->ResultD

  NondefaulCumulative/sumNonDefault->ResultND

6 Then i perform the PowerStat formula in the Powerstat Data step

 

Tom
Super User Tom
Super User

If you want to use FIRST.DEFAULT then the variable DEFAULT needs to be in the BY statement. Otherwise you will get a warning from SAS that FIRST.DEFAULT is uniniatilzed and FIRST.DEFAULT will never be true.

 

So what is meaning of the variable DEFAULTRATE that you are trying to calculate?  Are just making a running count of the number of times that DEFAULT=1?  And NONDEFAULTRATE is a running count of number of times that DEFAULT=0?

 

If so then it would clearer and easier to skip the IF/THEN and FIRST. references and just code:

DEFAULTRATE + DEFAULT;

NONDEFAULTRATE + (NOT DEFAULT);

 

What is the PROC SQL step doing? It looks like it is just merging on the sum(), but perhaps there is something else happening there?

 

What is the meaning of POWERSTAT?  What does it mean in words? Is it some type of Power Statistic? If so which one? What formula are you using?  Do you have a reference?  Is it possible that it is a standard statistic that an existing SAS PROC can calculate for you? That  would be a lot easier than writing your own formula.

 

Reeza
Super User
Assuming you want to do this over many different datasets:
1. Create a macro that calculates the output and appends it to a master table
%powerstat(datain=, var=, dataout=);
2. Create a dataset that is a list of all datasets that need to be processed with the variable that needs to be executed.
3. Use Call execute to call macro from #1
ChrisWard
SAS Employee

I'm assuming the vars to be analyzed are Data1-Datax (you had Data Data2.)

 

With that assumption the following code will do what you need, I beieve.

 

data results;
  length ColumnName $32 PowerStat 8 ;
  stop;
run;
%macro iterate(dataCols=1);
%do dataCol = 1 %to &dataCols.;
proc sort data=work.defaultCok(rename=(data&DataCol.=data)) out=work.defaultCok2;
  by descending Data;
run;
data work.cumulative;
  set work.defaultCok2;
  NonDefault=1-Default;
  by data notsorted;
  if first.default then defaultRate=0;
    defaultRate+default;
if first.default then NondefaultRate=0;
    NondefaultRate+Nondefault;
run;

proc sql;
create table work.cumulative2 as 
select default,data,NonDefault,defaultRate,NondefaultRate,sum(default) as TotalDefault,sum(NonDefault) as TotalNonDefault
from work.cumulative;
quit;
data PowerStat;
set work.cumulative2 END=lastN;
ResultD=defaultRate/TotalDefault;
ResultND=NondefaultRate/TotalNonDefault;
Retain Sonuc 0 MaxSonuc 0;
Area=(ResultND+Lag(ResultND))/2*(ResultD-Lag(ResultD));
IF missing(Area)=1 then Area2=0;
Else Area2=Area;
Sonuc=Sonuc+Area2;
if Sonuc>maxSOnuc then maxsonuc=sonuc;
if lastN then do;
PowerStat=(MaxSonuc-0.5)*2;
ABSResult=ABS(PowerStat);
varName="data&datacol.";
output;
keep absresult varName;
end;
run;

data results;
  set results Powerstat(rename=(absresult=Powerstat varName=ColumnName));
run;
%end;
%mend;
%iterate(dataCols=3) * number of DataX columns ;
turcay
Lapis Lazuli | Level 10

Thank you Chris. I take the output but also it gives error like below.The other problem is, it takes so long time. Maybe i have to change the code or need to get  more efficent macro. 

 

ERROR: Variable DATA1 not found.

WARNING: The data set WORK.DEFAULTCOK2 may be incomplete. When this step was stopped there were 0 observations and 0 variables.
WARNING: Data set WORK.DEFAULTCOK2 was not replaced because this step was stopped.

 

Thank you.

Tom
Super User Tom
Super User

So here is a program that calculates your statistic.  Sorry if I messed up your carefully crafted caMeLcAse variable names. The results for these three order variables with this pattern of DEFAULT values is:

 

Data1= 0.61905

Data2= 0.26984
Data3= 0.01587

 

* Sample data with three order variables ;
data defaultCok;
	infile datalines truncover;
	input Default Data1-Data3;
	datalines;
0 4.15 5.15 4.00
1 0.03 5.50 3.50
0 1.53 1.70 3.05
1 0.32 0.47 2.50
1 0.03 0.90 1.70
0 0.61 0.00 3.70
0 0.78 0.80 2.70
1 0.00 2.90 4.10
0 0.70 2.09 0.00
1 0.09 3.00 2.90
1 0.00 4.00 0.00
1 2.81 4.02 0.10
0 0.47 5.00 2.90
1 1.92 4.00 4.50
1 0.01 3.50 3.00
0 0.69 2.10 1.70
;

* Convert to vertical to be able to process by variable ;
data vertical ;
   set defaultCok;
   array d data1-data3 ;
   length varname $32 ;
   do over d ;
     varname = vname(d);
     value = d ;
     output;
   end;
   keep default varname value ;
run;

* Order by descending value within each order variable ;
proc sort data=vertical;
  by varname descending value ;
run;

* Calculate PowerStat ;
data stats ;
  call missing(of totaldefault totalnondefault sonuc maxsonuc defaultrate nondefaultrate);
  do until (last.varname);
     set vertical ;
     by varname ;
     totaldefault = sum(totaldefault,default);
     totalnondefault = sum(totalnondefault,not default);
  end;
  do until (last.varname);
     set vertical ;
     by varname ;
     defaultRate=sum(defaultrate,default);
     nondefaultRate=sum(nondefaultrate,not default);
     ResultD=defaultRate/TotalDefault;
     ResultND=NondefaultRate/TotalNonDefault;
     if first.varname then area=0;
     else Area=(ResultND+LagResultND)/2*(ResultD-LagResultD) ;
     Sonuc=sum(Sonuc,Area);
     maxsonuc = max(maxsonuc,sonuc);
     lagResultD = ResultD ;
     lagResultND = ResultND ;
   end;
   PowerStat=(MaxSonuc-0.5)*2;
   ABSResult=ABS(PowerStat);
   keep varname absresult ;
run;
proc print; run;

 

turcay
Lapis Lazuli | Level 10

Thank you. It seems to be working. But i changed the column "Data" (for all of them) as "Rasyo" then i put a simple macro now all AbsResult is gotten same. I can't any warning in the log tab. Have you got any idea ?

 

%let colCnt=4596;
data vertical ;
   set defaultCok;
   array d Q0001-Q&colCnt.;
   length Rasyo $32 ;
   do over d ;
     Rasyo = vname(d);
     value = d ;
     output;
   end;
   keep default Rasyo value ;
run;

 

CanSas.PNG

Tom
Super User Tom
Super User

Hard to say since you did not post any error messages.  If you change the variable name from VARNAME to RASYO in this step make sure to change it in the later steps also.

Are you asking why the caclculated numbers are the same?  Do they produce the same ordering of values of DEFAULT?  From your algorithm the result is totally based on the order of 1's and 0's in the variable DEFAULT to result when they are sorted by the value of the "RASYO" variable.

turcay
Lapis Lazuli | Level 10

I just got this warning message in the Log summary;

 

WARNING: The variable default in the DROP, KEEP, or RENAME list has never been referenced.

 

Thank you.

 

Ksharp
Super User
What formula are you using to calculate PowerStat ? Maybe IML code can make it more succinct , better and faster .
turcay
Lapis Lazuli | Level 10

I think PowerStat same with GINI but i'm not sure. Also i don't know how can i perform the GINI function on this code .

 

This warning message was occured because i forgot the change name of the variable default.(WARNING: The variable default in the DROP, KEEP, or RENAME list has never been referenced.).

But there is a problem here again i'll check again thanks everyone

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 15 replies
  • 2109 views
  • 1 like
  • 6 in conversation