- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Posted 04-07-2010 10:28 AM
(3412 views)
I have a big dataset A. The main variables in Data A are fico_score, account_number, charge_off.
I need to analysis these three variables. In every 5 score points range from min to max, to calculate the frequency of account_number, total balance of charge_off and also to show the range in the output.
The Dataset A I have like this kind:
Fico_Score Account_number Charge_off
450 123 2.5
450 456 2.1
451 213 .
.....
The result I expect to get likes this:
Score_band Freq_acct Total_CO
450-455 45 21.1
456- 460 23 31.1
.......
I tried these codes, but it didn’t work.
Data test;
set a;
retain Score_band Freq_amt Total_co;
do i = 450 to 830 by 5;
proc means data = a noprint;
var account_number charge_off;
where fico_score >= i and Fico_score < i+5;
output n = Freq_acct sum = Total_Co;
end;
run;
Could someone tell me some clues to code these?
I need to analysis these three variables. In every 5 score points range from min to max, to calculate the frequency of account_number, total balance of charge_off and also to show the range in the output.
The Dataset A I have like this kind:
Fico_Score Account_number Charge_off
450 123 2.5
450 456 2.1
451 213 .
.....
The result I expect to get likes this:
Score_band Freq_acct Total_CO
450-455 45 21.1
456- 460 23 31.1
.......
I tried these codes, but it didn’t work.
Data test;
set a;
retain Score_band Freq_amt Total_co;
do i = 450 to 830 by 5;
proc means data = a noprint;
var account_number charge_off;
where fico_score >= i and Fico_score < i+5;
output n = Freq_acct sum = Total_Co;
end;
run;
Could someone tell me some clues to code these?
9 REPLIES 9
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Look at the SAS macro tools. You can do it with %DO and some of the other tools there.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks Duke, I will try it.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Assign a new variable in your dataset representing your groups, and use the BY statement with your PROC MEANS.
Scott Barry
SBBWorks, Inc.
Scott Barry
SBBWorks, Inc.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Scott,
Could you please teach me how to assign the group variable?
The Fico Score value in the dataset likes this
Fico Score
412
412
413
413
413
414
414
415
416
How can I assign the variable to make, every 5 point as one group?
What I expect the result likes this
Fico Score Group
412 1
412 1
413 1
413 1
413 1
414 1
414 1
415 1
416 1
Could you please teach me how to assign the group variable?
The Fico Score value in the dataset likes this
Fico Score
412
412
413
413
413
414
414
415
416
How can I assign the variable to make, every 5 point as one group?
What I expect the result likes this
Fico Score Group
412 1
412 1
413 1
413 1
413 1
414 1
414 1
415 1
416 1
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Sort your input data, then in a DATA step, assign a new SAS variable with an incrementing value to represent each unique "group-range", and each time the current observation has exceeded your previous group-range, increment the new variable. Finally use the new variable you created with a PROC MEANS BY group processing.
Scott Barry
SBBWorks, Inc.
Scott Barry
SBBWorks, Inc.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks very much. I try it right now.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
According to your help, I tried these codes and finally it out.
data new (drop=i diff);
set try;
retain i 400;
do ;
diff=fico_score - i;
if diff le 5 then band= i;
retain i;
else do ;
if ( diff gt 5 and diff le 10) then i=i +5 ;
else i=i+10;
band =i;
retain i;
* retain band;
end;
end;
run;
data new (drop=i diff);
set try;
retain i 400;
do ;
diff=fico_score - i;
if diff le 5 then band= i;
retain i;
else do ;
if ( diff gt 5 and diff le 10) then i=i +5 ;
else i=i+10;
band =i;
retain i;
* retain band;
end;
end;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
QLi,
Scott's approach is much more elegant and efficient than %DO. His approach requires roughly three passes through the data (Data step to group fico, a sort, and a means). the %DO approach requires about 80 passes through the data (MEANS is passes the data once for each level of the index variable).
Scott's approach is much more elegant and efficient than %DO. His approach requires roughly three passes through the data (Data step to group fico, a sort, and a means). the %DO approach requires about 80 passes through the data (MEANS is passes the data once for each level of the index variable).
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks very much.