Hello--
I have some tricky numeric transformations I’m having trouble figuring out. I'm finding them tricky because I have to start by basing the scores off to values in one variable.
The data set below are the variables I start with plus the variables I need to calculate.
Start With | Variables which need to be calculated | ||||||||
A | B | C | D | E | F | G | H | I | J |
Office | Certification | Total_of_Homes | Closed | Compliance_Rate | Home_Type_Proportion | Points_Available | Home_Points_Earned | Total _Points | Total_Score |
Office1 | 12 -month | 173 | 1% | 99% | 88% | 8.8 | 8.8 | 9.8 | 98% |
Office1 | kinship 90 days | 23 | 13% | 87% | 12% | 1.2 | 1.0 |
Can you see why this may be a little tricky to figure out? Any help is greatly appreciated! Thanks
@Robert: I used Total_Homes and tot_points as placeholders so that the total values of each could be accumulated during the loop. Both are dropped by the time the file is actually written. To understand what is actually happening through the two loops, it is easiest (I think) to add a couple of putlog _all_; statements in the code. e.g.,
data have;
informat office $8.;
informat certification $20.;
informat closed compliance_rate percent3.;
input Office Certification & Total_of_Homes Closed Compliance_Rate;
cards;
Office1 12-month 173 1% 99%
Office1 kinship 90 days 23 13% 87%
;
data want (drop=Total_Homes tot_points);
do until(last.office);
set have;
by office;
if first.office then Total_Homes=Total_of_Homes;
else Total_Homes+Total_of_Homes;
putlog _all_;
end;
do until(last.office);
set have;
by office;
Home_Type_Proportion=Total_of_Homes/Total_Homes;
Points_Available=Home_Type_Proportion*10;
Home_Points_Earned=Points_Available*Compliance_Rate;
if first.office then tot_points=Home_Points_Earned;
else do;
tot_points+Home_Points_Earned;
Total_Points=tot_points;
Total_Score=Total_Points*10;
end;
output;
putlog _all_;
end;
run;
Hi, I see your references to Row 4 and Row 5 (C4, C5), but if your column header cells are in Row 1, I only see Row 2 and Row 3. (see screenshot) So, can you explain what data is in Row 4 and Row 5???
And, what is the structure of your SAS dataset? You show an Excel worksheet -- is your data already in SAS form, or does information need to be read into or imported into SAS?
cynthia
Hi Cynthia--
Yes, so sorry the row numbers are off they should be C2 and C3 the row with letter values were just for illustration purposes. Yes, the data is already in SAS format. I just used excel for the example.
Thanks.
If you are working with a SAS dataset, you could use a dow loop to achieve the result you want. I didn't bother to do any of the implied rounding:
data have;
informat office $8.;
informat certification $20.;
informat closed compliance_rate percent3.;
input Office Certification & Total_of_Homes Closed Compliance_Rate;
cards;
Office1 12-month 173 1% 99%
Office1 kinship 90 days 23 13% 87%
;
data want (drop=Total_Homes tot_points);
do until(last.office);
set have;
by office;
if first.office then Total_Homes=Total_of_Homes;
else Total_Homes+Total_of_Homes;
end;
do until(last.office);
set have;
by office;
Home_Type_Proportion=Total_of_Homes/Total_Homes;
Points_Available=Home_Type_Proportion*10;
Home_Points_Earned=Points_Available*Compliance_Rate;
if first.office then tot_points=Home_Points_Earned;
else do;
tot_points+Home_Points_Earned;
Total_Points=tot_points;
Total_Score=Total_Points*10;
end;
output;
end;
run;
Hi Arthur--
I am so glad you write this in Base SAS as opposed to SQL. I am in the process of trying to learn by group processing and do loops. I have a few questions. I am not to sure whats happening in the parts of the code in bold of the code and I can't seem to see what the significance of the variable tot_points
.
data want (drop=Total_Homes tot_points);
do until(last.office);
set have;
by office;
if first.office then Total_Homes=Total_of_Homes;
else Total_Homes+Total_of_Homes;
end;
run;
do until(last.office);
set have;
by office;
Home_Type_Proportion=Total_of_Homes/Total_Homes;
Points_Available=Home_Type_Proportion*10;
Home_Points_Earned=Points_Available*Compliance_Rate;
if first.office then tot_points=Home_Points_Earned;
else do;
tot_points+Home_Points_Earned;
Total_Points=tot_points;
Total_Score=Total_Points*10;
end;
output;
end;
run;
THANKS!!!!!
@Robert: I used Total_Homes and tot_points as placeholders so that the total values of each could be accumulated during the loop. Both are dropped by the time the file is actually written. To understand what is actually happening through the two loops, it is easiest (I think) to add a couple of putlog _all_; statements in the code. e.g.,
data have;
informat office $8.;
informat certification $20.;
informat closed compliance_rate percent3.;
input Office Certification & Total_of_Homes Closed Compliance_Rate;
cards;
Office1 12-month 173 1% 99%
Office1 kinship 90 days 23 13% 87%
;
data want (drop=Total_Homes tot_points);
do until(last.office);
set have;
by office;
if first.office then Total_Homes=Total_of_Homes;
else Total_Homes+Total_of_Homes;
putlog _all_;
end;
do until(last.office);
set have;
by office;
Home_Type_Proportion=Total_of_Homes/Total_Homes;
Points_Available=Home_Type_Proportion*10;
Home_Points_Earned=Points_Available*Compliance_Rate;
if first.office then tot_points=Home_Points_Earned;
else do;
tot_points+Home_Points_Earned;
Total_Points=tot_points;
Total_Score=Total_Points*10;
end;
output;
putlog _all_;
end;
run;
Here's a two step SQL solution.
You could probably do it in one but I didn't feel like working out the math
data have;
input office certification $ total_of_homes close compliance_rate;
cards;
1 12M 173 .01 .99
1 Kinship 23 .13 .87
;
proc sql;
create table step1 as
select *, sum(total_of_homes) as n_homes format=8.,
total_of_homes/calculated n_homes as home_type_proportion format=percent8.,
calculated home_type_proportion*10 as points_available format=8.1,
calculated points_available*compliance_rate as home_points_earned format=8.1
from have
group by office;
create table step2 as
select *, sum(home_points_earned) as total_points format=8.1,
sum(home_points_earned)/10 as total_score format=percent8.
from step1
group by office;
quit;
Assuming you want this calculation for each Office and the two Certifications are present for each office This SQL query will do the trick :
data have;
length certification $20;
input office $ Certification $ Total_of_Homes Closed :percent3. Compliance_Rate :percent3.;
datalines;
Office1 12-month 173 1% 99%
Office1 kinship_90_days 23 13% 87%
;
proc sql;
create table want as
select office, Certification, Total_of_Homes, Closed, Compliance_Rate,
Total_of_Homes/sum(Total_of_Homes) as Home_Type_Proportion format=percentn5.0,
10 * calculated Home_Type_Proportion as Points_Available format=4.1,
Compliance_Rate * calculated Points_Available as Home_Points_Earned format=4.1,
10 * sum(Total_of_Homes*Compliance_Rate) / sum(Total_of_Homes) as Total_Points format=4.1,
calculated Total_Points/10 as Total_Score format=percentn5.0
from have
group by office;
select * from want;
quit;
PG
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.