Hi all,
I have the total number of pupils in the year 1992 and I need to calculate the number of pupils in each course in 1992. see below code . The problem is the numbers I obtained have decimals points, when I round them they don't always add up to the total. Does anyone know how to fix this issue so when I sum the number of pupils in each course it would add up to 1145.
data have;
input id :$4. pupils1990 fraction total1992;
datalines;
100 50 0.04784689 1145
101 5 0.004784689 1145
102 131 0.125358852 1145
104 21 0.020095694 1145
105 15 0.014354067 1145
106 3 0.002870813 1145
107 12 0.011483254 1145
108 5 0.004784689 1145
109 17 0.016267943 1145
110 29 0.027751196 1145
112 152 0.145454545 1145
113 23 0.022009569 1145
114 37 0.035406699 1145
115 40 0.038277512 1145
116 65 0.062200957 1145
117 52 0.049760766 1145
118 44 0.042105263 1145
119 7 0.006698565 1145
120 9 0.00861244 1145
121 4 0.003827751 1145
122 7 0.006698565 1145
123 8 0.007655502 1145
124 27 0.025837321 1145
125 234 0.223923445 1145
126 16 0.015311005 1145
127 16 0.015311005 1145
128 16 0.015311005 1145
;
data want;
set have;
pupils1992 = round(fraction*total1992);
run;
proc print data=want;
run;
Any help would be greatly appreciated. Thank you.
Go with cumulative fractions instead:
data want;
retain pupils1992;
set have;
cumFrac + fraction;
pupils1992 = round(cumFrac*total1992 - cumPup);
cumPup + pupils1992;
drop cum:;
run;
proc print data=want noobs;
var id pupils1990 fraction pupils1992;
sum pupils1990 pupils1992 fraction;
run;
Thank you very much PGStats , I will try this code.
Also, excuse my naivety, but can you explain what will the following line of code achieve?
pupils1992 = round(cumFrac*total1992 - cumPup);
pupils1992 = round(cumFrac*total1992 - cumPup);
will calculate the difference between the cumulative fraction for the current observation and the cumulative number of pupils for the previous observation. That way, the accumulated difference between the number of pupils (integers) and fractions (real numbers) is compensated as soon as it gets greater than 0.5.
Thank you very much PGStats, the code works great!
Now I added 60 schools in my dataset and I need to calculate the same estimate.I applied the code provided and I realized that the code produce a cumulative of the last observation of the current school and the first observation of the next school. Is there a way to fix this?
I assumed you have a new variable called school
data have;
input id :$4. pupils1990 fraction school $ total1992;
datalines;
100 50 0.04784689 A 1145
101 5 0.004784689 A 1145
102 131 0.125358852 A 1145
104 21 0.020095694 A 1145
105 15 0.014354067 A 1145
106 3 0.002870813 A 1145
107 12 0.011483254 A 1145
108 5 0.004784689 A 1145
109 17 0.016267943 A 1145
110 29 0.027751196 A 1145
112 152 0.145454545 A 1145
113 23 0.022009569 A 1145
114 37 0.035406699 A 1145
115 40 0.038277512 A 1145
116 65 0.062200957 A 1145
117 52 0.049760766 A 1145
118 44 0.042105263 A 1145
119 7 0.006698565 A 1145
120 9 0.00861244 A 1145
121 4 0.003827751 A 1145
122 7 0.006698565 A 1145
123 8 0.007655502 A 1145
124 27 0.025837321 A 1145
125 234 0.223923445 A 1145
126 16 0.015311005 A 1145
127 16 0.015311005 A 1145
128 16 0.015311005 A 1145
100 50 0.04784689 B 1155
101 5 0.004784689 B 1155
102 131 0.125358852 B 1155
104 21 0.020095694 B 1155
105 15 0.014354067 B 1155
106 3 0.002870813 B 1155
107 12 0.011483254 B 1155
108 5 0.004784689 B 1155
109 17 0.016267943 B 1155
110 29 0.027751196 B 1155
112 152 0.145454545 B 1155
113 23 0.022009569 B 1155
114 37 0.035406699 B 1155
115 40 0.038277512 B 1155
116 65 0.062200957 B 1155
117 52 0.049760766 B 1155
118 44 0.042105263 B 1155
119 7 0.006698565 B 1155
120 9 0.00861244 B 1155
121 4 0.003827751 B 1155
122 7 0.006698565 B 1155
123 8 0.007655502 B 1155
124 27 0.025837321 B 1155
125 234 0.223923445 B 1155
126 16 0.015311005 B 1155
127 16 0.015311005 B 1155
128 16 0.015311005 B 1155
;
data want;
retain pupils1992;
set have;
by school notsorted;
if first.school then do;
cumFrac = 0;
cumPup = 0;
end;
cumFrac + fraction;
pupils1992 = round(cumFrac*total1992 - cumPup);
cumPup + pupils1992;
drop cum:;
run;
proc print data=want noobs;
by school notsorted;
var school id pupils1990 fraction pupils1992;
sum pupils1990 pupils1992 fraction;
run;
you nailed this perfectly. thank you , thank you, thank you
Hi @archibald,
Please note that the advantage of getting integers adding up to the given total comes at a price: The numbers obtained contain a "random component" in the sense that they depend (in general) on the order of courses.
Example: The "exact" estimate for course id 123 is 8.765... Currently, the integer estimate is 8, but it could very well be 9 if the courses were numbered (or ordered) differently.
Whether this is an issue depends on what you plan to do with the results.
That is perfectly true @FreelanceReinh. The end result depends on the order of allocation. I have met this problem in the context of sample allocation where I found that it is generally preferable to do the allocation from the lowest to the largest fraction, especially when a minimum stratum sample size (=1) had to be enforced.
The data are currently sorted by ID (within school). Unfortunately, in general you can't have "correctly" rounded integers and at the same time hit the prespecified sum exactly.
Got it ! thank you for your valuable input!
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.