BookmarkSubscribeRSS Feed
archibald
Obsidian | Level 7

Hi all, 

I have the total number of pupils in the year 1992  and I need  to calculate the number of pupils  in each course in 1992. see below code . The problem is  the numbers I obtained  have decimals points, when I round them they don't always add up to the total.  Does anyone know how to fix this issue so  when I sum the number of pupils  in each course it would add up to 1145. 

 


data have; input id :$4. pupils1990 fraction total1992; datalines; 100 50 0.04784689 1145 101 5 0.004784689 1145 102 131 0.125358852 1145 104 21 0.020095694 1145 105 15 0.014354067 1145 106 3 0.002870813 1145 107 12 0.011483254 1145 108 5 0.004784689 1145 109 17 0.016267943 1145 110 29 0.027751196 1145 112 152 0.145454545 1145 113 23 0.022009569 1145 114 37 0.035406699 1145 115 40 0.038277512 1145 116 65 0.062200957 1145 117 52 0.049760766 1145 118 44 0.042105263 1145 119 7 0.006698565 1145 120 9 0.00861244 1145 121 4 0.003827751 1145 122 7 0.006698565 1145 123 8 0.007655502 1145 124 27 0.025837321 1145 125 234 0.223923445 1145 126 16 0.015311005 1145 127 16 0.015311005 1145 128 16 0.015311005 1145 ; data want; set have; pupils1992 = round(fraction*total1992); run; proc print data=want; run;

 Any help would be greatly appreciated. Thank you.

 

11 REPLIES 11
PGStats
Opal | Level 21

Go with cumulative fractions instead:

 

data want;
retain pupils1992;
set have;
cumFrac + fraction;
pupils1992 = round(cumFrac*total1992 - cumPup);
cumPup + pupils1992;
drop cum:;
run;

proc print data=want noobs; 
var id pupils1990 fraction pupils1992; 
sum pupils1990 pupils1992 fraction;
run;
PG
archibald
Obsidian | Level 7

Thank you very much PGStats , I will try this code.  

Also, excuse  my naivety, but can you explain what will  the following line of code achieve?

pupils1992 = round(cumFrac*total1992 - cumPup); 

 

PGStats
Opal | Level 21
pupils1992 = round(cumFrac*total1992 - cumPup); 

will calculate the difference between the cumulative fraction for the current observation and the cumulative number of pupils for the previous observation. That way, the accumulated difference between the number of pupils (integers) and fractions (real numbers) is compensated as soon as it gets greater than 0.5. 

PG
archibald
Obsidian | Level 7

Thank you very much PGStats, the code works great!

Now  I added 60 schools in my dataset  and I need to calculate the same  estimate.I applied the code provided and I realized that the code produce a cumulative of  the last observation of the current  school and the first observation of the next school. Is there a way to fix this?

 

PGStats
Opal | Level 21

I assumed you have a new variable called school

 

data have;
input id :$4. pupils1990 fraction school $ total1992;
datalines;
100	50	0.04784689	A 1145
101	5	0.004784689	A 1145
102	131	0.125358852	A 1145
104	21	0.020095694	A 1145
105	15	0.014354067	A 1145
106	3	0.002870813	A 1145
107	12	0.011483254	A 1145
108	5	0.004784689	A 1145
109	17	0.016267943	A 1145
110	29	0.027751196	A 1145
112	152	0.145454545	A 1145
113	23	0.022009569	A 1145
114	37	0.035406699	A 1145
115	40	0.038277512	A 1145
116	65	0.062200957	A 1145
117	52	0.049760766	A 1145
118	44	0.042105263	A 1145
119	7	0.006698565	A 1145
120	9	0.00861244	A 1145
121	4	0.003827751	A 1145
122	7	0.006698565	A 1145
123	8	0.007655502	A 1145
124	27	0.025837321	A 1145
125	234	0.223923445	A 1145
126	16	0.015311005	A 1145
127	16	0.015311005	A 1145
128	16	0.015311005	A 1145
100	50	0.04784689	B 1155
101	5	0.004784689	B 1155
102	131	0.125358852	B 1155
104	21	0.020095694	B 1155
105	15	0.014354067	B 1155
106	3	0.002870813	B 1155
107	12	0.011483254	B 1155
108	5	0.004784689	B 1155
109	17	0.016267943	B 1155
110	29	0.027751196	B 1155
112	152	0.145454545	B 1155
113	23	0.022009569	B 1155
114	37	0.035406699	B 1155
115	40	0.038277512	B 1155
116	65	0.062200957	B 1155
117	52	0.049760766	B 1155
118	44	0.042105263	B 1155
119	7	0.006698565	B 1155
120	9	0.00861244	B 1155
121	4	0.003827751	B 1155
122	7	0.006698565	B 1155
123	8	0.007655502	B 1155
124	27	0.025837321	B 1155
125	234	0.223923445	B 1155
126	16	0.015311005	B 1155
127	16	0.015311005	B 1155
128	16	0.015311005	B 1155
;

data want;
retain pupils1992;
set have;
by school notsorted;
if first.school then do;
    cumFrac = 0;
    cumPup = 0;
    end;
cumFrac + fraction;
pupils1992 = round(cumFrac*total1992 - cumPup);
cumPup + pupils1992;
drop cum:;
run;

proc print data=want noobs; 
by school notsorted;
var school id pupils1990 fraction pupils1992; 
sum pupils1990 pupils1992 fraction;
run;
PG
archibald
Obsidian | Level 7

you nailed this perfectly. thank you , thank you, thank you

FreelanceReinh
Jade | Level 19

Hi @archibald,

 

Please note that the advantage of getting integers adding up to the given total comes at a price: The numbers obtained contain a "random component" in the sense that they depend (in general) on the order of courses.

 

Example: The "exact" estimate for course id 123 is 8.765... Currently, the integer estimate is 8, but it could very well be 9 if the courses were numbered (or ordered) differently.

 

Whether this is an issue depends on what you plan to do with the results.

PGStats
Opal | Level 21

That is perfectly true @FreelanceReinh. The end result depends on the order of allocation. I have met this problem in the context of sample allocation where I found that it is generally preferable to do the allocation from the lowest to the largest fraction, especially when a minimum stratum sample size (=1) had to be enforced.

PG
archibald
Obsidian | Level 7
greatly appreciate your input, FreelanceReinhard . I understand this the downside of the method.
so would it be advisable to sort by id course to get the true integers, since 8.756 should really be 9 and not 8?
FreelanceReinh
Jade | Level 19

The data are currently sorted by ID (within school). Unfortunately, in general you can't have "correctly" rounded integers and at the same time hit the prespecified sum exactly.

archibald
Obsidian | Level 7

Got it ! thank you for your valuable input!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1422 views
  • 2 likes
  • 3 in conversation