How did they calculate PctGrowthAvgYr in the Programming 3 Lesson 1 data?
Hi:
We got this same question in our tracking system and will need to ask the course developers, when they are available, if they remember the site where they got the data or how the variables were created/derived. Some of the data for our classes comes from public web sites but then needs to be subset or transformed for particular purposes in a class, such as needing a rate to use for multiplication in a DO loop.
Using PROC CONTENTS on the dataset in Lesson 1, I see that the variable is named PctGrowthAvgYr.
Since we already have a track for this question, we'll respond when we have the information. Programming 3 is not a Statistics class. It teaches advanced techniques, such as DO LOOPS, ARRAY processing, HASH tables, etc. This particular variable is used in the DO loop example in a subsequent lesson.
Cynthia
Hi:
I have heard back from the course developers about the PctGrowthAvgYr variable in the pg3.population_top25countries data table used in the course. To make that table for use in the Programming 3 class, we started with WorldBank data from here: https://datacatalog.worldbank.org/public-licenses#cc-by ; for example: http://wdi.worldbank.org/table/2.1 .
However, that variable's value cannot be recalculated from the data that we have in the pg3.population_top25countries data set for the class. The value was based on the incremental growth rate for every year in the range, then averaged. We started with the 2000-2017 numbers and calculated the value from the values for all those years. But for the class, we kept only the first year (2000) and the last year (2017). This means that there's not enough data in the class table to recreate the PctGrowthAvgYr variable. If you want to approximate the value, you could use this formula:
growth=(((pop2017-pop2000)/pop2000)*100)/18;
But keep in mind that it will only be an approximation, not an exact number. For class purposes, we did not need all the original data for all the years. As I explained, we use that variable to be able to calculate values for a new variable in a DO loop. So the focus needs to be on DO loop processing, not on the specifics of how that variable was calculated.
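To see why the endpoint formula is only an approximation, here is a minimal sketch in Python (not SAS, so that it can be run anywhere). The yearly growth rates and the starting population are made up for illustration and are not World Bank values. The divisor 18 is copied from the formula above; there are 17 year-over-year steps between 2000 and 2017, so the choice of divisor is itself part of the approximation.

```python
# Hypothetical year-over-year growth rates in percent for 2001..2017.
# These are invented for illustration; they are NOT World Bank values.
yearly_rates_pct = [2.0 - 0.05 * i for i in range(17)]

# Build a full 2000..2017 population series from an assumed starting value.
pop = {2000: 1_000_000.0}
for i, rate in enumerate(yearly_rates_pct):
    year = 2001 + i
    pop[year] = pop[year - 1] * (1 + rate / 100)

# Method described for the full data: compute the growth rate for every
# year in the range, then average those rates.
rates = [(pop[y] - pop[y - 1]) / pop[y - 1] * 100 for y in range(2001, 2018)]
avg_growth = sum(rates) / len(rates)

# Approximation from the post, using only the first and last year.
pop2000, pop2017 = pop[2000], pop[2017]
approx_growth = (((pop2017 - pop2000) / pop2000) * 100) / 18

print(f"average of yearly rates: {avg_growth:.4f}")
print(f"endpoint approximation:  {approx_growth:.4f}")
```

The two printed numbers differ because yearly growth compounds, and the endpoint formula spreads the total change evenly instead of averaging the individual yearly rates.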
Hope this helps,
Cynthia