Hello,
I want to fit a regression in which the dependent variable is a distribution (the values sum to 100%). Briefly, I want to split the number of migrants among European countries according to each country's population size and a country-specific parameter:
%Migrants = Pop2013 + Country-specific parameter
However, I do not know what kind of regression can estimate this equation.
Here are my data. The p-values do not matter here; I just want the parameters.
Country | Pop2013 | %Migrants |
Belgium | 11137974 | 0.045363 |
Bulgaria | 7284552 | 0.002339 |
Czech Republic | 10516125 | 0.009783 |
Denmark | 5602628 | 0.017353 |
Germany | 80523746 | 0.235058 |
Estonia | 1320174 | 0.001048 |
Ireland | 4591087 | 0.017725 |
Greece | 11003615 | 0.022731 |
Spain | 46727890 | 0.062909 |
France | 65600350 | 0.077127 |
Croatia | 4262140 | 0.001663 |
Italy | 59685227 | 0.052722 |
Cyprus | 865878 | 0.004016 |
Latvia | 2023825 | 0.003502 |
Lithuania | 2971905 | 0.008913 |
Luxembourg | 537039 | 0.011235 |
Hungary | 9908798 | 0.013854 |
Malta | 421364 | 0.002621 |
Netherlands | 16779575 | 0.041267 |
Austria | 8451860 | 0.037958 |
Poland | 38062535 | 0.080412 |
Portugal | 10487289 | 0.005535 |
Romania | 20020074 | 0.070813 |
Slovenia | 2058821 | 0.002784 |
Slovakia | 5410836 | 0.002445 |
Finland | 5426674 | 0.009181 |
Sweden | 9555893 | 0.020552 |
United Kingdom | 63905297 | 0.139092 |
%Migrants = Pop2013 + Country-specific parameter.
I don't see how any regression works here. Normally, the terms in a regression are variables that actually exist in your data, and slopes are estimated for the effect of those variables. But you can't have a regression with a "Country-specific parameter", because it doesn't exist in your data set. With your data, the best you can get is %Migrants = intercept + slope*Pop2013, and nothing in that regression is country-specific.
Can you provide more details about what this "Country-specific parameter" is?
Maybe regression is not the proper method. I want an equation in which the proportion of migrants changes when the population changes. For instance, if the population of the UK increased by 2% and stayed unchanged in the other countries, then the proportion of migrants choosing the UK would be higher.
Hint:
Provide a small example input data set, the rules or values that need to change, and a result data set showing what applying those rules to the input should produce.
Your rules might include how the "population changes", supplied as a data set.
Note that a proportion or percentage normally requires you to define both a numerator and a denominator.
Are you looking to take a given number of "migrants" and distribute them across the countries in proportion to each country's share of the total population?
If so, Proc freq can generate each country's percent of the total population from a population data set, with something like
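proc freq data=have noprint;
   /* PERCENT in WANT is each country's share of the total weight */
   tables country / out=want outpct;
   weight pop2013;   /* weight each country row by its population */
run;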
but that's kind of guessing as I think your problem is not well defined to us yet.
Since you want to see how changes in population relate to changes in immigration, you will need to consider many years of data, for many countries.
Now, it should be noted that immigration causes the country population to change. So you might want to separate the two.
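Maybe you could try Poisson regression. Convert the proportions to counts out of a common total (here 10,000), so the data look like this:
Country Pop2013 Migrants Total
Belgium 11137974 454 10000
Bulgaria 7284552 23 10000
Then use TOTAL as the offset variable (on the log scale, i.e. log(TOTAL), since the Poisson model uses a log link) and run PROC GENMOD for Poisson regression.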
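The Poisson suggestion above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a recommended final model: the counts are the %Migrants values from the original table multiplied by 10,000 and rounded (the first two rows match the reply), only five countries are included for brevity, and using log(Pop2013) as the predictor is an assumption made here. The data set and variable names (migr, logpop, logtotal, pred, phat) are made up for the example.

```sas
/* Counts per 10,000 migrants, rounded from the %Migrants column in the thread */
data migr;
   infile datalines dsd;               /* comma-delimited, allows "Czech Republic" */
   input country :$20. pop2013 migrants;
   total    = 10000;                   /* common denominator, as in the reply */
   logpop   = log(pop2013);            /* assumption: log-population as predictor */
   logtotal = log(total);              /* the offset must be on the log scale */
   datalines;
Belgium,11137974,454
Bulgaria,7284552,23
Czech Republic,10516125,98
Denmark,5602628,174
Germany,80523746,2351
;
run;

proc genmod data=migr;
   model migrants = logpop / dist=poisson link=log offset=logtotal;
   output out=pred pred=phat;          /* phat = predicted count per 10,000 */
run;
```

Because TOTAL is the same for every row here, the offset only shifts the intercept; it would matter more if the totals differed between observations.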
But if I do this and then apply the parameters to another set of population values, the sum of the predicted values for the 28 countries could be larger than 100%, no?
Yeah, you are right. But you can rescale the predicted values so that they sum to 1, by dividing each one by their total:
a/(a+b+...+z), b/(a+b+...+z), ..., z/(a+b+...+z)
If you need the raw predicted values to sum to exactly 1 without rescaling, I don't think any predictive model can guarantee that, because of the error term.
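As a minimal sketch of that rescaling step, the PROC SQL query below divides each predicted value by the sum of all predicted values. The data set pred, the variable phat, and the three values in it are hypothetical placeholders; in practice they would come from whatever model produced the predictions.

```sas
/* Hypothetical predicted values that do not sum to 1 (0.30 + 0.50 + 0.40 = 1.20) */
data pred;
   input country :$20. phat;
   datalines;
A 0.30
B 0.50
C 0.40
;
run;

proc sql;
   create table shares as
   select country,
          phat,
          phat / sum(phat) as share    /* a/(a+b+...+z) */
   from pred;
quit;
```

PROC SQL remerges the overall sum back onto every row (it writes a note to the log saying so), so the share column in the shares data set sums to 1 by construction.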
Hi,
I found a procedure that could meet your requirement.
Have a look at PROC BCHOICE, and in particular the last example in its documentation. I think that is what you are looking for.