Solved: Re: A regression where the dependent variable is a distribution (sum=100%)

Demographer · Posted 01-29-2018 10:12 AM

Hello,

I want to perform a regression where the dependent variable is a distribution (sum=100%). Briefly, I want to split the number of migrants among European countries according to the population size and to a country-specific parameter.

%Migrants = Pop2013 + Country-specific parameter.

I however do not know what kind of regression can resolve this equation.

Here are my data. The p values do not matter here. I just want parameters.

Country	Pop2013	%Migrants
Belgium	11137974	0.045363
Bulgaria	7284552	0.002339
Czech Republic	10516125	0.009783
Denmark	5602628	0.017353
Germany	80523746	0.235058
Estonia	1320174	0.001048
Ireland	4591087	0.017725
Greece	11003615	0.022731
Spain	46727890	0.062909
France	65600350	0.077127
Croatia	4262140	0.001663
Italy	59685227	0.052722
Cyprus	865878	0.004016
Latvia	2023825	0.003502
Lithuania	2971905	0.008913
Luxembourg	537039	0.011235
Hungary	9908798	0.013854
Malta	421364	0.002621
Netherlands	16779575	0.041267
Austria	8451860	0.037958
Poland	38062535	0.080412
Portugal	10487289	0.005535
Romania	20020074	0.070813
Slovenia	2058821	0.002784
Slovakia	5410836	0.002445
Finland	5426674	0.009181
Sweden	9555893	0.020552
United Kingdom	63905297	0.139092

PaigeMiller · Posted 01-29-2018 10:17 AM

%Migrants = Pop2013 + Country-specific parameter.

I don't see how any regression works here. Normally, the terms in the regression are variables that actually exist in your data, and slopes are estimated for the effect of the variable(s). But you can't have a regression with "Country-specific parameter" because it doesn't exist in your data set. Even with your data, the best you can get is %Migrants = intercept + slope*Pop2013, and there is no country specific anything in this regression.

Can you provide more details about what this "Country-specific parameter" is?

--
Paige Miller

Demographer · Posted 01-29-2018 11:03 AM

Maybe regression is not the proper method. I want an equation in which the proportion of migrants changes when the population changes. For instance, if the population of the UK increased by 2% and by 0% in the other countries, then the proportion of migrants choosing the UK would be higher.

ballardw · Posted 01-29-2018 01:41 PM

Hint:

Provide a small input example data set, the rules or values that need to change and the result of the application of that rule to the input data as a result data set.

Your rules might include how "population changes" as a data set.

Note that typically in a proportion or percentage you should define a denominator and a numerator.

Are you looking to take a given number of "migrants" and distribute them across the countries in proportion to each country's share of the total population?

Then Proc freq can generate a percent of total population based on a population data set with something like

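proc freq data=have noprint;
   /* one-way table by country; WANT receives COUNT and PERCENT */
   tables country / out=want outpct;
   /* weight each row by its population so PERCENT is the population share */
   weight pop2013;
run;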

but that's kind of guessing as I think your problem is not well defined to us yet.
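As a rough illustration of that proportional reading, the sketch below uses three populations taken from the original table and the 2% UK increase mentioned earlier in the thread. The names `pop_new`, `share_before`, and `share_after` are made up for this example, "UK" abbreviates United Kingdom, and the shares are computed over these three rows only, so this is a sketch of the idea rather than the poster's actual model.

```sas
/* Sketch: shares of total population before and after a 2% UK increase */
data have;
   input country :$20. pop2013;
   pop_new = pop2013;                         /* hypothetical scenario column */
   if country = 'UK' then pop_new = pop2013 * 1.02;
   datalines;
Germany 80523746
France 65600350
UK 63905297
;
run;

proc sql;
   /* sum() without GROUP BY is remerged with each row, giving the grand total */
   select country,
          pop2013 / sum(pop2013) as share_before format=percent8.2,
          pop_new / sum(pop_new) as share_after  format=percent8.2
   from have;
quit;
```

Under this rule the UK share rises and the other shares fall slightly, and each column still sums to 100% by construction.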

PGStats · Posted 01-29-2018 04:50 PM

Since you want to see how changes in population relate to changes in immigration, you will need to consider many years of data, for many countries.

Now, it should be noted that immigration causes the country population to change. So you might want to separate the two.

PG

Ksharp · Posted 01-30-2018 08:38 AM

Maybe you could try Poisson regression. Make the data look like this, with migrants expressed as a count out of TOTAL:

Country Pop2013 %Migrants total
Belgium 11137974 454 10000
Bulgaria 7284552 23 10000

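Take log(TOTAL) as the offset variable (PROC GENMOD adds the offset to the linear predictor, so with a log link it has to be on the log scale) and run PROC GENMOD for Poisson regression.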
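A minimal sketch of that suggestion might look like the following. It uses six rows from the original table, converts each share to a count per 10,000, and assumes log population as the only predictor, which is a choice made for this example and not something stated in the thread. The offset is entered as `log_total` because the Poisson model uses a log link. The data set and variable names (`have`, `pred`, `expected`, `log_pop`, `log_total`) are illustrative.

```sas
/* Sketch: Poisson regression of migrant counts with an offset */
data have;
   input country :$20. pop2013 pct;
   total     = 10000;                 /* common denominator from the post */
   migrants  = round(pct * total);    /* e.g. Belgium: 0.045363 -> 454 */
   log_total = log(total);            /* offset on the log scale */
   log_pop   = log(pop2013);          /* assumed predictor */
   datalines;
Belgium 11137974 0.045363
Bulgaria 7284552 0.002339
Denmark 5602628 0.017353
Germany 80523746 0.235058
Spain 46727890 0.062909
France 65600350 0.077127
;
run;

proc genmod data=have;
   model migrants = log_pop / dist=poisson link=log offset=log_total;
   output out=pred pred=expected;     /* EXPECTED = predicted count */
run;
```

Dividing `expected` by `total` gives a predicted share for each country; nothing in the model forces those shares to sum to 1 when the parameters are applied to new population values.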

Demographer · Posted 01-30-2018 08:42 AM

But if I do this and then apply the parameters to another set of population values, the sum of the predicted values for the 28 countries could be larger than 100%, no?

Ksharp · Posted 01-30-2018 08:53 AM

Yeah. You are right. But you can map it back to sum to 1.

a/(a+b+...+z), b/(a+b+...+z), ...

If you want the predicted values to sum strictly to 1, I don't think any predictive model can guarantee that, on account of the error term.
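The rescaling above can be sketched in a few lines. The three predicted values here are invented purely to show the arithmetic (they sum to 1.2), and the names `pred`, `expected`, and `share` are placeholders for whatever the fitted model actually outputs.

```sas
/* Sketch: rescale predicted values so that they sum to 1 */
data pred;
   input country :$20. expected;   /* hypothetical predictions */
   datalines;
A 0.50
B 0.40
C 0.30
;
run;

proc sql;
   create table shares as
   select country,
          expected,
          expected / sum(expected) as share   /* a/(a+b+...+z) */
   from pred;
quit;
```

After this step the `share` column sums to 1 regardless of what the raw predictions added up to.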

Ksharp · Posted 01-31-2018 07:17 AM

Hi,

I found a PROC which could satisfy your requirement.

Check PROC BCHOICE, in particular the last example in its documentation. I think that is what you are looking for.
