Demographer
Pyrite | Level 9

Hello,

I want to perform a regression where the dependent variable is a distribution (sum=100%). Briefly, I want to split the number of migrants among European countries according to the population size and to a country-specific parameter.

 

%Migrants = Pop2013 + Country-specific parameter.

 

However, I do not know what kind of regression can estimate this equation.

 

Here are my data. The p values do not matter here. I just want parameters.

 

Country          Pop2013    %Migrants
Belgium          11137974   0.045363
Bulgaria         7284552    0.002339
Czech Republic   10516125   0.009783
Denmark          5602628    0.017353
Germany          80523746   0.235058
Estonia          1320174    0.001048
Ireland          4591087    0.017725
Greece           11003615   0.022731
Spain            46727890   0.062909
France           65600350   0.077127
Croatia          4262140    0.001663
Italy            59685227   0.052722
Cyprus           865878     0.004016
Latvia           2023825    0.003502
Lithuania        2971905    0.008913
Luxembourg       537039     0.011235
Hungary          9908798    0.013854
Malta            421364     0.002621
Netherlands      16779575   0.041267
Austria          8451860    0.037958
Poland           38062535   0.080412
Portugal         10487289   0.005535
Romania          20020074   0.070813
Slovenia         2058821    0.002784
Slovakia         5410836    0.002445
Finland          5426674    0.009181
Sweden           9555893    0.020552
United Kingdom   63905297   0.139092
1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

Yes, you are right. But you can rescale the predicted values so that they sum to 1:

a/(a+b+...+z)   b/(a+b+...+z)   ...

If you want the predicted values themselves to sum strictly to 1, I don't think that is possible for any predictive model, on account of the error term.


8 REPLIES 8
PaigeMiller
Diamond | Level 26

%Migrants = Pop2013 + Country-specific parameter.

 

I don't see how any regression works here. Normally, the terms in a regression are variables that actually exist in your data, and slopes are estimated for the effect of the variable(s). But you can't have a regression with a "Country-specific parameter" because it doesn't exist in your data set. Even with your data, the best you can get is %Migrants = intercept + slope*Pop2013, and there is nothing country-specific in this regression.

 

Can you provide more details about what this "Country-specific parameter" is?

--
Paige Miller
Demographer
Pyrite | Level 9

Maybe regression is not the proper method. I want an equation in which the proportion of migrants changes when the population changes. For instance, if the population of the UK increased by 2% and that of the other countries by 0%, then the proportion of migrants choosing the UK would be higher.

 

 

ballardw
Super User

Hint:

Provide a small example input data set, the rules or values that need to change, and a result data set showing what applying those rules to the input should produce.

Your rules might include how "population changes", supplied as a data set.

 

Note that typically in a proportion or percentage you should define a denominator and a numerator.

Are you trying to take a number of "migrants" and distribute them across the countries in proportion to each country's share of the total population?

If so, PROC FREQ can generate each country's percent of the total population from a population data set, with something like:

 

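proc freq data=have noprint;
   /* WANT gets one row per country; PERCENT is its share of the total population */
   tables country / out=want outpct;
   /* weight each row by population rather than counting rows */
   weight pop2013;
run;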

 

But that's something of a guess, as I don't think your problem is well defined for us yet.
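To make the proportional split concrete, here is a minimal sketch of the same arithmetic in Python rather than SAS. It uses four of the Pop2013 values from the question; the `migrants` figure of 1000 is purely hypothetical, and restricting the calculation to four countries is an assumption made to keep the example short.

```python
# Pop2013 values for a few of the countries listed in the question
pop2013 = {
    "Belgium": 11137974,
    "Germany": 80523746,
    "France": 65600350,
    "United Kingdom": 63905297,
}

total_pop = sum(pop2013.values())

# Each country's percent of the summed population
# (the analogue of PERCENT in the OUT= data set when WEIGHT is pop2013)
percent = {c: 100 * p / total_pop for c, p in pop2013.items()}

# Hypothetical number of migrants to distribute in proportion to population
migrants = 1000
allocated = {c: migrants * pct / 100 for c, pct in percent.items()}

for c in pop2013:
    print(f"{c:15s} {percent[c]:6.2f}%  {allocated[c]:8.1f}")
```

Because every share is a population divided by the same total, the percentages add up to 100 by construction, whatever population values are supplied.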

 

PGStats
Opal | Level 21

Since you want to see how changes in population relate to changes in immigration, you will need to consider many years of data, for many countries.

 

Now, it should be noted that immigration causes the country population to change. So you might want to separate the two.

PG
Ksharp
Super User

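Maybe you could try Poisson regression. Set up the data like this, with the migrant share converted to a count out of TOTAL:

Country    Pop2013    Migrants   total
Belgium    11137974   454        10000
Bulgaria   7284552    23         10000

Take TOTAL as the offset variable (on the log scale, i.e. log(TOTAL), since Poisson regression uses a log link) and run PROC GENMOD for Poisson regression.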
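For readers who want to see what such a fit estimates, here is a rough, hand-rolled sketch in Python (not SAS, and not how PROC GENMOD is implemented). Several things are assumptions made for illustration: the covariate is the centred log of Pop2013, only six countries are used, and their counts are the question's shares multiplied by 10,000 and rounded, as in the two rows above. The model is log(mu) = log(TOTAL) + b0 + b1 * x, fitted by Newton-Raphson with step halving.

```python
import math

# (country, Pop2013, migrants per 10,000): shares from the question x 10,000, rounded
rows = [
    ("Belgium", 11137974, 454),
    ("Bulgaria", 7284552, 23),
    ("Germany", 80523746, 2351),
    ("Spain", 46727890, 629),
    ("Italy", 59685227, 527),
    ("United Kingdom", 63905297, 1391),
]
TOTAL = 10000
offset = math.log(TOTAL)  # the offset enters on the log scale

y = [r[2] for r in rows]
log_pop = [math.log(r[1]) for r in rows]
x_mean = sum(log_pop) / len(log_pop)
x = [v - x_mean for v in log_pop]  # centred covariate (an assumed choice)

def fitted(b0, b1):
    """Expected counts under log(mu) = offset + b0 + b1 * x."""
    return [math.exp(offset + b0 + b1 * xi) for xi in x]

def loglik(b0, b1):
    """Poisson log-likelihood, dropping the constant log(y!) term."""
    return sum(yi * (offset + b0 + b1 * xi) - math.exp(offset + b0 + b1 * xi)
               for xi, yi in zip(x, y))

# Start from an intercept-only fit
b0, b1 = math.log(sum(y) / len(y) / TOTAL), 0.0

for _ in range(100):
    mu = fitted(b0, b1)
    # Score vector and (negative) Hessian of the log-likelihood
    g0 = sum(yi - mi for yi, mi in zip(y, mu))
    g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
    h00 = sum(mu)
    h01 = sum(mi * xi for mi, xi in zip(mu, x))
    h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
    det = h00 * h11 - h01 * h01
    # Newton direction: solve the 2x2 system H d = g
    d0 = (h11 * g0 - h01 * g1) / det
    d1 = (h00 * g1 - h01 * g0) / det
    # Halve the step until the log-likelihood does not decrease
    step, current = 1.0, loglik(b0, b1)
    while loglik(b0 + step * d0, b1 + step * d1) < current - 1e-9 and step > 1e-8:
        step /= 2
    b0 += step * d0
    b1 += step * d1
    if abs(d0) + abs(d1) < 1e-10:
        break

mu = fitted(b0, b1)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
for (country, _, count), m in zip(rows, mu):
    print(f"{country:15s} observed {count:5d}  fitted {m:8.1f}")
```

At convergence the fitted counts add up to the observed counts for the data used in the fit. That property does not carry over when the same coefficients are applied to a different set of population values.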

Demographer
Pyrite | Level 9

But if I do this and then apply the estimated parameters to another set of population values, the sum of the predicted values for the 28 countries could exceed 100%, no?

Ksharp
Super User

Yes, you are right. But you can rescale the predicted values so that they sum to 1:

a/(a+b+...+z)   b/(a+b+...+z)   ...

If you want the predicted values themselves to sum strictly to 1, I don't think that is possible for any predictive model, on account of the error term.
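The rescaling described here is just division by the total. A minimal Python sketch, where the `raw` values are hypothetical predictions that happen to sum to 1.10 and `normalize` is an illustrative helper name:

```python
def normalize(predictions):
    """Rescale non-negative predictions so that they sum to 1."""
    total = sum(predictions)
    if total <= 0:
        raise ValueError("predictions must have a positive sum")
    return [p / total for p in predictions]

# Hypothetical predicted shares for three countries; they sum to 1.10
raw = [0.30, 0.45, 0.35]
shares = normalize(raw)
print(shares, sum(shares))
```

The rescaling keeps the ratio between any two countries' predictions unchanged; only the overall level is adjusted.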

Ksharp
Super User

Hi,

I found a PROC which could satisfy your requirement.

Check PROC BCHOICE, and in particular the last example in its documentation. I think that is what you are looking for.
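PROC BCHOICE fits Bayesian discrete choice models such as the multinomial logit. The Python sketch below does not use or reproduce that procedure; it only illustrates why a choice-share form suits the question: predicted shares sum to 1 by construction. Everything about the specification is an assumption for illustration: the share of country c is taken as exp(alpha_c + BETA * log(pop_c)) divided by the sum of the same expression over all countries, BETA is fixed at 1, the country-specific alpha_c are backed out from the observed shares, and only four countries from the question are used (so shares are relative to those four).

```python
import math

# (Pop2013, observed share of migrants) for four countries from the question
data = {
    "Germany": (80523746, 0.235058),
    "France": (65600350, 0.077127),
    "Italy": (59685227, 0.052722),
    "United Kingdom": (63905297, 0.139092),
}

BETA = 1.0  # assumed population elasticity, not an estimate

# Country-specific parameters chosen so the model reproduces the observed shares
alpha = {c: math.log(s) - BETA * math.log(p) for c, (p, s) in data.items()}

def shares(pop):
    """Multinomial-logit-style shares: exp(utility) over the sum of exp(utility)."""
    score = {c: math.exp(alpha[c] + BETA * math.log(pop[c])) for c in pop}
    denom = sum(score.values())
    return {c: v / denom for c, v in score.items()}

base_pop = {c: p for c, (p, _) in data.items()}
base = shares(base_pop)

# The scenario from the thread: UK population up 2%, the others unchanged
bumped_pop = dict(base_pop)
bumped_pop["United Kingdom"] = base_pop["United Kingdom"] * 1.02
bumped = shares(bumped_pop)

for c in data:
    print(f"{c:15s} base {base[c]:.4f}  after UK +2% {bumped[c]:.4f}")
```

Under these assumptions the UK's share rises, every other country's share falls slightly, and the total stays at 1 without any rescaling afterwards.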
