SmcGarrett
Obsidian | Level 7

Hi All, 

 

I'm not looking for a lesson in stats: I have a background in econometrics and statistics and am perfectly capable of researching methods myself. I would, however, like to ask if anyone can point me in a good direction, or even just give me the name of the class of models I am describing.

 

I have data sets that typically range from 1,000 to 8,000 observations. There are typically 50-200 individuals (in my case they are marketing channels), and each has multiple observations. Because the effect of each observation is typically pooled with the effects of several other individuals, the effect of any one individual (station) is not isolated. Instead, I have buckets (if you will) of effects that I know came from, say, Individuals A, B, G, E, and F.

 

For example, I have 8,000 unique effects (observations) caused by 100 individuals. On average, 5 random individuals have an effect at one time, so I have 8000/5 = 1600 groups.

 

Group 1: Individual A, Individual B, Individual C, Individual D, Individual E

Effect of Group 1: $5000

 

Group 2: Individual A, Individual C, Individual G, Individual H, Individual L

Effect of Group 2: $4,000

 

Group 3: Ind. Z, E, Y, D, Q

Effect of Group 3: $8,000

 

So above I have 3 groups of 5 individuals whose aggregated effects are observed. My goal is to use the entire dataset to determine how much of the effect of each group was due to the individuals of the group. 

 

For Group 1, the end goal would be something like: 

Ind. A = 40% = $2000

Ind. B = 10% = $500

Ind. C = 15% = $750

Ind. D = 20% = $1000

Ind. E = 15% = $750

 

I am assuming this is going to be a probabilistic model and possibly a panel data model.

 

Any suggestions? 

 

 

3 REPLIES
gergely_batho
SAS Employee

I hope I understand the problem correctly.
You have individuals who occasionally go to a shop in groups and buy something.
You want to estimate the average buying amount of an individual.
The problem is that you are unable to observe individuals; you always observe the sum for the group.

Another problem with a similar structure arises when we sell different products in a package to customers. We are able to observe only the total amount the customers pay for a package.

Linear regression with no intercept can do it:

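data have;
   /* a-d: 1 if the individual was in the group, 0 otherwise; sum: observed group total */
   input a b c d sum;
   datalines;
1 1 0 0 100
1 0 1 1 400
1 1 0 1 900
1 1 0 1 370
1 0 1 1 500
0 1 0 1 250
0 1 1 1 600
;
run;

/* no intercept: each coefficient estimates one individual's average effect */
proc glm data=have;
   model sum = a b c d / noint solution;
run;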

/*
results:
a=95.5
b=100.8
c=60.5
d=342
*/

Edit: depending on what types of "effects" you want to see, maybe you don't need the noint option. Or maybe you want to take the logarithm of the dependent variable (sum), then essentially you will analyze multiplicative effects.
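
To get from these coefficients to the percentage split asked for in the question, one option is to divide each observed total among the individuals present in proportion to their estimated coefficients. The following is only a minimal sketch, assuming the `have` data set above and that all coefficients come out non-negative. The data set names `est` and `shares` and the variables `ba`-`bd`, `fitted`, `pct_*` and `amt_*` are made up for illustration, and PROC REG is swapped in here only because its OUTEST= data set is convenient to read back:

```sas
/* Refit without an intercept and save the coefficients to a data set */
proc reg data=have outest=est noprint;
   model sum = a b c d / noint;
run;
quit;

data shares;
   /* read the single row of coefficients once; variables read by SET are retained */
   if _n_ = 1 then set est(keep=a b c d rename=(a=ba b=bb c=bc d=bd));
   set have;
   array x[4]    a b c d;                 /* 0/1 membership flags       */
   array beta[4] ba bb bc bd;             /* estimated individual effects */
   array pct[4]  pct_a pct_b pct_c pct_d; /* share of the group total   */
   array amt[4]  amt_a amt_b amt_c amt_d; /* allocated amount           */
   fitted = 0;
   do i = 1 to 4;
      fitted = fitted + x[i]*beta[i];     /* model-implied group total  */
   end;
   do i = 1 to 4;
      if fitted > 0 then do;
         pct[i] = x[i]*beta[i] / fitted;
         amt[i] = pct[i] * sum;           /* shares of the observed total */
      end;
   end;
   drop i;
run;
```

Within each row the `pct_*` values of the individuals present add up to 1, so the `amt_*` values add up to the observed total. If any coefficient is negative, the shares stop being meaningful.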

SmcGarrett
Obsidian | Level 7

Hi @gergely_batho,

 

Thanks for your response. 

 

I thought that linear regression would work as well, but it doesn't seem to be doing the trick. The problem is that the effect of each individual within a group has a variance of its own. This sometimes gives a particular individual a "negative" effect (estimated coefficient), which does not make sense. In my case, the effect has to be 0 or greater. That's why I'm searching for something that will tell me the percentage of the effect an individual should get, based on all the combinations of effects from all observable groups.

 

Michael

 

 

 

 

 

 

gergely_batho
SAS Employee

Yes. To be honest, I had to play around with my fabricated data to get all the estimates positive.

One way to work around this is to constrain the coefficients to be >= 0.

PROC GLM has no statement for that. PROC REG does have a RESTRICT statement, but as far as I know it only accepts equality restrictions, so a bound like >= 0 needs a procedure with a BOUNDS statement, PROC NLIN for example.
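
A minimal sketch of a fit with non-negative coefficients, assuming the same `have` data set (columns a, b, c, d, sum). It swaps in PROC NLIN with a BOUNDS statement, which is not something used elsewhere in this thread. The parameter names `ba`-`bd` and the starting values of 100 are arbitrary choices for illustration:

```sas
/* Same additive, no-intercept model, written out explicitly so that
   each coefficient can be bounded below by zero */
proc nlin data=have;
   parms ba=100 bb=100 bc=100 bd=100;          /* starting values (arbitrary) */
   bounds ba >= 0, bb >= 0, bc >= 0, bd >= 0;  /* effects cannot be negative  */
   model sum = ba*a + bb*b + bc*c + bd*d;
run;
```

A coefficient that would have been negative in the unconstrained fit will typically end up sitting on the bound at 0, which is worth checking before interpreting the other estimates.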

 

Though this does not solve the problem you describe.

If you have individual variances, it sounds more like a random effects model.

Still, depending on the data, you can get negative parameter estimates. And even if all parameter estimates are positive, they have an associated variance, so their distribution includes negative values.

I would investigate if the model is rather multiplicative (when effects are restricted to be positive it is always suspicious that the model is multiplicative), then you can simply first take the logarithm of the target variable and then model. In that case maybe variances of individuals become similar, and then you can assume one common variance for the sum.
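
A minimal sketch of the log-transform idea, assuming the same `have` data set. The names `have_log` and `log_sum` are made up for illustration, and whether to keep the intercept is a modelling choice, not something settled here:

```sas
data have_log;
   set have;
   /* log is undefined for totals <= 0; those rows stay missing and are skipped */
   if sum > 0 then log_sum = log(sum);
run;

/* additive model on the log scale = multiplicative model on the original scale */
proc glm data=have_log;
   model log_sum = a b c d / solution;
run;
quit;
```

On this scale exp(coefficient) is the factor by which an individual's presence multiplies the group total, and that factor is positive whatever the sign of the coefficient.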

 

With 1600 observations (in this context 1 observation is when you observe the aggregated result) and 200 variables you will need to estimate 2x200 parameters (mean and variance for each individual). Sounds possible, but... good luck 🙂

 

Instead of my fabricated data, I simulate data in the following example. It's an additive model. The task is to get back the original means (40, 100, 15, 300).

 

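data have;
   array indiv[4] a b c d;                          /* 0/1 flags: is the individual in the group? */
   array means[4] _temporary_ (40, 100, 15, 300);   /* true mean effect per individual */
   array varia[4] _temporary_ (5, 5, 10, 20);       /* used below as standard deviations */
   do obs=1 to 1000;
      sum=0;
      do var=1 to 4;
         ind_effect=means[var]+varia[var]*rannor(0);   /* this individual's effect in this observation */
         indiv[var]=ranbin(0,1,0.5);                   /* 0.5 chance that effect is included */
         sum+(indiv[var]*ind_effect);                  /* only the aggregated total is kept */
      end;
      output;
   end;
   drop obs var ind_effect;
run;

proc mixed data=have;
   model sum=a b c d / noint solution;
   random a b c d / type=vc;    /* one variance component per individual */
run;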
