Solved: Selecting one observation in a group and subtracting that observation from all other observations

SMcelroy1287 · Posted 05-17-2017 02:15 PM

Hello! Thank you for your help in advance. I have a dataset containing an outcome variable (1=case, 0=control), a group_id that ties each case to its matched controls, and a propensity score for every case and control. I would like to take the case's propensity score in each group and use it to generate a new variable: the difference between each control's propensity score and the case's score in the same group.

Outcome  Group_id  Propensity score  New variable (control propensity score - case propensity score)
1        1         .2378             0
0        1         .2637             (.2637-.2378)
0        1         .2987             (.2987-.2378)
0        1         .2309             (.2309-.2378)
0        1         .2134             (.2134-.2378)
0        2         .0023             (.0023-.0324)
0        2         .0123             (.0123-.0324)
0        2         .0224             (.0224-.0324)
1        2         .0324             0
0        2         .0128             (.0128-.0324)

I have about 45,000 groups I need to calculate this difference for. Thank you very much for your time!

Astounding · Posted 05-17-2017 02:40 PM

There are a few ways to do this, but this one is probably the most likely to work without hiding potential error situations:

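data want;
  /* First pass over the group: find the case's propensity score */
  do until (last.group_id);
    set have;
    by group_id;
    if outcome=1 then case_propensity = propensity_score;
  end;
  /* Second pass over the same group: compute the difference and output */
  do until (last.group_id);
    set have;
    by group_id;
    new_variable = propensity_score - case_propensity;
    output;
  end;
run;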

Assuming your data set is sorted by GROUP_ID, the top loop finds the CASE observation for a GROUP_ID. Then the bottom loop reads the same observations, calculates, and outputs.
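
One way to make those potential error situations visible is to count the cases per group in the first loop and write a message to the log when a group does not have exactly one. This is only a sketch built on the same two-loop idea, not part of the code as originally posted: the n_cases counter, the WARNING text, and the sample HAVE data (copied from the table in the question) are my own assumptions, and it still assumes HAVE is sorted by GROUP_ID.

```sas
/* Sample data copied from the question (illustration only) */
data have;
  input outcome group_id propensity_score;
datalines;
1 1 .2378
0 1 .2637
0 1 .2987
0 1 .2309
0 1 .2134
0 2 .0023
0 2 .0123
0 2 .0224
1 2 .0324
0 2 .0128
;
run;

data want;
  /* n_cases is a hypothetical helper: number of case rows in the group */
  n_cases = 0;
  /* First pass: find the case's score and count the cases */
  do until (last.group_id);
    set have;
    by group_id;
    if outcome = 1 then do;
      n_cases = n_cases + 1;
      case_propensity = propensity_score;
    end;
  end;
  /* Flag groups with no case or more than one case in the log */
  if n_cases ne 1 then
    put 'WARNING: expected exactly one case in ' group_id= ', found ' n_cases=;
  /* Second pass: compute the difference only when the case is unambiguous */
  do until (last.group_id);
    set have;
    by group_id;
    if n_cases = 1 then new_variable = propensity_score - case_propensity;
    else new_variable = .;
    output;
  end;
  drop n_cases;
run;
```

With about 45,000 groups, scanning the log for these WARNING lines after the step runs is a quick way to confirm that every group really has a single case before trusting NEW_VARIABLE.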


art297 · Posted 05-17-2017 02:39 PM

proc sort data=have out=want;
  by Group_id descending Outcome;
run;

data want (drop=hold);
  set want;
  by Group_id;
  retain hold;
  if first.Group_id then hold=Propensity_score;
  new_variable=Propensity_score-hold;
run;

Art, CEO, AnalystFinder.com

SMcelroy1287 · Posted 05-17-2017 03:31 PM

Thank you for taking the time to respond!

SMcelroy1287 · Posted 05-17-2017 03:31 PM

Thank you for the response! This worked!
