Re: How to Delete Entire Row if Conditions Are Met

unwashedhelimix · Posted 10-21-2024 03:51 AM

I have a concatenated data set from 2 data sets that looks something like this:

ID     Gender   Starting_Weight    End_Weight

1        M            170                       180
2        M            182                       175
3        M            155                       162
4        M            203                       196
5        F            124                       135
6        F            152                       155
7        F            118                       133
8        F            160                       152

There are more observations, but what I want to do is remove all rows where the change between the weights is less than 10. Here is my atttempt:

data WeightsCombined;
	set WeightsMale WeightsFemale;
	Weight_change = sum(-Starting_Weight, End_Weight);
	if Weight_change<10 then delete;
run;

proc print data = WeightsCombined;
run;

For some reason, this outputs nothing. Why is an output not showing and how can I remove the rows that meet my condition?

Patrick · Posted 10-21-2024 04:09 AM

Your logic should at least delete some rows from the sample data you've provided.

If it's just about change then I'd be using the abs() function. To not mask cases with missings I wouldn't use the sum() function.

data weight_data;
  input ID Gender $ Starting_Weight End_Weight;
  del_flg=  0<=abs(Starting_Weight-End_Weight)<10;
/*  if 0<=abs(Starting_Weight-End_Weight)<10 then delete;*/
  datalines;
1 M 170 180
2 M 182 175
3 M 155 162
4 M 203 196
5 F 124 135
6 F 152 155
7 F 118 133
8 F 160 152
9 F . 152
;
run;

proc print data=weight_data;
run;

A_Kh · Posted 10-21-2024 08:10 AM

Another approach using ABS function. If we need to read only observations where change is greater than or equals 10 then where statement is more efficient in terms of system processing.

data want;
	set have;
	where abs(starting_weight - end_weight) ge 10;
run;

Tom · Posted 10-21-2024 07:59 AM

Your SET statement looks wrong to me.

set WeightsMale WeightsFemale;

Do you really have the data in two separate datasets? Do they have the same variables?

Why use the SUM() function? That will yield invalid results when only one of the readings is missing.

Let's save the intermediate calculations into variables.

data have;
  input ID Gender :$1. Starting_Weight End_Weight;
cards;
1 M 170 180
2 M 182 175
3 M 155 162
4 M 203 196
5 F 124 135
6 F 152 155
7 F 118 133
8 F 160 152
;

data want;
  set have;
  weight_gain = end_weight - starting_weight;
  lessthan10 = weight_gain < 10;
run;

Here is the result for your example:

Which observations do you want to delete? Your current logic would eliminate all but 3 of the observations, the ones where LESSTHAN10 variable has value 0.

How to Delete Entire Row if Conditions Are Met