Solved: Re: Delete rows with condition

zihdonv19 · Posted 08-15-2023 12:18 PM

The "visit_date" is sorted by "id". The conditions are: To keep all the rows that "weight" is not missing. And, If for the same id, all the "weight" are missing, then only keep the first row of this id.

What I have:

data have;
   input id 	weight 	visit_date :yymmdd10.;
   format visit_date yymmdd10.;
datalines;
1 	55 	2000-01-01
1 	76 	2005-01-02
1 	80 	2006-01-03
1 	62 	2007-01-04
1 	75 	2010-01-05
2 	. 	2000-01-06
2 	. 	2000-01-07
2 	. 	2000-01-08
3 	57 	2001-01-09
3 	. 	2006-01-10
3 	69 	2007-01-11
3 	. 	2008-01-12
;

What I want:

id	weight	visit_date
1	55	2000-01-01
1	76	2005-01-02
1	80	2006-01-03
1	62	2007-01-04
1	75	2010-01-05
2	.	2000-01-06
3	57	2001-01-09
3	69	2007-01-11

ballardw · Posted 08-15-2023 05:17 PM

@zihdonv19 wrote:
thank you for pointing out. Will use code next time!

The conditions are: To keep all the rows that "weight" is not missing. And, If for the same id, all the "weight" are missing, then only keep the first row of this id.

One way: The Proc SQL step selects only one record from those ID that have all weight values missing.

Then merge that with the Have data. The data step option following the Have removes any records with missing weight. So combined with the Miss set you get the two pieces.

proc sql;
   create table miss as
   select distinct id, weight, min(visit_date) as visit_date format=yymmdd10.
   from have
   group by id
   having sum(weight)= .
   ;
quit;
/* if the HAVE set is not actually sorted by ID visit_date
   should do so prior to this step
*/
data want;
   merge have (where=(not missing(weight)))
         miss
   ;
   by id;
run;

View solution in original post

ballardw · Posted 08-15-2023 12:44 PM

Please actually state the "condition(s)" mentioned in your subject lines. A picture is not a rule that is programmable.

I can make several methods that would work for exactly the data shown but would likely not work for any added records, other values of Id or other dates.

Please provide example data in the form of data step code so we don't have to guess as to variable types or actual values. Then paste it into a text box. Then we can copy the code and create data to test code with.

With this set it isn't hard.

data have;
   input id 	weight 	visit_date :yymmdd10.;
   format visit_date yymmdd10.;
datalines;
1 	55 	2000-01-01
1 	76 	2005-01-02
1 	80 	2006-01-03
1 	62 	2007-01-04
1 	75 	2010-01-05
2 	. 	2000-01-06
2 	. 	2000-01-07
2 	. 	2000-01-08
3 	57 	2001-01-09
3 	. 	2006-01-10
3 	69 	2007-01-11
3 	. 	2008-01-12
;

zihdonv19 · Posted 08-15-2023 12:54 PM

thank you for pointing out. Will use code next time!

The conditions are: To keep all the rows that "weight" is not missing. And, If for the same id, all the "weight" are missing, then only keep the first row of this id.

ballardw · Posted 08-15-2023 05:17 PM

@zihdonv19 wrote:
thank you for pointing out. Will use code next time!

The conditions are: To keep all the rows that "weight" is not missing. And, If for the same id, all the "weight" are missing, then only keep the first row of this id.

One way: The Proc SQL step selects only one record from those ID that have all weight values missing.

Then merge that with the Have data. The data step option following the Have removes any records with missing weight. So combined with the Miss set you get the two pieces.

proc sql;
   create table miss as
   select distinct id, weight, min(visit_date) as visit_date format=yymmdd10.
   from have
   group by id
   having sum(weight)= .
   ;
quit;
/* if the HAVE set is not actually sorted by ID visit_date
   should do so prior to this step
*/
data want;
   merge have (where=(not missing(weight)))
         miss
   ;
   by id;
run;

SaSoDude · Posted 08-15-2023 05:44 PM

Looking at your solution, it looks more logically than mine.
But I would like to share my solution anyway:

DATA want;
	SET HAVE;
	BY id;
	IF FIRST.id = 1 AND weight          EQ "." THEN OUTPUT;
	IF weight 			NOT EQ "." THEN OUTPUT;
RUN;

ballardw · Posted 08-15-2023 07:00 PM

@SaSoDude wrote:

Looking at your solution, it looks more logically than mine.
But I would like to share my solution anyway:
DATA want;
	SET HAVE;
	BY id;
	IF FIRST.id = 1 AND weight          EQ "." THEN OUTPUT;
	IF weight 			NOT EQ "." THEN OUTPUT;
RUN;

Actually that is very close to my first thought when asking for clarification. The bit that is mainly different is for an example data, that may not actually occur, where the first date is missing a weight but one or more subsequent aren't.

Incomplete example data might not provide that case such as

data have;
   input id 	weight 	visit_date :yymmdd10.;
   format visit_date yymmdd10.;
datalines;
1 	55 	2000-01-01
1 	76 	2005-01-02
1 	80 	2006-01-03
1 	62 	2007-01-04
1 	75 	2010-01-05
2 	. 	2000-01-06
2 	. 	2000-01-07
2 	. 	2000-01-08
3 	57 	2001-01-09
3 	. 	2006-01-10
3 	69 	2007-01-11
3 	. 	2008-01-12
4  .   2001-01-01
4  77   2001-01-02
4  .   2001-01-03
;

You may want to learn the function MISSING instead of using = . or ='' or = '.' which will generate conversion to numeric warnings in this case. MISSING is one of the functions that will work with both numeric and character values so it avoids assumptions of variable data type.

SaSoDude · Posted 08-16-2023 03:34 AM

Thank you for your hint!

SaSoDude · Posted 08-15-2023 12:46 PM

What is your condition that caused you to cross out the dates and want to delete them?

zihdonv19 · Posted 08-16-2023 06:25 AM

Thank you @ballardw and @SaSoDude for your solutions and discussion! I learned a lot!

Here I also want to share my solution that I came up with just now, which is kinda redundant, but seems also work:

data have;
   input id 	weight 	visit_date :yymmdd10.;
   format visit_date yymmdd10.;
datalines;
1 	55 	2000-01-01
1 	76 	2005-01-02
1 	80 	2006-01-03
1 	62 	2007-01-04
1 	75 	2010-01-05
2 	. 	2000-01-06
2 	. 	2000-01-07
2 	. 	2000-01-08
3 	57 	2001-01-09
3 	. 	2006-01-10
3 	69 	2007-01-11
3 	. 	2008-01-12
;

data temp;
set have;
weight1=weight;
if weight1=. then weight1=9999;
keep_=0;
if weight1^=9999 then keep_=1;
run;

proc sql;
create table temp1 as
  select *,
            range(weight1)=0 as constant 
  from temp
  group by id;
quit;


data temp2;
set temp1;
if constant=1 and weight1=9999 then keep_=2;
run;

data temp3;
set temp2;
if keep_=1 or keep_=2;
run;

data want;
set temp3;
by id;
if (first.id and keep_=2) or keep_=1 then output;
drop keep_ constant;
run;

Ksharp · Posted 08-16-2023 08:03 AM

data have;
   input id 	weight 	visit_date :yymmdd10.;
   format visit_date yymmdd10.;
datalines;
1 	55 	2000-01-01
1 	76 	2005-01-02
1 	80 	2006-01-03
1 	62 	2007-01-04
1 	75 	2010-01-05
2 	. 	2000-01-06
2 	. 	2000-01-07
2 	. 	2000-01-08
3 	57 	2001-01-09
3 	. 	2006-01-10
3 	69 	2007-01-11
3 	. 	2008-01-12
4  .   2001-01-01
4  77   2001-01-02
4  .   2001-01-03
;

proc sql;
create table want as
select * from have where weight is not missing
union
select distinct * from have group by id having n(weight)=0 and visit_date=min(visit_date)
order by 1,3
;
quit;

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!