BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
zihdonv19
Obsidian | Level 7

The "visit_date" is sorted by "id". The conditions are: To keep all the rows that "weight" is not missing. And, If for the same id, all the "weight" are missing, then only keep the first row of this id.

What I have:

data have;
   input id 	weight 	visit_date :yymmdd10.;
   format visit_date yymmdd10.;
datalines;
1 	55 	2000-01-01
1 	76 	2005-01-02
1 	80 	2006-01-03
1 	62 	2007-01-04
1 	75 	2010-01-05
2 	. 	2000-01-06
2 	. 	2000-01-07
2 	. 	2000-01-08
3 	57 	2001-01-09
3 	. 	2006-01-10
3 	69 	2007-01-11
3 	. 	2008-01-12
;

What I want:

id weight visit_date
1 55 2000-01-01
1 76 2005-01-02
1 80 2006-01-03
1 62 2007-01-04
1 75 2010-01-05
2 . 2000-01-06
3 57 2001-01-09
3 69 2007-01-11
1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

@zihdonv19 wrote:
thank you for pointing out. Will use code next time!

The conditions are: To keep all the rows that "weight" is not missing. And, If for the same id, all the "weight" are missing, then only keep the first row of this id.

One way: The Proc SQL step selects only one record from those ID that have all weight values missing.

Then merge that with the Have data. The data step option following the Have removes any records with missing weight. So combined with the Miss set you get the two pieces.

proc sql;
   create table miss as
   select distinct id, weight, min(visit_date) as visit_date format=yymmdd10.
   from have
   group by id
   having sum(weight)= .
   ;
quit;
/* if the HAVE set is not actually sorted by ID visit_date
   should do so prior to this step
*/
data want;
   merge have (where=(not missing(weight)))
         miss
   ;
   by id;
run;

View solution in original post

9 REPLIES 9
ballardw
Super User

Please actually state the "condition(s)" mentioned in your subject lines. A picture is not a rule that is programmable.

 

I can make several methods that would work for exactly the data shown but would likely not work for any added records, other values of Id or other dates.

 

Please provide example data in the form of data step code so we don't have to guess as to variable types or actual values. Then paste it into a text box. Then we can copy the code and create data to test code with.

With this set it isn't hard.

data have;
   input id 	weight 	visit_date :yymmdd10.;
   format visit_date yymmdd10.;
datalines;
1 	55 	2000-01-01
1 	76 	2005-01-02
1 	80 	2006-01-03
1 	62 	2007-01-04
1 	75 	2010-01-05
2 	. 	2000-01-06
2 	. 	2000-01-07
2 	. 	2000-01-08
3 	57 	2001-01-09
3 	. 	2006-01-10
3 	69 	2007-01-11
3 	. 	2008-01-12
;
zihdonv19
Obsidian | Level 7
thank you for pointing out. Will use code next time!

The conditions are: To keep all the rows that "weight" is not missing. And, If for the same id, all the "weight" are missing, then only keep the first row of this id.
ballardw
Super User

@zihdonv19 wrote:
thank you for pointing out. Will use code next time!

The conditions are: To keep all the rows that "weight" is not missing. And, If for the same id, all the "weight" are missing, then only keep the first row of this id.

One way: The Proc SQL step selects only one record from those ID that have all weight values missing.

Then merge that with the Have data. The data step option following the Have removes any records with missing weight. So combined with the Miss set you get the two pieces.

proc sql;
   create table miss as
   select distinct id, weight, min(visit_date) as visit_date format=yymmdd10.
   from have
   group by id
   having sum(weight)= .
   ;
quit;
/* if the HAVE set is not actually sorted by ID visit_date
   should do so prior to this step
*/
data want;
   merge have (where=(not missing(weight)))
         miss
   ;
   by id;
run;
SaSoDude
Obsidian | Level 7

Looking at your solution, it looks more logically than mine.
But I would like to share my solution anyway:

 

DATA want;
	SET HAVE;
	BY id;
	IF FIRST.id = 1 AND weight          EQ "." THEN OUTPUT;
	IF weight 			NOT EQ "." THEN OUTPUT;
RUN;
ballardw
Super User

@SaSoDude wrote:

Looking at your solution, it looks more logically than mine.
But I would like to share my solution anyway:

 

DATA want;
	SET HAVE;
	BY id;
	IF FIRST.id = 1 AND weight          EQ "." THEN OUTPUT;
	IF weight 			NOT EQ "." THEN OUTPUT;
RUN;

Actually that is very close to my first thought when asking for clarification. The bit that is mainly different is for an example data, that may not actually occur, where the first date is missing a weight but one or more subsequent aren't.

Incomplete example data might not provide that case such as

data have;
   input id 	weight 	visit_date :yymmdd10.;
   format visit_date yymmdd10.;
datalines;
1 	55 	2000-01-01
1 	76 	2005-01-02
1 	80 	2006-01-03
1 	62 	2007-01-04
1 	75 	2010-01-05
2 	. 	2000-01-06
2 	. 	2000-01-07
2 	. 	2000-01-08
3 	57 	2001-01-09
3 	. 	2006-01-10
3 	69 	2007-01-11
3 	. 	2008-01-12
4  .   2001-01-01
4  77   2001-01-02
4  .   2001-01-03
;

 

You may want to learn the function MISSING instead of using = . or ='' or = '.' which will generate conversion to numeric warnings in this case. MISSING is one of the functions that will work with both numeric and character values so it avoids assumptions of variable data type.

SaSoDude
Obsidian | Level 7
Thank you for your hint!
SaSoDude
Obsidian | Level 7
What is your condition that caused you to cross out the dates and want to delete them?
zihdonv19
Obsidian | Level 7

Thank you @ballardw and @SaSoDude for your solutions and discussion! I learned a lot!

 

Here I also want to share my solution that I came up with just now, which is kinda redundant, but seems also work:

data have;
   input id 	weight 	visit_date :yymmdd10.;
   format visit_date yymmdd10.;
datalines;
1 	55 	2000-01-01
1 	76 	2005-01-02
1 	80 	2006-01-03
1 	62 	2007-01-04
1 	75 	2010-01-05
2 	. 	2000-01-06
2 	. 	2000-01-07
2 	. 	2000-01-08
3 	57 	2001-01-09
3 	. 	2006-01-10
3 	69 	2007-01-11
3 	. 	2008-01-12
;

data temp;
set have;
weight1=weight;
if weight1=. then weight1=9999;
keep_=0;
if weight1^=9999 then keep_=1;
run;

proc sql;
create table temp1 as
  select *,
            range(weight1)=0 as constant 
  from temp
  group by id;
quit;


data temp2;
set temp1;
if constant=1 and weight1=9999 then keep_=2;
run;

data temp3;
set temp2;
if keep_=1 or keep_=2;
run;

data want;
set temp3;
by id;
if (first.id and keep_=2) or keep_=1 then output;
drop keep_ constant;
run;
Ksharp
Super User
data have;
   input id 	weight 	visit_date :yymmdd10.;
   format visit_date yymmdd10.;
datalines;
1 	55 	2000-01-01
1 	76 	2005-01-02
1 	80 	2006-01-03
1 	62 	2007-01-04
1 	75 	2010-01-05
2 	. 	2000-01-06
2 	. 	2000-01-07
2 	. 	2000-01-08
3 	57 	2001-01-09
3 	. 	2006-01-10
3 	69 	2007-01-11
3 	. 	2008-01-12
4  .   2001-01-01
4  77   2001-01-02
4  .   2001-01-03
;

proc sql;
create table want as
select * from have where weight is not missing
union
select distinct * from have group by id having n(weight)=0 and visit_date=min(visit_date)
order by 1,3
;
quit;

SAS INNOVATE 2024

Innovate_SAS_Blue.png

Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.

If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website. 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Get the $99 certification deal.jpg

 

 

Back in the Classroom!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 375 views
  • 3 likes
  • 4 in conversation