BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
zihdonv19
Obsidian | Level 7

The "visit_date" is sorted by "id". The conditions are: To keep all the rows that "weight" is not missing. And, If for the same id, all the "weight" are missing, then only keep the first row of this id.

What I have:

data have;
   input id 	weight 	visit_date :yymmdd10.;
   format visit_date yymmdd10.;
datalines;
1 	55 	2000-01-01
1 	76 	2005-01-02
1 	80 	2006-01-03
1 	62 	2007-01-04
1 	75 	2010-01-05
2 	. 	2000-01-06
2 	. 	2000-01-07
2 	. 	2000-01-08
3 	57 	2001-01-09
3 	. 	2006-01-10
3 	69 	2007-01-11
3 	. 	2008-01-12
;

What I want:

id weight visit_date
1 55 2000-01-01
1 76 2005-01-02
1 80 2006-01-03
1 62 2007-01-04
1 75 2010-01-05
2 . 2000-01-06
3 57 2001-01-09
3 69 2007-01-11
1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

@zihdonv19 wrote:
thank you for pointing out. Will use code next time!

The conditions are: To keep all the rows that "weight" is not missing. And, If for the same id, all the "weight" are missing, then only keep the first row of this id.

One way: The Proc SQL step selects only one record from those ID that have all weight values missing.

Then merge that with the Have data. The data step option following the Have removes any records with missing weight. So combined with the Miss set you get the two pieces.

proc sql;
   create table miss as
   select distinct id, weight, min(visit_date) as visit_date format=yymmdd10.
   from have
   group by id
   having sum(weight)= .
   ;
quit;
/* if the HAVE set is not actually sorted by ID visit_date
   should do so prior to this step
*/
data want;
   merge have (where=(not missing(weight)))
         miss
   ;
   by id;
run;

View solution in original post

9 REPLIES 9
ballardw
Super User

Please actually state the "condition(s)" mentioned in your subject lines. A picture is not a rule that is programmable.

 

I can make several methods that would work for exactly the data shown but would likely not work for any added records, other values of Id or other dates.

 

Please provide example data in the form of data step code so we don't have to guess as to variable types or actual values. Then paste it into a text box. Then we can copy the code and create data to test code with.

With this set it isn't hard.

data have;
   input id 	weight 	visit_date :yymmdd10.;
   format visit_date yymmdd10.;
datalines;
1 	55 	2000-01-01
1 	76 	2005-01-02
1 	80 	2006-01-03
1 	62 	2007-01-04
1 	75 	2010-01-05
2 	. 	2000-01-06
2 	. 	2000-01-07
2 	. 	2000-01-08
3 	57 	2001-01-09
3 	. 	2006-01-10
3 	69 	2007-01-11
3 	. 	2008-01-12
;
zihdonv19
Obsidian | Level 7
thank you for pointing out. Will use code next time!

The conditions are: To keep all the rows that "weight" is not missing. And, If for the same id, all the "weight" are missing, then only keep the first row of this id.
ballardw
Super User

@zihdonv19 wrote:
thank you for pointing out. Will use code next time!

The conditions are: To keep all the rows that "weight" is not missing. And, If for the same id, all the "weight" are missing, then only keep the first row of this id.

One way: The Proc SQL step selects only one record from those ID that have all weight values missing.

Then merge that with the Have data. The data step option following the Have removes any records with missing weight. So combined with the Miss set you get the two pieces.

proc sql;
   create table miss as
   select distinct id, weight, min(visit_date) as visit_date format=yymmdd10.
   from have
   group by id
   having sum(weight)= .
   ;
quit;
/* if the HAVE set is not actually sorted by ID visit_date
   should do so prior to this step
*/
data want;
   merge have (where=(not missing(weight)))
         miss
   ;
   by id;
run;
SaSoDude
Obsidian | Level 7

Looking at your solution, it looks more logically than mine.
But I would like to share my solution anyway:

 

DATA want;
	SET HAVE;
	BY id;
	IF FIRST.id = 1 AND weight          EQ "." THEN OUTPUT;
	IF weight 			NOT EQ "." THEN OUTPUT;
RUN;
ballardw
Super User

@SaSoDude wrote:

Looking at your solution, it looks more logically than mine.
But I would like to share my solution anyway:

 

DATA want;
	SET HAVE;
	BY id;
	IF FIRST.id = 1 AND weight          EQ "." THEN OUTPUT;
	IF weight 			NOT EQ "." THEN OUTPUT;
RUN;

Actually that is very close to my first thought when asking for clarification. The bit that is mainly different is for an example data, that may not actually occur, where the first date is missing a weight but one or more subsequent aren't.

Incomplete example data might not provide that case such as

data have;
   input id 	weight 	visit_date :yymmdd10.;
   format visit_date yymmdd10.;
datalines;
1 	55 	2000-01-01
1 	76 	2005-01-02
1 	80 	2006-01-03
1 	62 	2007-01-04
1 	75 	2010-01-05
2 	. 	2000-01-06
2 	. 	2000-01-07
2 	. 	2000-01-08
3 	57 	2001-01-09
3 	. 	2006-01-10
3 	69 	2007-01-11
3 	. 	2008-01-12
4  .   2001-01-01
4  77   2001-01-02
4  .   2001-01-03
;

 

You may want to learn the function MISSING instead of using = . or ='' or = '.' which will generate conversion to numeric warnings in this case. MISSING is one of the functions that will work with both numeric and character values so it avoids assumptions of variable data type.

SaSoDude
Obsidian | Level 7
Thank you for your hint!
SaSoDude
Obsidian | Level 7
What is your condition that caused you to cross out the dates and want to delete them?
zihdonv19
Obsidian | Level 7

Thank you @ballardw and @SaSoDude for your solutions and discussion! I learned a lot!

 

Here I also want to share my solution that I came up with just now, which is kinda redundant, but seems also work:

data have;
   input id 	weight 	visit_date :yymmdd10.;
   format visit_date yymmdd10.;
datalines;
1 	55 	2000-01-01
1 	76 	2005-01-02
1 	80 	2006-01-03
1 	62 	2007-01-04
1 	75 	2010-01-05
2 	. 	2000-01-06
2 	. 	2000-01-07
2 	. 	2000-01-08
3 	57 	2001-01-09
3 	. 	2006-01-10
3 	69 	2007-01-11
3 	. 	2008-01-12
;

data temp;
set have;
weight1=weight;
if weight1=. then weight1=9999;
keep_=0;
if weight1^=9999 then keep_=1;
run;

proc sql;
create table temp1 as
  select *,
            range(weight1)=0 as constant 
  from temp
  group by id;
quit;


data temp2;
set temp1;
if constant=1 and weight1=9999 then keep_=2;
run;

data temp3;
set temp2;
if keep_=1 or keep_=2;
run;

data want;
set temp3;
by id;
if (first.id and keep_=2) or keep_=1 then output;
drop keep_ constant;
run;
Ksharp
Super User
data have;
   input id 	weight 	visit_date :yymmdd10.;
   format visit_date yymmdd10.;
datalines;
1 	55 	2000-01-01
1 	76 	2005-01-02
1 	80 	2006-01-03
1 	62 	2007-01-04
1 	75 	2010-01-05
2 	. 	2000-01-06
2 	. 	2000-01-07
2 	. 	2000-01-08
3 	57 	2001-01-09
3 	. 	2006-01-10
3 	69 	2007-01-11
3 	. 	2008-01-12
4  .   2001-01-01
4  77   2001-01-02
4  .   2001-01-03
;

proc sql;
create table want as
select * from have where weight is not missing
union
select distinct * from have group by id having n(weight)=0 and visit_date=min(visit_date)
order by 1,3
;
quit;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1338 views
  • 3 likes
  • 4 in conversation