The "visit_date" is sorted by "id". The conditions are: To keep all the rows that "weight" is not missing. And, If for the same id, all the "weight" are missing, then only keep the first row of this id.
What I have:
data have; input id weight visit_date :yymmdd10.; format visit_date yymmdd10.; datalines; 1 55 2000-01-01 1 76 2005-01-02 1 80 2006-01-03 1 62 2007-01-04 1 75 2010-01-05 2 . 2000-01-06 2 . 2000-01-07 2 . 2000-01-08 3 57 2001-01-09 3 . 2006-01-10 3 69 2007-01-11 3 . 2008-01-12 ;
What I want:
id | weight | visit_date |
1 | 55 | 2000-01-01 |
1 | 76 | 2005-01-02 |
1 | 80 | 2006-01-03 |
1 | 62 | 2007-01-04 |
1 | 75 | 2010-01-05 |
2 | . | 2000-01-06 |
3 | 57 | 2001-01-09 |
3 | 69 | 2007-01-11 |
@zihdonv19 wrote:
thank you for pointing out. Will use code next time!
The conditions are: To keep all the rows that "weight" is not missing. And, If for the same id, all the "weight" are missing, then only keep the first row of this id.
One way: The Proc SQL step selects only one record from those ID that have all weight values missing.
Then merge that with the Have data. The data step option following the Have removes any records with missing weight. So combined with the Miss set you get the two pieces.
proc sql; create table miss as select distinct id, weight, min(visit_date) as visit_date format=yymmdd10. from have group by id having sum(weight)= . ; quit; /* if the HAVE set is not actually sorted by ID visit_date should do so prior to this step */ data want; merge have (where=(not missing(weight))) miss ; by id; run;
Please actually state the "condition(s)" mentioned in your subject lines. A picture is not a rule that is programmable.
I can make several methods that would work for exactly the data shown but would likely not work for any added records, other values of Id or other dates.
Please provide example data in the form of data step code so we don't have to guess as to variable types or actual values. Then paste it into a text box. Then we can copy the code and create data to test code with.
With this set it isn't hard.
data have; input id weight visit_date :yymmdd10.; format visit_date yymmdd10.; datalines; 1 55 2000-01-01 1 76 2005-01-02 1 80 2006-01-03 1 62 2007-01-04 1 75 2010-01-05 2 . 2000-01-06 2 . 2000-01-07 2 . 2000-01-08 3 57 2001-01-09 3 . 2006-01-10 3 69 2007-01-11 3 . 2008-01-12 ;
@zihdonv19 wrote:
thank you for pointing out. Will use code next time!
The conditions are: To keep all the rows that "weight" is not missing. And, If for the same id, all the "weight" are missing, then only keep the first row of this id.
One way: The Proc SQL step selects only one record from those ID that have all weight values missing.
Then merge that with the Have data. The data step option following the Have removes any records with missing weight. So combined with the Miss set you get the two pieces.
proc sql; create table miss as select distinct id, weight, min(visit_date) as visit_date format=yymmdd10. from have group by id having sum(weight)= . ; quit; /* if the HAVE set is not actually sorted by ID visit_date should do so prior to this step */ data want; merge have (where=(not missing(weight))) miss ; by id; run;
Looking at your solution, it looks more logically than mine.
But I would like to share my solution anyway:
DATA want;
SET HAVE;
BY id;
IF FIRST.id = 1 AND weight EQ "." THEN OUTPUT;
IF weight NOT EQ "." THEN OUTPUT;
RUN;
@SaSoDude wrote:
Looking at your solution, it looks more logically than mine.
But I would like to share my solution anyway:
DATA want; SET HAVE; BY id; IF FIRST.id = 1 AND weight EQ "." THEN OUTPUT; IF weight NOT EQ "." THEN OUTPUT; RUN;
Actually that is very close to my first thought when asking for clarification. The bit that is mainly different is for an example data, that may not actually occur, where the first date is missing a weight but one or more subsequent aren't.
Incomplete example data might not provide that case such as
data have; input id weight visit_date :yymmdd10.; format visit_date yymmdd10.; datalines; 1 55 2000-01-01 1 76 2005-01-02 1 80 2006-01-03 1 62 2007-01-04 1 75 2010-01-05 2 . 2000-01-06 2 . 2000-01-07 2 . 2000-01-08 3 57 2001-01-09 3 . 2006-01-10 3 69 2007-01-11 3 . 2008-01-12 4 . 2001-01-01 4 77 2001-01-02 4 . 2001-01-03 ;
You may want to learn the function MISSING instead of using = . or ='' or = '.' which will generate conversion to numeric warnings in this case. MISSING is one of the functions that will work with both numeric and character values so it avoids assumptions of variable data type.
Thank you @ballardw and @SaSoDude for your solutions and discussion! I learned a lot!
Here I also want to share my solution that I came up with just now, which is kinda redundant, but seems also work:
data have;
input id weight visit_date :yymmdd10.;
format visit_date yymmdd10.;
datalines;
1 55 2000-01-01
1 76 2005-01-02
1 80 2006-01-03
1 62 2007-01-04
1 75 2010-01-05
2 . 2000-01-06
2 . 2000-01-07
2 . 2000-01-08
3 57 2001-01-09
3 . 2006-01-10
3 69 2007-01-11
3 . 2008-01-12
;
data temp;
set have;
weight1=weight;
if weight1=. then weight1=9999;
keep_=0;
if weight1^=9999 then keep_=1;
run;
proc sql;
create table temp1 as
select *,
range(weight1)=0 as constant
from temp
group by id;
quit;
data temp2;
set temp1;
if constant=1 and weight1=9999 then keep_=2;
run;
data temp3;
set temp2;
if keep_=1 or keep_=2;
run;
data want;
set temp3;
by id;
if (first.id and keep_=2) or keep_=1 then output;
drop keep_ constant;
run;
data have;
input id weight visit_date :yymmdd10.;
format visit_date yymmdd10.;
datalines;
1 55 2000-01-01
1 76 2005-01-02
1 80 2006-01-03
1 62 2007-01-04
1 75 2010-01-05
2 . 2000-01-06
2 . 2000-01-07
2 . 2000-01-08
3 57 2001-01-09
3 . 2006-01-10
3 69 2007-01-11
3 . 2008-01-12
4 . 2001-01-01
4 77 2001-01-02
4 . 2001-01-03
;
proc sql;
create table want as
select * from have where weight is not missing
union
select distinct * from have group by id having n(weight)=0 and visit_date=min(visit_date)
order by 1,3
;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.