Dear,
I have a data set that i need to remove a few selected obs.
I need to remove when 'ID' and 'VAL' are equal and 'date' value is dot (date value should be available for other obs) and more than one obs present.
eg: From below data set I need to remove second obs(highlighted in red) only because it has same 'ID' and 'VAl' values as in first obs and date is available in first obs and date is null not in second.Please help. Thank you
output needed
id val date
1 sd 2018-05-15
2 pr .
3 cr 2018-05-15
3 pr 2018-05-15
3 sd 2018-05-15
4 sd 2017-05-15
4 sd 2017-04-15
5 pd .
5 pd .
data one;
input id val $ date $10.;
datalines;
1 sd 2018-05-15
1 sd .
2 pr .
3 cr 2018-05-15
3 pr 2018-04-15
3 sd 2018-05-15
4 sd 2017-05-15
4 sd 2017-04-15
5 pd .
5 pd .
;
You have date as a character variable, so you want to deselect certain cases with date=' ', not cases with date=dot.
If you never have more than one case per id with date=' ', then you can merge a subset of dataset ONE with another subset:
data want;
merge one (where=(date=' '))
one (where=(date^=' ')) ;
by id;
run;
For id's with only a single record, that record (whether date=' ' or not) will pass through. For those id's with both a date=' ' and a non-blank date, the non-blank value will prevail, since it will be in the latter argument of the merge statement. I.e. whenever you merge multiple data sets (or multiple subsets in this case) the right-most value of any common variable is the one that is kept when there are matched records.
The UPDATE statement does exactly that (if you want to just keep the latest non-missing data for each VAL ID group.
data HAVE;
input ID VAL $ DATE : $10.;
cards;
1 sd 2018-05-15
2 pr .
3 cr 2018-05-15
3 pr 2018-05-15
3 sd 2018-05-15
4 sd 2017-05-15 <=removed, replaced with next record
4 sd 2017-04-15
5 pd .
5 pd . <=removed, no value to add
run;
data WANT;
update HAVE(obs=0) HAVE;
by ID VAL;
run;
ID | VAL | DATE |
---|---|---|
1 | sd | 2018-05-15 |
2 | pr | |
3 | cr | 2018-05-15 |
3 | pr | 2018-05-15 |
3 | sd | 2018-05-15 |
4 | sd | 2017-04-15 |
5 | pd |
[ Edit: Sorry @Tom, it looks like you should have been just 3 minutes faster 🙂 ]
A classic example of when to use the UPDATE statement. Update is intended to apply transactions to a dataset. In particular missing values are treated as meaning no change should be made.
Normally you need a master and transaction dataset, but you can use an empty version of your source data as the "master".
data want ;
update one(obs=0) one;
by id val;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.