BookmarkSubscribeRSS Feed
knveraraju91
Barite | Level 11

Dear,

I have a data set that i need to remove a few selected obs. 

 

I need to remove when 'ID'  and 'VAL' are equal and 'date'  value is  dot (date value should be available for other obs) and  more than one obs present.

 

eg: From below data set I need to remove second obs(highlighted in red) only because it has same 'ID' and 'VAl' values  as in first obs and date is available in first obs and date is null not in second.Please help. Thank you

 

output needed

 

id val date 
1 sd 2018-05-15
2 pr .
3 cr 2018-05-15
3 pr 2018-05-15
3 sd 2018-05-15
4 sd 2017-05-15
4 sd 2017-04-15
5 pd .
5 pd .

 

data one;
input id val $ date $10.;
datalines;
1 sd 2018-05-15
1 sd .
2 pr .
3 cr 2018-05-15
3 pr 2018-04-15
3 sd 2018-05-15
4 sd 2017-05-15
4 sd 2017-04-15
5 pd .
5 pd . ;
3 REPLIES 3
mkeintz
PROC Star

You have date as a character variable, so you want to deselect certain cases with date=' ', not cases with date=dot.

 

If you never have more than one case per id with date=' ', then you can merge a subset of dataset ONE with another subset:

 

data want;
  merge one (where=(date=' '))
        one (where=(date^=' ')) ;
  by id;
run;

 

For id's with only a single record, that record (whether date=' ' or not) will pass through.  For those id's with both a date=' ' and a non-blank date, the non-blank value will prevail, since it will be in the latter argument of the merge statement.  I.e. whenever you merge multiple data sets (or multiple subsets in this case) the right-most value of any common variable is the one that is kept when there are matched records.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ChrisNZ
Tourmaline | Level 20

The UPDATE statement does exactly that (if you want to just keep the latest non-missing data for each VAL ID group.

data HAVE;
input ID VAL $ DATE : $10.;
cards;
1 sd 2018-05-15
2 pr .
3 cr 2018-05-15
3 pr 2018-05-15
3 sd 2018-05-15
4 sd 2017-05-15 <=removed, replaced with next record
4 sd 2017-04-15
5 pd .
5 pd .          <=removed, no value to add
run;
data WANT;
  update HAVE(obs=0) HAVE;
  by ID VAL;
run;

 

ID VAL DATE
1 sd 2018-05-15
2 pr  
3 cr 2018-05-15
3 pr 2018-05-15
3 sd 2018-05-15
4 sd 2017-04-15
5 pd  

 

 [  Edit: Sorry @Tom, it looks like you should have been just 3 minutes faster 🙂   ]

Tom
Super User Tom
Super User

A classic example of when to use the UPDATE statement.  Update is intended to apply transactions to a dataset. In particular missing values are treated as meaning no change should be made.

 

Normally you need a master and transaction dataset, but you can use an empty version of your source data as the "master".

data want ;
  update one(obs=0) one;
  by id val;
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 700 views
  • 4 likes
  • 4 in conversation