- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have a dataset with multiple subjects . One usubjid can have two records with different subjid. i want to filter the dataset with following rules
- Keep all single records (only one record for that
usubjid
) automatically. - For groups (within
usubjid
) with exactly two records where one record has a non-missingactarm
and the other is missing, keep only the record with the non-missingactarm
. - If both records have non-missing
actarm
, keep both. - If both records have missing
actarm
, keep both.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You want a code suggestion. So help us help you. Please provide sample data in the form of a DATA step, and show the corresponding desired results.
BTW, what have you tried so far?
Oh yes, and can your dataset have a USUBJID with MORE THAN 2 observations?
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
One way is processing data in BY/GROUP and RETAIN statement where first.variable reads the first record in the group (USUBJID here) and the last.variable reads the second/last record, retain statement carries over the first record to the next record to enable comparison.
data have;
infile datalines truncover;
input usubjid $ actarm $;
datalines;
us222 A
us222
us111 B
us333 C
us333 C
us444
us444
;
run;
proc sort data=have;
by usubjid descending actarm;
run;
data want;
set have;
by usubjid descending actarm;
retain _actarm;
if first.usubjid then _actarm=actarm;
if last.usubjid and _actarm^=actarm then delete;
drop _:;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You really have to seriously expand on exactly what this means:
One usubjid can have two records with different subjid.
As in provide examples, how if there are different values of "subjid" how we can tell from the data that they are supposed to be treated as one. You do not mention any rules for "subjid". How does that variable interact with Actarm???
@abhisas1 wrote:
I have a dataset with multiple subjects . One usubjid can have two records with different subjid. i want to filter the dataset with following rules
- Keep all single records (only one record for that
usubjid
) automatically.- For groups (within
usubjid
) with exactly two records where one record has a non-missingactarm
and the other is missing, keep only the record with the non-missingactarm
.- If both records have non-missing
actarm
, keep both.- If both records have missing
actarm
, keep both.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
What if you have more than two records for one usubjectid, what you gotta do ?
data have;
infile datalines truncover;
input usubjid $ actarm $;
datalines;
us222 A
us222
us111 B
us333 C
us333 C
us444
us444
;
run;
proc sql;
create table want as
select *
from have
group by usubjid
having actarm=max(actarm);
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
In addition to the questions others have asked it is probably worth considering how you got into this situation to begin with.
Why is it that some of the observations have different values of ACTARM?
Could you go back a step and fix the process that made this dataset so that ACTARM is populated correctly to begin with?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I liked @A_Kh's suggestion to sort by USUBJID descending ACTARM, which is intended to put missing ACTARM obs last within a USUBJID. But without a minor edit, it would not accommodate two distinct valid values for ACTARM (if a USUBJID had a valid "B" and a valid "C", the "B" would be deleted).
Consider this alternative, applied to data HAVE sorted by USUBJID (but not necessarily by ACTARM within USUBJID):
data want;
set have (where=(missing(actarm)=0) in=valid)
have (where=(missing(actarm)) in=miss);
by usubjid;
if lag(valid)=1 and first.usubjid=0 and miss=1 then delete;
run;
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------