I have a dataset with multiple subjects . One usubjid can have two records with different subjid. i want to filter the dataset with following rules
usubjid
) automatically.usubjid
) with exactly two records where one record has a non-missing actarm
and the other is missing, keep only the record with the non-missing actarm
.actarm
, keep both.actarm
, keep both.
You want a code suggestion. So help us help you. Please provide sample data in the form of a DATA step, and show the corresponding desired results.
BTW, what have you tried so far?
Oh yes, and can your dataset have a USUBJID with MORE THAN 2 observations?
One way is processing data in BY/GROUP and RETAIN statement where first.variable reads the first record in the group (USUBJID here) and the last.variable reads the second/last record, retain statement carries over the first record to the next record to enable comparison.
data have;
infile datalines truncover;
input usubjid $ actarm $;
datalines;
us222 A
us222
us111 B
us333 C
us333 C
us444
us444
;
run;
proc sort data=have;
by usubjid descending actarm;
run;
data want;
set have;
by usubjid descending actarm;
retain _actarm;
if first.usubjid then _actarm=actarm;
if last.usubjid and _actarm^=actarm then delete;
drop _:;
run;
You really have to seriously expand on exactly what this means:
One usubjid can have two records with different subjid.
As in provide examples, how if there are different values of "subjid" how we can tell from the data that they are supposed to be treated as one. You do not mention any rules for "subjid". How does that variable interact with Actarm???
@abhisas1 wrote:
I have a dataset with multiple subjects . One usubjid can have two records with different subjid. i want to filter the dataset with following rules
- Keep all single records (only one record for that
usubjid
) automatically.- For groups (within
usubjid
) with exactly two records where one record has a non-missingactarm
and the other is missing, keep only the record with the non-missingactarm
.- If both records have non-missing
actarm
, keep both.- If both records have missing
actarm
, keep both.
What if you have more than two records for one usubjectid, what you gotta do ?
data have;
infile datalines truncover;
input usubjid $ actarm $;
datalines;
us222 A
us222
us111 B
us333 C
us333 C
us444
us444
;
run;
proc sql;
create table want as
select *
from have
group by usubjid
having actarm=max(actarm);
quit;
In addition to the questions others have asked it is probably worth considering how you got into this situation to begin with.
Why is it that some of the observations have different values of ACTARM?
Could you go back a step and fix the process that made this dataset so that ACTARM is populated correctly to begin with?
I liked @A_Kh's suggestion to sort by USUBJID descending ACTARM, which is intended to put missing ACTARM obs last within a USUBJID. But without a minor edit, it would not accommodate two distinct valid values for ACTARM (if a USUBJID had a valid "B" and a valid "C", the "B" would be deleted).
Consider this alternative, applied to data HAVE sorted by USUBJID (but not necessarily by ACTARM within USUBJID):
data want;
set have (where=(missing(actarm)=0) in=valid)
have (where=(missing(actarm)) in=miss);
by usubjid;
if lag(valid)=1 and first.usubjid=0 and miss=1 then delete;
run;
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.