SAS Programming

abhisas1 · Posted 10-30-2024 05:33 PM

I have a dataset with multiple subjects . One usubjid can have two records with different subjid. i want to filter the dataset with following rules

Keep all single records (only one record for that usubjid) automatically.
For groups (within usubjid) with exactly two records where one record has a non-missing actarm and the other is missing, keep only the record with the non-missing actarm.
If both records have non-missing actarm, keep both.
If both records have missing actarm, keep both.

mkeintz · Posted 10-30-2024 05:39 PM

You want a code suggestion. So help us help you. Please provide sample data in the form of a DATA step, and show the corresponding desired results.

BTW, what have you tried so far?

Oh yes, and can your dataset have a USUBJID with MORE THAN 2 observations?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

A_Kh · Posted 10-30-2024 05:56 PM

One way is processing data in BY/GROUP and RETAIN statement where first.variable reads the first record in the group (USUBJID here) and the last.variable reads the second/last record, retain statement carries over the first record to the next record to enable comparison.

data have; 
	infile datalines truncover; 
	input usubjid $ actarm $;
	datalines;
us222 A
us222 
us111 B
us333 C
us333 C
us444 
us444 
;
run; 


proc sort data=have; 
	by usubjid descending actarm;
run; 

data want;
	set have;
	by usubjid descending actarm;
	retain _actarm;
	if first.usubjid then _actarm=actarm;
	if last.usubjid and _actarm^=actarm then delete; 
	drop _:; 
run;

ballardw · Posted 10-30-2024 06:14 PM

You really have to seriously expand on exactly what this means:

 One usubjid can have two records with different subjid.

As in provide examples, how if there are different values of "subjid" how we can tell from the data that they are supposed to be treated as one. You do not mention any rules for "subjid". How does that variable interact with Actarm???

@abhisas1 wrote:

I have a dataset with multiple subjects . One usubjid can have two records with different subjid. i want to filter the dataset with following rules

Keep all single records (only one record for that usubjid) automatically.

For groups (within usubjid) with exactly two records where one record has a non-missing actarm and the other is missing, keep only the record with the non-missing actarm.

If both records have non-missing actarm, keep both.

If both records have missing actarm, keep both.

Ksharp · Posted 10-30-2024 09:35 PM

What if you have more than two records for one usubjectid, what you gotta do ?

data have; 
	infile datalines truncover; 
	input usubjid $ actarm $;
	datalines;
us222 A
us222 
us111 B
us333 C
us333 C
us444 
us444 
;
run; 
proc sql;
create table want as
select *
 from have 
  group by usubjid
   having actarm=max(actarm);
quit;

Tom · Posted 10-30-2024 10:38 PM

In addition to the questions others have asked it is probably worth considering how you got into this situation to begin with.

Why is it that some of the observations have different values of ACTARM?

Could you go back a step and fix the process that made this dataset so that ACTARM is populated correctly to begin with?

mkeintz · Posted 10-31-2024 12:29 PM

I liked @A_Kh's suggestion to sort by USUBJID descending ACTARM, which is intended to put missing ACTARM obs last within a USUBJID. But without a minor edit, it would not accommodate two distinct valid values for ACTARM (if a USUBJID had a valid "B" and a valid "C", the "B" would be deleted).

Consider this alternative, applied to data HAVE sorted by USUBJID (but not necessarily by ACTARM within USUBJID):

data want;
  set have (where=(missing(actarm)=0)  in=valid)
      have (where=(missing(actarm))   in=miss);
  by usubjid;

  if lag(valid)=1 and first.usubjid=0 and miss=1 then delete;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Programming

need help with a dataset filtering of records

Re: need help with a dataset filtering of records

Re: need help with a dataset filtering of records

Re: need help with a dataset filtering of records

Re: need help with a dataset filtering of records

Re: need help with a dataset filtering of records

Re: need help with a dataset filtering of records

Follow Us

What is...

SAS Programming

need help with a dataset filtering of records

Re: need help with a dataset filtering of records

Re: need help with a dataset filtering of records

Re: need help with a dataset filtering of records

Re: need help with a dataset filtering of records

Re: need help with a dataset filtering of records

Re: need help with a dataset filtering of records

Our biggest data and AI event of the year.

SAS Training: Just a Click Away

Follow Us

What is...