SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
abhisas1
Fluorite | Level 6

I have  a dataset with multiple subjects . One usubjid can have two records with different subjid. i want to filter the dataset with following rules 

  • Keep all single records (only one record for that usubjid) automatically.
  • For groups (within usubjid) with exactly two records where one record has a non-missing actarm and the other is missing, keep only the record with the non-missing actarm.
  • If both records have non-missing actarm, keep both.
  • If both records have missing actarm, keep both.

 

 

6 REPLIES 6
mkeintz
PROC Star

You want a code suggestion.  So help us help you.  Please provide sample data in the form of a DATA step, and show the corresponding desired results.

 

BTW, what have you tried so far?

 

Oh yes, and can your dataset have a USUBJID with MORE THAN 2 observations?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
A_Kh
Barite | Level 11

One way is processing data in BY/GROUP  and RETAIN statement where first.variable reads the first record in the group (USUBJID here) and the last.variable reads the second/last record, retain statement carries over the first record to the next record to enable comparison. 

data have; 
	infile datalines truncover; 
	input usubjid $ actarm $;
	datalines;
us222 A
us222 
us111 B
us333 C
us333 C
us444 
us444 
;
run; 


proc sort data=have; 
	by usubjid descending actarm;
run; 

data want;
	set have;
	by usubjid descending actarm;
	retain _actarm;
	if first.usubjid then _actarm=actarm;
	if last.usubjid and _actarm^=actarm then delete; 
	drop _:; 
run; 



ballardw
Super User

You really have to seriously expand on exactly what this means:

 One usubjid can have two records with different subjid. 

As in provide examples, how if there are different values of "subjid" how we can tell from the data that they are supposed to be treated as one. You do not mention any rules for "subjid". How does that variable interact with Actarm???

 


@abhisas1 wrote:

I have  a dataset with multiple subjects . One usubjid can have two records with different subjid. i want to filter the dataset with following rules 

  • Keep all single records (only one record for that usubjid) automatically.
  • For groups (within usubjid) with exactly two records where one record has a non-missing actarm and the other is missing, keep only the record with the non-missing actarm.
  • If both records have non-missing actarm, keep both.
  • If both records have missing actarm, keep both.

 

 


 

Ksharp
Super User

What if you have more than two records for one usubjectid, what you gotta do ?

data have; 
	infile datalines truncover; 
	input usubjid $ actarm $;
	datalines;
us222 A
us222 
us111 B
us333 C
us333 C
us444 
us444 
;
run; 
proc sql;
create table want as
select *
 from have 
  group by usubjid
   having actarm=max(actarm);
quit;
Tom
Super User Tom
Super User

In addition to the questions others have asked it is probably worth considering how you got into this situation to begin with.

Why is it that some of the observations have different values of ACTARM?

Could you go back a step and fix the process that made this dataset so that ACTARM is populated correctly to begin with?

mkeintz
PROC Star

I liked @A_Kh's suggestion to sort by USUBJID descending ACTARM, which is intended to put missing ACTARM obs last within a USUBJID.  But without a minor edit, it would not accommodate two distinct valid values for ACTARM (if a USUBJID had a valid "B" and a valid "C", the "B" would be deleted).

 

Consider this alternative, applied to data HAVE sorted by USUBJID (but not necessarily by ACTARM within USUBJID):

 

data want;
  set have (where=(missing(actarm)=0)  in=valid)
      have (where=(missing(actarm))   in=miss);
  by usubjid;

  if lag(valid)=1 and first.usubjid=0 and miss=1 then delete;
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-white.png

Our biggest data and AI event of the year.

Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.

Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 2481 views
  • 0 likes
  • 6 in conversation