Solved: Re: How to group and delete observations (ID) with any missing values

lousam · Posted 11-19-2017 05:34 PM

Hello,

I created a data set to help explain my question. In my example, researchers are trying to understand the impact of different types of

Diet plans on change in BMI over 3 study visits.

ID Visit Sex Age Height Weight (lbs) Diet (Type)

1 1 0 23 1.75 200 0

2 1 0 18 2.1 300 1

3 1 1 . 1.5 195 2

1 2 0 24 1.75 180 0

2 2 0 19 . 321 1

3 2 1 29 1.5 . 2

1 3 0 25 1.75 170 0

2 3 0 . 2.1 280 1

3 3 1 30 1.5 135 2

My question is:

- How can I exclude people who weigh below a certain cutoff at visit=1 (for example if I want to exclude all people who weigh <200 lbs at visit 1). The resulting data set would only have ID1 & ID2.

then

- How I can remove ID that have any missing data (missing data could be for any of the variables and for any of the study visits). In this scenario, the resulting data set would only have information for 3 study visits of ID1 (since the other ID have missing values)

My assumption is that I can use proc sql to group and delete ID with missing data but I have no idea how to do it. I appreciate any help you can give me.

PeterClemmensen · Posted 11-20-2017 09:02 AM

Perhaps something like this

proc sql;
   select name into :varlist separated by ',' from dictionary.columns where libname='WORK' and memname='HAVE' ;
   create table want as
   select * from have
   where ID not in 
   (select ID from have where Visit=1 and weight < 200
                           or cmiss(&varlist) > 0);
quit;

The DATA to DATA Step Macro
Blog: SASnrd

View solution in original post

PeterClemmensen · Posted 11-20-2017 02:15 AM

Something like this?

data have;
input ID Visit Sex Age Height Weight Diet;
datalines;
1 1 0 23 1.75 200 0
2 1 0 18 2.1 300 1
3 1 1 .  1.5 195 2
1 2 0 24 1.75 180 0
2 2 0 19 . 321 1
3 2 1 29 1.5 . 2
1 3 0 25 1.75 170 0
2 3 0 . 2.1 280 1
3 3 1 30 1.5 135 2
;

proc sql;
   create table want as
   select * from have
   where ID not in 
   (select ID from have where Visit=1 and weight < 200
                           or cmiss(Visit, Sex, Age, Height, Weight, Diet) > 0);
quit;

The DATA to DATA Step Macro
Blog: SASnrd

lousam · Posted 11-20-2017 08:50 AM

I am trying to apply the concept to a large data set with ~50 variables. Is there a way to do this [for instance using cmiss ()] without listing all of the variables in the code? The variables are all numeric. Thanks for the help

RW9 · Posted 11-20-2017 09:00 AM

Good idea to provide:

Test data in the form of a datastep.

And required output.

If you have lots of variables, you should really follow my post and convert the data to an industry standard, you will find your life much easier. For this problem the simplest way is to normalise the data, i.e. put it going down the page, then apply my logic (and if necessary transpose up again), consider:
USUBJID PARAM LBSTRESN

XYZ WEIGHT 123.4

XYZ HEIGHT 23

ABC WEIGHT ...

...

You can then apply my logic very simply to all the data and just add PARAM to the merge items.

You could do it in a transpose dataset as well, but it becomes far more complicated, you would need to merge on all the records for each of the variables, then do an array comparison to output a new array of results.

Transpose down

Do merge and calcuation

Transpose up

Far simpler.

PeterClemmensen · Posted 11-20-2017 09:02 AM

Perhaps something like this

proc sql;
   select name into :varlist separated by ',' from dictionary.columns where libname='WORK' and memname='HAVE' ;
   create table want as
   select * from have
   where ID not in 
   (select ID from have where Visit=1 and weight < 200
                           or cmiss(&varlist) > 0);
quit;

The DATA to DATA Step Macro
Blog: SASnrd

RW9 · Posted 11-20-2017 05:03 AM

Map the data into industry standard data models = SDTM/ADaM. The reason being if you mapped into ADaM format, one of the sets of variables is baseline and change from baseline. You would follow the logic of:

select baseline record (if this is visit 1 then simple, but you may have repeats, multiples on the same date etc.)

merge this record back to original data based on ID

Then baseline value appears on all records and you can simply filter with where or if.

Not typing in test data - see other posts for how to post test data, so this code is just a guess:

proc sql;
  create table WANT as
  select  A.*,
          B.WEIGHT as BASE
  from    HAVE A
  left join (select * from HAVE where VISIT=1) B
  on      A.ID=B.ID;
quit;

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!