Solved: Re: Deleting a group of observations with missing values

lousam · Posted 11-20-2017 03:21 PM

I created a data set to help explain my question. In my example, researchers are trying to understand the impact of different types of Diet plans on change in BMI over 3 study visits.

ID Visit Sex Age Height Weight (lbs) Diet (Type)

1 1 0 23 1.75 200 0

2 1 0 18 2.1 300 1

3 1 1 . 1.5 195 2

1 2 0 24 1.75 180 0

2 2 0 19 . 321 1

3 2 1 29 1.5 . 2

1 3 0 25 1.75 170 0

2 3 0 . 2.1 280 1

3 3 1 30 1.5 135 2

I want to remove all observations for an ID if it has any missing values (I want a new data set that ONLY contains ID with observations for all visits). So the new data set would only include ID1 (since ID2 and ID3 have missing values). My desired output is:

ID Visit Sex Age Height Weight (lbs) Diet (Type)

1 1 0 23 1.75 200 0

1 2 0 24 1.75 180 0

1 3 0 25 1.75 170 0

I am trying to apply this concept to a large data set with ~50 variables. So l would like help to get the desired output without listing the 50 variable names in a procedure. I think I can do this by using proc sql, but I am confused on how to do this.

Shmuel · Posted 11-21-2017 08:07 PM

Run next code and look at result:

select * 
      from dictionary.columns 
      where libname='WORK' and memname='HAVE' ;

you will see that name column holds the variable name in the specific dataset.

The code:

proc sql;
   select name into :varlist separated by ',' 
      from dictionary.columns 
      where libname='WORK' and memname='HAVE' ;

creates a macro variable named varlist containing the list of variables in work.have dataset.

In the second part of the code:

select ID from have where cmiss(&varlist) > 0

cmis function - counts missing variables.

The full code removes observations if there are any missing variable (count > 0).

View solution in original post

Shmuel · Posted 11-20-2017 03:42 PM

Your query is similar to the next one - see link:

https://communities.sas.com/t5/SAS-Procedures/How-to-group-and-delete-observations-ID-with-any-missi...

PeterClemmensen · Posted 11-20-2017 03:49 PM

Drop the first part of the WHERE clause in the answer to your almost similar question here, so that the nested select clause looks like

select ID from have where cmiss(&varlist) > 0

The DATA to DATA Step Macro
Blog: SASnrd

lousam · Posted 11-20-2017 04:40 PM

Thank you, I apologize for the duplicate question. I accepted the response prior to running the code but I am having some difficulty getting SAS to run the code. I used a portion of the code you provided to restrict the data to individuals whose weight >200 at visit1, but I have not been able to delete ID with missing data.

Code I used:

proc sql;
create table want as
select * from have
where ID not in
(select ID from have where Visit=1 and Weight <200); quit;

Now I am trying to delete all ID with missing data

I am confused about the following parts of your code:

select name into :varlist separated by ',' from dictionary.columns

select ID from have where cmiss(&varlist) > 0

What is the role of "select name" and ":varlist" in the first line? I understand that you can use select to store a list of variables. Does "name" automatically list all the variables found in the libname and data set specified, and then stores it in "varlist"? Is the role of "dictionary.columns" to specify that the variable names are found in the column part of the data set?

What does "&varlist" do in the second line of the code?

I am a new SAS user, and I appreciate all of your help with this

Shmuel · Posted 11-21-2017 08:07 PM

Run next code and look at result:

select * 
      from dictionary.columns 
      where libname='WORK' and memname='HAVE' ;

you will see that name column holds the variable name in the specific dataset.

The code:

proc sql;
   select name into :varlist separated by ',' 
      from dictionary.columns 
      where libname='WORK' and memname='HAVE' ;

creates a macro variable named varlist containing the list of variables in work.have dataset.

In the second part of the code:

select ID from have where cmiss(&varlist) > 0

cmis function - counts missing variables.

The full code removes observations if there are any missing variable (count > 0).

Reeza · Posted 11-20-2017 03:51 PM

So l would like help to get the desired output without listing the 50 variable names in a procedure. I think I can do this by using proc sql, but I am confused on how to do this.

SQL and PROC SQL do not support variable lists, so a SQL approach will not be efficient here, especially in terms of your programming time. You could possibly use a macro, but the solution that seems appropriate here is a data step with the use of an array.

The solution in your previous question seems to apply so I'm confused as to why you've reposted this question today.

Ksharp · Posted 11-21-2017 08:21 AM

How about this one.

data have;
input ID Visit  Sex  Age   Height   Weight     Diet ;
cards;
1    1     0      23       1.75       200                  0
2    1     0      18        2.1        300                  1
3    1     1      .           1.5        195                  2
1    2     0      24        1.75      180                  0
2    2     0      19          .           321                 1
3    2     1      29        1.5           .                    2
1    3     0      25        1.75     170                   0
2    3     0       .          2.1        280                  1
3    3     1       30        1.5      135                   2
;
run;
data temp;
 set have;
 if cmiss(of _all_) then flag=1;
run;

proc sql;
create table want(drop=flag) as
select *
 from temp
  group by id
   having sum(flag)=.;
quit;

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!