Help using Base SAS procedures

Deleting a group of observations with missing values

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 10
Accepted Solution

Deleting a group of observations with missing values

 

I created a data set to help explain my question. In my example, researchers are trying to understand the impact of different types of  Diet plans on change in BMI over 3 study visits.  

 

ID Visit  Sex  Age   Height   Weight (lbs)     Diet (Type)

 

1    1     0      23       1.75       200                  0

2    1     0      18        2.1        300                  1

3    1     1      .           1.5        195                  2

 

1    2     0      24        1.75      180                  0

2    2     0      19          .           321                 1

3    2     1      29        1.5           .                    2

 

1    3     0      25        1.75     170                   0

2    3     0       .          2.1        280                  1

3    3     1       30        1.5      135                   2

 

I want to remove all observations for an ID if it has any missing values (I want a new data set that ONLY contains ID with observations for all visits). So the new data set would only include ID1 (since ID2 and ID3 have missing values). My desired output is: 

 

ID Visit  Sex  Age   Height   Weight (lbs)     Diet (Type)

 

1    1     0      23       1.75       200                  0

 

1    2     0      24        1.75      180                  0

 

1    3     0      25        1.75     170                   0

 

I am trying to apply this concept to a large data set with ~50 variables. So l would like help to get the desired output without listing the 50 variable names in a procedure. I think I can do this by using proc sql, but I am confused on how to do this. 

 


Accepted Solutions
Solution
‎11-22-2017 06:39 PM
Trusted Advisor
Posts: 1,837

Re: Deleting a group of observations with missing values

[ Edited ]

Run next code and look at result:

select * 
      from dictionary.columns 
      where libname='WORK' and memname='HAVE' ;

you will see that name column holds the variable name in the specific dataset.

 

The code:

 

proc sql;
   select name into :varlist separated by ',' 
      from dictionary.columns 
      where libname='WORK' and memname='HAVE' ;

creates a macro variable named varlist containing the list of variables in work.have dataset.

 

 

In the second part of the code:

select ID from have where cmiss(&varlist) > 0

cmis function - counts missing variables. 

 

The full code removes observations if there are any missing variable (count > 0).

 

View solution in original post


All Replies
Trusted Advisor
Posts: 1,837

Re: Deleting a group of observations with missing values

PROC Star
Posts: 1,283

Re: Deleting a group of observations with missing values

Drop the first part of the WHERE clause in the answer to your almost similar question here, so that the nested select clause looks like

 

select ID from have where cmiss(&varlist) > 0

 

 

Occasional Contributor
Posts: 10

Re: Deleting a group of observations with missing values

[ Edited ]

Thank you, I apologize for the duplicate question. I accepted the response prior to running the code but I am having some difficulty getting SAS to run the code. I used a portion of the code you provided to restrict the data to individuals whose weight >200 at visit1, but I have not been able to delete ID with missing data. 

 

Code I used:

proc sql;
create table want as
select * from have
where ID not in
(select ID from have where Visit=1 and  Weight <200); quit;

 

Now I am trying to delete all ID with missing data

 

I am confused about the following parts of your code:

select name into :varlist separated by ',' from dictionary.columns 
select ID from have where cmiss(&varlist) > 0

 

What is the role of "select name" and ":varlist" in the first line? I understand that you can use select to store a list of variables. Does "name" automatically list all the variables found in the libname and data set specified, and then stores it in "varlist"?  Is the role of "dictionary.columns" to specify that the variable names are found in the column part of the data set?

 

What does "&varlist" do in the second line of the code?

 

I am a new SAS user, and I appreciate all of your help with this  

Solution
‎11-22-2017 06:39 PM
Trusted Advisor
Posts: 1,837

Re: Deleting a group of observations with missing values

[ Edited ]

Run next code and look at result:

select * 
      from dictionary.columns 
      where libname='WORK' and memname='HAVE' ;

you will see that name column holds the variable name in the specific dataset.

 

The code:

 

proc sql;
   select name into :varlist separated by ',' 
      from dictionary.columns 
      where libname='WORK' and memname='HAVE' ;

creates a macro variable named varlist containing the list of variables in work.have dataset.

 

 

In the second part of the code:

select ID from have where cmiss(&varlist) > 0

cmis function - counts missing variables. 

 

The full code removes observations if there are any missing variable (count > 0).

 

Super User
Posts: 23,724

Re: Deleting a group of observations with missing values

So l would like help to get the desired output without listing the 50 variable names in a procedure. I think I can do this by using proc sql, but I am confused on how to do this. 

 

 

SQL and PROC SQL do not support variable lists, so a SQL approach will not be efficient here, especially in terms of your programming time. You could possibly use a macro, but the solution that seems appropriate here is a data step with the use of an array.

 

The solution in your previous question seems to apply so I'm confused as to why you've reposted this question today.

Super User
Posts: 10,778

Re: Deleting a group of observations with missing values

How about this one.

 

data have;
input ID Visit  Sex  Age   Height   Weight     Diet ;
cards;
1    1     0      23       1.75       200                  0
2    1     0      18        2.1        300                  1
3    1     1      .           1.5        195                  2
1    2     0      24        1.75      180                  0
2    2     0      19          .           321                 1
3    2     1      29        1.5           .                    2
1    3     0      25        1.75     170                   0
2    3     0       .          2.1        280                  1
3    3     1       30        1.5      135                   2
;
run;
data temp;
 set have;
 if cmiss(of _all_) then flag=1;
run;

proc sql;
create table want(drop=flag) as
select *
 from temp
  group by id
   having sum(flag)=.;
quit;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 385 views
  • 3 likes
  • 5 in conversation