I created a data set to help explain my question. In my example, researchers are trying to understand the impact of different types of Diet plans on change in BMI over 3 study visits.
ID Visit Sex Age Height Weight (lbs) Diet (Type)
1 1 0 23 1.75 200 0
2 1 0 18 2.1 300 1
3 1 1 . 1.5 195 2
1 2 0 24 1.75 180 0
2 2 0 19 . 321 1
3 2 1 29 1.5 . 2
1 3 0 25 1.75 170 0
2 3 0 . 2.1 280 1
3 3 1 30 1.5 135 2
I want to remove all observations for an ID if it has any missing values (I want a new data set that ONLY contains ID with observations for all visits). So the new data set would only include ID1 (since ID2 and ID3 have missing values). My desired output is:
ID Visit Sex Age Height Weight (lbs) Diet (Type)
1 1 0 23 1.75 200 0
1 2 0 24 1.75 180 0
1 3 0 25 1.75 170 0
I am trying to apply this concept to a large data set with ~50 variables. So l would like help to get the desired output without listing the 50 variable names in a procedure. I think I can do this by using proc sql, but I am confused on how to do this.
Run next code and look at result:
select *
from dictionary.columns
where libname='WORK' and memname='HAVE' ;
you will see that name column holds the variable name in the specific dataset.
The code:
proc sql;
select name into :varlist separated by ','
from dictionary.columns
where libname='WORK' and memname='HAVE' ;
creates a macro variable named varlist containing the list of variables in work.have dataset.
In the second part of the code:
select ID from have where cmiss(&varlist) > 0
cmis function - counts missing variables.
The full code removes observations if there are any missing variable (count > 0).
Your query is similar to the next one - see link:
Drop the first part of the WHERE clause in the answer to your almost similar question here, so that the nested select clause looks like
select ID from have where cmiss(&varlist) > 0
Thank you, I apologize for the duplicate question. I accepted the response prior to running the code but I am having some difficulty getting SAS to run the code. I used a portion of the code you provided to restrict the data to individuals whose weight >200 at visit1, but I have not been able to delete ID with missing data.
Code I used:
proc sql;
create table want as
select * from have
where ID not in
(select ID from have where Visit=1 and Weight <200); quit;
Now I am trying to delete all ID with missing data
I am confused about the following parts of your code:
select name into :varlist separated by ',' from dictionary.columns
select ID from have where cmiss(&varlist) > 0
What is the role of "select name" and ":varlist" in the first line? I understand that you can use select to store a list of variables. Does "name" automatically list all the variables found in the libname and data set specified, and then stores it in "varlist"? Is the role of "dictionary.columns" to specify that the variable names are found in the column part of the data set?
What does "&varlist" do in the second line of the code?
I am a new SAS user, and I appreciate all of your help with this
Run next code and look at result:
select *
from dictionary.columns
where libname='WORK' and memname='HAVE' ;
you will see that name column holds the variable name in the specific dataset.
The code:
proc sql;
select name into :varlist separated by ','
from dictionary.columns
where libname='WORK' and memname='HAVE' ;
creates a macro variable named varlist containing the list of variables in work.have dataset.
In the second part of the code:
select ID from have where cmiss(&varlist) > 0
cmis function - counts missing variables.
The full code removes observations if there are any missing variable (count > 0).
So l would like help to get the desired output without listing the 50 variable names in a procedure. I think I can do this by using proc sql, but I am confused on how to do this.
SQL and PROC SQL do not support variable lists, so a SQL approach will not be efficient here, especially in terms of your programming time. You could possibly use a macro, but the solution that seems appropriate here is a data step with the use of an array.
The solution in your previous question seems to apply so I'm confused as to why you've reposted this question today.
How about this one.
data have;
input ID Visit Sex Age Height Weight Diet ;
cards;
1 1 0 23 1.75 200 0
2 1 0 18 2.1 300 1
3 1 1 . 1.5 195 2
1 2 0 24 1.75 180 0
2 2 0 19 . 321 1
3 2 1 29 1.5 . 2
1 3 0 25 1.75 170 0
2 3 0 . 2.1 280 1
3 3 1 30 1.5 135 2
;
run;
data temp;
set have;
if cmiss(of _all_) then flag=1;
run;
proc sql;
create table want(drop=flag) as
select *
from temp
group by id
having sum(flag)=.;
quit;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.