BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
lousam
Obsidian | Level 7
 

I created a data set to help explain my question. In my example, researchers are trying to understand the impact of different types of  Diet plans on change in BMI over 3 study visits.  

 

ID Visit  Sex  Age   Height   Weight (lbs)     Diet (Type)

 

1    1     0      23       1.75       200                  0

2    1     0      18        2.1        300                  1

3    1     1      .           1.5        195                  2

 

1    2     0      24        1.75      180                  0

2    2     0      19          .           321                 1

3    2     1      29        1.5           .                    2

 

1    3     0      25        1.75     170                   0

2    3     0       .          2.1        280                  1

3    3     1       30        1.5      135                   2

 

I want to remove all observations for an ID if it has any missing values (I want a new data set that ONLY contains ID with observations for all visits). So the new data set would only include ID1 (since ID2 and ID3 have missing values). My desired output is: 

 

ID Visit  Sex  Age   Height   Weight (lbs)     Diet (Type)

 

1    1     0      23       1.75       200                  0

 

1    2     0      24        1.75      180                  0

 

1    3     0      25        1.75     170                   0

 

I am trying to apply this concept to a large data set with ~50 variables. So l would like help to get the desired output without listing the 50 variable names in a procedure. I think I can do this by using proc sql, but I am confused on how to do this. 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Shmuel
Garnet | Level 18

Run next code and look at result:

select * 
      from dictionary.columns 
      where libname='WORK' and memname='HAVE' ;

you will see that name column holds the variable name in the specific dataset.

 

The code:

 

proc sql;
   select name into :varlist separated by ',' 
      from dictionary.columns 
      where libname='WORK' and memname='HAVE' ;

creates a macro variable named varlist containing the list of variables in work.have dataset.

 

 

In the second part of the code:

select ID from have where cmiss(&varlist) > 0

cmis function - counts missing variables. 

 

The full code removes observations if there are any missing variable (count > 0).

 

View solution in original post

6 REPLIES 6
PeterClemmensen
Tourmaline | Level 20

Drop the first part of the WHERE clause in the answer to your almost similar question here, so that the nested select clause looks like

 

select ID from have where cmiss(&varlist) > 0

 

 

lousam
Obsidian | Level 7

Thank you, I apologize for the duplicate question. I accepted the response prior to running the code but I am having some difficulty getting SAS to run the code. I used a portion of the code you provided to restrict the data to individuals whose weight >200 at visit1, but I have not been able to delete ID with missing data. 

 

Code I used:

proc sql;
create table want as
select * from have
where ID not in
(select ID from have where Visit=1 and  Weight <200); quit;

 

Now I am trying to delete all ID with missing data

 

I am confused about the following parts of your code:

select name into :varlist separated by ',' from dictionary.columns 
select ID from have where cmiss(&varlist) > 0

 

What is the role of "select name" and ":varlist" in the first line? I understand that you can use select to store a list of variables. Does "name" automatically list all the variables found in the libname and data set specified, and then stores it in "varlist"?  Is the role of "dictionary.columns" to specify that the variable names are found in the column part of the data set?

 

What does "&varlist" do in the second line of the code?

 

I am a new SAS user, and I appreciate all of your help with this  

Shmuel
Garnet | Level 18

Run next code and look at result:

select * 
      from dictionary.columns 
      where libname='WORK' and memname='HAVE' ;

you will see that name column holds the variable name in the specific dataset.

 

The code:

 

proc sql;
   select name into :varlist separated by ',' 
      from dictionary.columns 
      where libname='WORK' and memname='HAVE' ;

creates a macro variable named varlist containing the list of variables in work.have dataset.

 

 

In the second part of the code:

select ID from have where cmiss(&varlist) > 0

cmis function - counts missing variables. 

 

The full code removes observations if there are any missing variable (count > 0).

 

Reeza
Super User

So l would like help to get the desired output without listing the 50 variable names in a procedure. I think I can do this by using proc sql, but I am confused on how to do this. 

 

 

SQL and PROC SQL do not support variable lists, so a SQL approach will not be efficient here, especially in terms of your programming time. You could possibly use a macro, but the solution that seems appropriate here is a data step with the use of an array.

 

The solution in your previous question seems to apply so I'm confused as to why you've reposted this question today.

Ksharp
Super User

How about this one.

 

data have;
input ID Visit  Sex  Age   Height   Weight     Diet ;
cards;
1    1     0      23       1.75       200                  0
2    1     0      18        2.1        300                  1
3    1     1      .           1.5        195                  2
1    2     0      24        1.75      180                  0
2    2     0      19          .           321                 1
3    2     1      29        1.5           .                    2
1    3     0      25        1.75     170                   0
2    3     0       .          2.1        280                  1
3    3     1       30        1.5      135                   2
;
run;
data temp;
 set have;
 if cmiss(of _all_) then flag=1;
run;

proc sql;
create table want(drop=flag) as
select *
 from temp
  group by id
   having sum(flag)=.;
quit;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 3136 views
  • 3 likes
  • 5 in conversation