Hello,
I am cleaning our database and trying to re-contact the participants a second time to acquire information in order to reduce missing values. For example, I have 1000 participants and 200 variables. I want to output participants whose data are missing for 100 key variables among the 200 variables. Participant A may only have 2 variables with missing values and participant B may have 30 variables with missing values. I don't know how to output participants with missing values.
The hypothetical dataset as shown below, I want to output participant1 showing var2, var4 with missing values; and participant2 showing var1, var3, var4 and var5 with missing values;
Thanks a lot!
Jing Yuan
no | var1 | var2 | var3 | var4 | var5 | var6 |
---|---|---|---|---|---|---|
1 | 128 | . | 45 | . | 44 | 67 |
2 | . | 34 | . | . | 66 | . |
3 | 235 | . | . | 56 | . | 78 |
4 | 321 | . | 56 | . | . | 66 |
5 | 214 | . | . | . | . | 68 |
What does your input look like?
If you just want anyone with missing values and all values are numeric
if nmiss(of var1-var100)>0 then output;
Thanks. I have both numeric and character variables.
And most are character variables. The code above seems only work for numeric variables. Sorry my hypothetical dataset showing as numberics.
Many thanks.
I don't know of a way to check all at once
You can create two arrays one for character one for numeric, do you have a naming convention for your variables?
No. I don't have naming convention. Would you please tell me the sample for creat array for character variables? I tried but failed.
Thanks.
Thanks, Ballardw. It looks very complicated to me, a newer to SAS. I will try it. Will that also work for character variables?
If you note the variables involved are both character and numeric. The key is providing the list(s) of variables.
The below program will create a list in the results window with the NO and the variables that are missing.
The 6th row of data is to show that it doesn't produce output when all are valid. The data step is to have some
minimal working data.
options missing=' '; /* to make the dummy text variables below actually blank*/
data work.test;
input no var1 var2 var3 var4 var5 var6;
tvar1 = put(var2,best4.);
tvar3 = put(var3,best4.);
datalines;
1 128 . 45 . 44 67
2 . 34 . . 66 .
3 235 . . 56 . 78
4 321 . 56 . . 66
5 214 . . . . 68
6 1 2 3 4 5 6
run;
%let varlist= var1 var2 var3 var4 var5 var6 tvar1 tvar3;/* key, YOU provide your list of variables in the order you want them*/
data _null_;
file print;
set work.test;
if nmiss(of var1-var6,tvar1,tvar3)>0 then do; /* to use VARLIST you would need to have a comma separated version*/
do i = 1 to (countw("&varlist"));
if i=1 then put no= +1 @;
v = scan("&varlist",i);
if missing(vvaluex(scan("&varlist",i))) then put v +5 @;
if i = (countw("&varlist")) then put;
end;
end;
run;
Here is how I would go about solving the problem:
make an input array for your input dataset,
make 2 variables for your output dataset--ID and Missing,
use VTYPE to check the data type of each variable in your input array.
use appropriate number or character functions to do your checking
concatinate the character representation for your index for your input array onto an output string each time you find a missing or null value.
set a flag saying you have missing values
output your string if you have any missing.
keep client number and string showing missing
loop until done.
do what you'll get is:
an ID field and a string filed, the string field will contain all missing indexes.
[1] [2 4]
[2] [1 3 4 6]
[3] [2 3 5]
[4] [2 4 5]
[5] [2 3 4 5]
Let us take a small example based on SASHELP.CLASS.
Create a data set, CLASS from SASHELP.CLASS and at the same time, get the number of numeric and character variables. Also make some the variables to be missing.
data class;
set sashelp.class end = eof;
array n
array c
do _n_ = 1 to dim(n);
if ranuni(123) < 0.7 then n[_n_] = .;
end;
do _n_ = 1 to dim(c);
if ranuni(123) < 0.4 then c[_n_] = ' ';
end;
if eof then do;
call symputx('num', dim(n));
call symputx('chr', dim(c));
end;
run;
%put &num ;
%put &chr= ;
Now the data set, CLASS will resemble what you start with. Now we will check each variable for MISSING, and if so, make it to take a label of 1(one) and if not missing then the label will be 0(zero).
data want;
set class end = eof;
array n
array c
array miss_num
array miss_chr
do i = 1 to dim(n);
miss_num = ifN(missing(n), 1, 0);
end;
do i = 1 to dim(c);
miss_chr = ifN(missing(c), 1, 0);
end;
output;
drop i;
run;
The output data set, WANT, is what you are looking for ( as I understand your question).
I realized that two arrays will do and not 4 arrays in the last program. The revised program follows:
data want;
set class end = eof;
array n
array c
do i = 1 to dim(n);
n = ifN(missing(n), 1, 0);
end;
do i = 1 to dim(c);
c = ifC(missing(c), '1', '0');
end;
output;
drop i;
run;
Message was edited by: MUTHIA KACHIRAYAN
A modified solution based on solution. It assumes that all other variables are not missing, i.e. ID and any other variables in the dataset.
*Create a sample dataset;
data class;
set sashelp.class;
array n
array c
do _n_ = 1 to dim(n);
if ranuni(123) < 0.7 then n[_n_] = .;
end;
do _n_ = 1 to dim(c);
if ranuni(123) < 0.4 then c[_n_] = ' ';
end;
run;
*Create output dataset with only missing variables included;
data want;
set class ;
array n
array c
if nmiss(of n(*))>0 or miss(of c(*))>0 then output;
run;
proc sql;
create table test (myID num,
myOne varchar(5),
myTwo num,
myThree varchar(5),
myFour num,
myFive num,
mySix varchar(5));
insert into test values(1, '', null, '3', 4, 5, '6');
insert into test values(2, '1', null, '3', 4, 5, '');
insert into test values(3, '', 2, '3', null, 5, '');
insert into test values(4, '1', 2, '3', null, 5, '6');
insert into test values(5, '1', 2, '3', 4, 5, '');
quit;
/*
This will give you the user id along with a string containing the name of each colum that
was missing or null. colum list is in a funky order because of using two seperate arrays.
*/
data myOut (replace=yes compress=no);
set test;
array myChar
do _i=1 to dim(myChar);
if(myChar[_i] = '') then do;
myMissingOne = VNAME(myChar[_i]);
myMissing = catx(' ', myMissing, myMissingOne);
myFlag=1;
end;
end;
do _i=1 to dim(myNum);
if(myNum[_i] = .) then do;
myMissingOne = VNAME(myNum[_i]);
myMissing = catx(' ', myMissing, myMissingOne);
myFlag = 1;
end;
end;
if(myFlag = 1) then output;
keep myID myMissing;
run;
/*
This will give you a dataset listing all of the columns that contain either null or missing.
The columns are tied to the client ID through a one to many relationship in this table. One
client ID and many columns. The client ID/Column relation is unique.
*/
data myOutTwo (replace=yes compress=no);
set test;
array myChar
do _i=1 to dim(myChar);
if(myChar[_i] = '') then do;
myMissing = VNAME(myChar[_i]);
output;
end;
end;
do _i=1 to dim(myNum);
if(myNum[_i] = .) then do;
myMissing = VNAME(myNum[_i]);
output;
end;
end;
keep myID myMissing;
run;
Am I missing something here? What's wrong if using CMISS and _ALL_?
data have;
set sashelp.class;
if mod(_n_,2)=1 then
call missing(age, sex);
run;
data want;
set have;
if cmiss(of _all_)>0;
run;
Haikuo
CMISS takes both numeric and character, cause that's what you'd expect :smileygrin:
Nice one!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.