Hi SAS Community,
I am trying to write SAS code that distinguishes between two patterns of missing data that can occur within a single record, and handles each pattern differently.
The hypothetical dataset below is typical of exams made up of multiple-choice questions, each scored 1 = correct or 0 = incorrect. To keep things simple, say an exam of just 5 questions is taken by 6 people. Several records show one or both of these missing-data patterns:
Pattern 1 - skipped questions: an earlier question has no response, but at least one later question does. Skipped questions should be recoded to 0 so they count as incorrect.
Pattern 2 - not-reached questions: the person answered one or more earlier questions, but every question after the last response is blank because the person never reached it. Not-reached questions should be left as the SAS system missing value (.).
Below is the hypothetical dataset I have:
10.1.
11111
11...
0...1
.1..0
.....
Here is the dataset I want after handling the two missing-data patterns:
1001.
11111
11...
00001
01000
.....
Can anyone provide SAS code that produces the dataset I want from the dataset I have?
Thanks in advance!
Aaron
It looks like you have a single character variable (ANSWERS in the code here) holding all of the responses. In that case, scan it from the last character back to the first:
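data want;
set have;
/* Scan from the last question back to the first. */
do k=length(answers) to 1 by -1;
/* Once a response ('0' or '1') has been seen, any earlier '.' is a skipped question. */
if substr(answers, k, 1) in ('0', '1') then reset_flag='Y';
else if substr(answers,k,1) = '.' and reset_flag='Y' then substr(answers,k,1) = '0';
end;
drop reset_flag k;
run;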
Thanks, this is an interesting solution, and it works. However, it doesn't match how my data are imported into SAS: the exam questions are imported as numeric variables. Any ideas for a solution that works with numeric variables?
Kind Regards,
Aaron
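As a set of numeric values, the syntax changes a little. Let's call the variables V1 through V5, one per question:
data want;
set have;
array v {5};
/* Scan from the last question back to the first. */
do k=5 to 1 by -1;
/* Once an answer (0 or 1) has been seen, any earlier missing value is a skipped question. */
if v{k} in (0, 1) then reset_flag='Y';
else if v{k} = . and reset_flag='Y' then v{k} = 0;
end;
drop reset_flag k;
run;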
Bravo, Astounding!!
Your approach is the correct solution. I had to clean it up slightly, but well done, and I appreciate the help. Below is your solution, cleaned up.
data have;
input item1 1 item2 2 item3 3 item4 4 item5 5;
CARDS;
10.1.
11111
11...
0...1
.1..0
.....
;
run;
data want;
set have;
array v {5} item1-item5;
do k=5 to 1 by -1;
if v{k} in (0, 1) then reset_flag='Y';
else if v{k} = . and reset_flag='Y' then v{k} = 0;
end;
drop reset_flag k;
run;
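To check that this step reproduces the target dataset from the question, one option is to type that target in as its own dataset and let PROC COMPARE look for differences. The sketch below is self-contained, so it repeats the input and recoding steps. The dataset name EXPECTED is hypothetical, and reading the items with `input (item1-item5) (1.);` and bounding the loop with `dim(v)` are my substitutions for the column input and the hard-coded 5; they should behave the same on this data.

```sas
/* Responses from the question: item1-item5, one column each; '.' reads as missing. */
data have;
  input (item1-item5) (1.);
  datalines;
10.1.
11111
11...
0...1
.1..0
.....
;

/* Target dataset, typed in from the question (EXPECTED is a hypothetical name). */
data expected;
  input (item1-item5) (1.);
  datalines;
1001.
11111
11...
00001
01000
.....
;

/* Recode skipped questions to 0; leave not-reached questions as system missing. */
data want;
  set have;
  array v {*} item1-item5;
  do k = dim(v) to 1 by -1;
    if v{k} in (0, 1) then reset_flag = 'Y';
    else if missing(v{k}) and reset_flag = 'Y' then v{k} = 0;
  end;
  drop reset_flag k;
run;

/* Lists any differences between WANT and EXPECTED. */
proc compare base=want compare=expected;
run;
```

If the recoding is right, PROC COMPARE should report that no unequal values were found. Because the array uses `{*}` and the loop starts at `dim(v)`, a longer exam should only need a different variable list, for example a hypothetical item1-item40.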
Depending on what you normally do with these variables, you might instead consider using special missing values. Those values are excluded from calculations such as totals and means, and from the denominator when calculating percentages, but they can still be printed or examined with a format that labels them.
data example;
input record x y;
/* set specific missing values just as an example */
if record=2 then x= .S;
if record=3 then y= .S;
if record=4 then do; x=.I; y=.I; end;
datalines;
1 3 18
2 . 2
3 7 .
4 . .
;
run;
proc means data=example mean sum std;
var x y;
run;
proc freq data=example;
tables x y;
run;
proc format library=work;
value Special .S='Skip Pattern' .I='Incomplete';
run;
proc print data=example;
format x y special.;
run;
Or you could use a special missing value only for the incomplete (not-reached) responses, to distinguish them from other kinds of missing values.
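Applied to the exam data, one possible sketch reuses the same right-to-left scan but writes .S for skipped questions and .I for not-reached ones, instead of 0 and system missing. The .S/.I codes and their labels are borrowed from the example above; the dataset name WANT_SPECIAL is hypothetical, and treating an all-blank record as entirely not reached is my assumption.

```sas
/* Responses from the question: item1-item5, one column each; '.' reads as missing. */
data have;
  input (item1-item5) (1.);
  datalines;
10.1.
11111
11...
0...1
.1..0
.....
;

/* .S = skipped (a later question was answered), .I = not reached (no later answer). */
data want_special;
  set have;
  array v {*} item1-item5;
  do k = dim(v) to 1 by -1;
    if v{k} in (0, 1) then reset_flag = 'Y';
    else if missing(v{k}) then do;
      if reset_flag = 'Y' then v{k} = .S;
      else v{k} = .I;
    end;
  end;
  drop reset_flag k;
run;

proc format library=work;
  value Special .S = 'Skip Pattern'
                .I = 'Incomplete';
run;

proc print data=want_special;
  format item1-item5 special.;
run;
```

To mark only the not-reached questions, change `v{k} = .S` to `v{k} = 0`; skipped questions are then scored as incorrect while not-reached ones keep the .I code. Procedures such as PROC MEANS still exclude .S and .I from sums and means.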
Thanks for your help, BallardW.
Aaron