ADouglas
Obsidian | Level 7

Hi Dear SAS Community:

 

I am trying to write some SAS code that can distinguish between two patterns of missing data that can occur within a single record, and handle each pattern differently in that record.

 

The hypothetical dataset is typical of exams with multiple-choice questions that are each scored 1=correct or 0=incorrect. To keep things simple, say the exam has just 5 questions and is taken by 6 people. Across the records of the dataset, two types of missing data patterns occur:

 

Pattern 1 - missing data from skipped questions: an earlier question has no response, but at least one later question does. Skipped questions should be recoded to 0 so that they are treated as incorrect.

 

Pattern 2 - missing data from questions that were not reached: the person responded to one or more earlier questions, but every question after the last response is blank because the person never got to it. Not-reached questions should stay coded as the SAS system missing value.

 

Below is the hypothetical dataset I have:

 

10.1.
11111
11...
0...1
.1..0
.....

 

Following is the dataset I want after addressing the two patterns of missing data for particular records.

 

1001.
11111
11...
00001
01000
.....

 

Can anyone please provide some SAS code that will work to produce the dataset I want from the dataset I have, both shown above?

 

Thanks in advance!

 

Aaron

Accepted solution: the reply by Astounding that handles numeric variables, given in the thread below.

Astounding
PROC Star

It looks like you have assembled a single character variable holding all the answers/results.  To fix that:

 

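data want;
set have;
/* scan from the last answer back to the first */
do k=length(answers) to 1 by -1;
   /* once a real response is seen, any earlier '.' was skipped, not unreached */
   if substr(answers, k, 1) in ('0', '1') then reset_flag='Y';
   else if substr(answers, k, 1) = '.' and reset_flag='Y' then substr(answers, k, 1) = '0';
end;
drop reset_flag k;
run;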
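
For anyone who wants to try the character version, here is a minimal sketch of an input dataset it could run against. The variable name `answers` comes from the code above; the `$char5.` informat and the choice to type the question's six records in as datalines are assumptions made for illustration.

```sas
/* Hypothetical input for the character-variable approach:             */
/* each record's five responses are read into one 5-character string.  */
data have;
   input answers $char5.;   /* $CHAR5. keeps the periods as literal text */
   datalines;
10.1.
11111
11...
0...1
.1..0
.....
;
run;
```

Run against this input, the step above should leave trailing periods alone and turn only the periods that precede a later 0 or 1 into '0'.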

ADouglas
Obsidian | Level 7

Thanks, this is an interesting solution. It does work. But, it deviates from how my data is imported into SAS. The exam questions are imported as numerical variables. Any ideas about what the solution would be for numerical variables?

 

Kind Regards,

Aaron

Astounding
PROC Star

As a set of numeric values, the syntax changes a little.  Let's call the variables V1 through V6:

 

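data want;
set have;
array v {6};   /* refers to variables v1-v6 */
/* scan from the last question back to the first */
do k=6 to 1 by -1;
   /* once a real response is seen, any earlier missing value was skipped */
   if v{k} in (0, 1) then reset_flag='Y';
   else if v{k} = . and reset_flag='Y' then v{k} = 0;
end;
drop reset_flag k;
run;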

 

 

ADouglas
Obsidian | Level 7

Bravo Astounding!!

 

Your approach is the correct solution. I did have to clean it up slightly to match my variable names and the number of questions, but well done and I appreciate the assistance. Please see your cleaned-up solution below.

 

data have;
input item1 1 item2 2 item3 3 item4 4 item5 5;
CARDS;
10.1.
11111
11...
0...1
.1..0
.....
;
run;

 

data want;
set have;
array v {5} item1-item5;
do k=5 to 1 by -1;
if v{k} in (0, 1) then reset_flag='Y';
else if v{k} = . and reset_flag='Y' then v{k} = 0;
end;
drop reset_flag k;
run;
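
One way to check the result is to compare it with the desired dataset from the original question. The sketch below assumes the `want` dataset created by the step above already exists; the dataset name `expected` is hypothetical, and its rows are simply the desired records typed in.

```sas
/* Expected result, typed in from the desired dataset in the question */
data expected;
   input item1 1 item2 2 item3 3 item4 4 item5 5;
   datalines;
1001.
11111
11...
00001
01000
.....
;
run;

/* Compare WANT with EXPECTED observation by observation; */
/* any unequal values are listed in the PROC COMPARE report */
proc compare base=expected compare=want;
   var item1-item5;
run;
```

If the recode worked as intended, the report should show no unequal values for item1-item5.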

ballardw
Super User

Depending on what you normally do with the values of these variables, you might instead want to consider using special missing values. Those values are still excluded from calculations such as totals or means, and from the denominator when calculating percentages, but they can be printed or examined with a format that shows why each value is missing.

 

data example;
    input record x y;
    /* set specific missing values just as an example*/
    if record=2 then x= .S;
    if record=3 then y= .S;
    if record=4 then do;
       x=.I;
       y=.I;
   end;
datalines ;
1 3 18
2 .  2
3 7  .
4 .  .
;
run;

proc means data=example mean sum std;
   var x y;
run;

proc freq data=example;
   tables x y;
run;

Proc format library=work;
   value Special
   .S='Skip Pattern'
   .I='Incomplete'
   ;
run;

proc print data=example;
   format x y special.;
run;

Or possibly use a special missing value for just the incomplete records, to differentiate them from other forms of missing.
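
Applied to the exam data in this thread, that idea might look like the sketch below. It assumes the `have` dataset with `item1`-`item5` posted earlier; the dataset name `want_special`, the flag `reached`, the format name `examfmt`, and the choice of `.N` for not-reached questions are all hypothetical.

```sas
data want_special;
   set have;                      /* HAVE as built earlier in the thread */
   array v {5} item1-item5;
   reached = 0;                   /* set to 1 once a response is seen    */
   do k = 5 to 1 by -1;           /* scan from the last question back    */
      if v{k} in (0, 1) then reached = 1;
      else if missing(v{k}) then do;
         if reached then v{k} = 0;   /* skipped: scored as incorrect     */
         else v{k} = .N;             /* not reached: special missing     */
      end;
   end;
   drop reached k;
run;

proc format;
   value examfmt .N = 'Not reached';
run;

proc print data=want_special;
   format item1-item5 examfmt.;
run;
```

Because `.N` is still a missing value, procedures such as PROC MEANS leave it out of sums and means, while the format makes the reason visible in printed output.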

 

ADouglas
Obsidian | Level 7

Thanks for your suggestion, ballardw.

 

Aaron
