BookmarkSubscribeRSS Feed
deengyn
Obsidian | Level 7

Consider the data set below where the variable 'tot' states the total number of correct answers and where 'mark' states the percentage score of that test. 

data have; 
input id $ tot1 tot2 tot3 tot4 tot5 tot6 mark11 mark12 mark21 mark22 mark31 mark32; 
x 4 2 0 5 6 0 0.4 0.2 0.0 0.5 0.6 0.0
y 7 3 2 2 7 8 0.7 0.3 0.2 0.2 0.7 0.8
x 10 8 4 0 1 3 1.0 0.8 0.4 0.0 0.1 0.3
; run; 

I'm looking to create a new count variable to identify the number of tests where the scores are of passing grade (above 50%) and only consider any column starting with "MARK" and ends with a '2', but having trouble on how to write the conditional code. 

The following table is what I aim to achieve, but I'm quite clueless on how to get there. I only want to consider MARK12, MARK22, and MARK32.

ID

TOT1

TOT2

TOT3

TOT4

TOT5

TOT6

MARK1

MARK12

MARK21

Mark22

MARK31

MARK32

NO_TEST

X

4

2

0

5

6

0

0.4

0.2

0.0

0.5

0.6

0.0

1

Y

7

3

2

2

7

8

0.7

0.3

0.2

0.2

0.7

0.8

1

Z

10

8

4

0

1

3

1.0

0.8

0.4

0.0

0.1

0.3

1

Thank you in advance!

 

2 REPLIES 2
ballardw
Super User

Your example implies that you are looking for "greater than or equal to 50%", not "greater than 50%.

 

One way, note the addition of the DATALINES so the data step executes:

data have; 
input id $ tot1 tot2 tot3 tot4 tot5 tot6 mark11 mark12 mark21 mark22 mark31 mark32; 
datalines;
x 4 2 0 5 6 0 0.4 0.2 0.0 0.5 0.6 0.0
y 7 3 2 2 7 8 0.7 0.3 0.2 0.2 0.7 0.8
x 10 8 4 0 1 3 1.0 0.8 0.4 0.0 0.1 0.3
; 

data want;
   set have;
   array m mark12 mark22 mark32;
   do i=1 to dim(m);
      no_test = sum(no_test,m[i] ge 0.5);
   end;
   drop i;
run; 

If you wanted a generic "any number of mark columns" you did not state that.

The array creates a way to do the same calculation with multiple variables. The Do loop does the same thing for each array element: it adds the current count and a 1 for each value that is ge 0.5. SAS will return a 1 for a true condition and 0 for false and is a very handy tool.

Use the SUM function because that way it handles the cases at the start of the loop where the no_test variable is missing where the + operation would return missing values.

 

Another approach would be

data want;
   set have;
   no_test = sum(mark12 ge 0.5, mark22 ge 0.5, mark32 ge 0.5);

run; 

The first solution would be much preferred if you had lots more "mark" variables as the only change needed would be adding the variables in one place instead of lots of comparisons in the sum function.

Exercise for the interested reader: you could create an array of ALL of the Mark variables (easy with a list)

and in the Do loop test the variable name using the VNAME function to determine if the current mark is to be used.

SASJedi
SAS Super FREQ

I'd do it with an array and a little boolean expression:

data have; 
input id $ tot1 tot2 tot3 tot4 tot5 tot6 mark11 mark12 mark21 mark22 mark31 mark32; 
datalines4;
a 10 10 10 10 10 10 1.0 1.0 1.0 1.0 1.0 1.0
b 10 0 10 0 10 0 1.0 0.0 1.0 0.0 1.0 0.0
c 0 10 0 10 0 10 0.0 1.0 0.0 1.0 0.0 1.0
x 4 2 0 5 6 0 0.4 0.2 0.0 0.5 0.6 0.0
y 7 3 2 2 7 8 0.7 0.3 0.2 0.2 0.7 0.8
z 10 8 4 0 1 3 1.0 0.8 0.4 0.0 0.1 0.3
;;;;

data want;
   set have;
/* Just look at the variables where the name starts with 'mark'*/ array m[*] mark:; /* zero out the count variable */ no_test=0; do _i=1 to dim(m); /* if the variable name ends in '2' and the value is >.5, this expression is true, otherwise false (1 otherwise 0 in SAS) */
/* Add the result to the count variable */ no_test+(char(reverse(vname(m[_i]) ),1)='2' and m[_i]>.5); end; drop _:; run;

This produces this result:

Obs id tot1 tot2 tot3 tot4 tot5 tot6 mark11 mark12 mark21 mark22 mark31 mark32 no_test
1 a 10 10 10 10 10 10 1.0 1.0 1.0 1.0 1.0 1.0 3
2 b 10 0 10 0 10 0 1.0 0.0 1.0 0.0 1.0 0.0 0
3 c 0 10 0 10 0 10 0.0 1.0 0.0 1.0 0.0 1.0 3
4 x 4 2 0 5 6 0 0.4 0.2 0.0 0.5 0.6 0.0 0
5 y 7 3 2 2 7 8 0.7 0.3 0.2 0.2 0.7 0.8 1
6 z 10 8 4 0 1 3 1.0 0.8 0.4 0.0 0.1 0.3 1

May the SAS be with you!

Mark

Check out my Jedi SAS Tricks for SAS Users

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 784 views
  • 0 likes
  • 3 in conversation