BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ab81
Calcite | Level 5

Hello! I am looking for help creating a 0/1 categorical variable, where 0 = negative for the specified condition and 1 = positive for the specified condition. The statement I am using (pasted below) pulls in a bunch of different variables that must meet certain criteria to be slated into the "1" category. I want to verify that 1) what I am doing is correct, and my statements aren't overwriting any previous statements, and 2) see if there is a better way to do this, perhaps using an array of some kind (?) Any advice is welcome, and I would be happy to provide further explanations if necessary. Thank you!

 

*/ Number of persons identified as having experienced FM3strong = 1 plus VALUE = 1 or LABOR = 1;

OR  number of persons answering yes to 2 or more of the following: R1strong, EP1strong, PL1strong, DC3strong, FM1strong or FM2strong, V3strong or V4strong, plus VALUE = 1 or LABOR = 1;

OR number of persons answering yes to at least 1 of the following: R1strong, EP1strong, PL1strong, DC3strong, FM1strong, FM2strong, V3strong, V4strong PLUS answering yes to 3 or more of the following: EP5med, EP6med, EP7med, FM4med, V8med, plus VALUE = 1 or LABOR = 1. 
*/

 

TIPlifenum =.;
if FM3strong = 1 and VALUE = 1 then TIPlifenum = 1;
else if FM3strong = 1 and LABOR = 1 then TIPlifenum = 1;
else if sum(R1strong, EP1strong, PL1strong, DC3strong, FM1strong, V3strong) ge 2 or
sum (R1strong, EP1strong, PL1strong, DC3strong, FM1strong, V4strong) ge 2 or
sum(R1strong, EP1strong, PL1strong, DC3strong, FM2strong, V3strong) ge 2 or
sum(R1strong, EP1strong, PL1strong, DC3strong, FM2strong, V4strong) ge 2
and VALUE = 1 then TIPlifenum = 1;
else if sum(R1strong, EP1strong, PL1strong, DC3strong, FM1strong, V3strong) ge 2 or
sum (R1strong, EP1strong, PL1strong, DC3strong, FM1strong, V4strong) ge 2 or
sum(R1strong, EP1strong, PL1strong, DC3strong, FM2strong, V3strong) ge 2 or
sum(R1strong, EP1strong, PL1strong, DC3strong, FM2strong, V4strong) ge 2
and LABOR = 1 then TIPlifenum = 1;
else if sum (R1strong, EP1strong, PL1strong, FM1strong, FM2strong, V3strong, V4strong) ge 1 and sum(EP5med, EP6med, EP7med, FM4med, V8med) ge 3
and VALUE = 1 then TIPlifenum = 1;
else if sum (R1strong, EP1strong, PL1strong, FM1strong, FM2strong, V3strong, V4strong) ge 1 and sum(EP5med, EP6med, EP7med, FM4med, V8med) ge 3
and LABOR = 1 then TIPlifenum = 1;
else TIPlifenum = 0;

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hello @ab81,

 

In addition to the missing parentheses around the Boolean expressions combined with OR (as pointed out by @PhilC) you missed variable DC3strong in two of the last four sums. It seems that @PhilC has copied this mistake into his code.

 


@ab81 wrote:

(...) see if there is a better way to do this, perhaps using an array of some kind (?)


Yes, you can use arrays to abbreviate repetitive lists of variables:

data want;
set have;
array _a[*] R1strong EP1strong PL1strong DC3strong;
array _b[*] FM1strong FM2strong;
array _c[*] V3strong V4strong;
TIPlifenum=
  (FM3strong = 1 
     | sum(0, of _a[*], max(0, of _b[*]), max(0, of _c[*]))>=2
     | sum(0, of _a[*], of _b[*], of _c[*])>=1 & sum(0, EP5med, EP6med, EP7med, FM4med, V8med)>=3)
  & (VALUE = 1 | LABOR = 1);
run;

This assumes 0-1 coded variables, but also allows for missing values. If you are sure that no missing values occur, you can delete the zeros in the SUM and MAX function arguments. Their only purpose is to avoid notes "Missing value were generated ..." in the log. They do not change the results.

 

In general, missing values can be harmful. For example, the expression

 

sum_strong0 + DC3strong

 

(contained in @PhilC's code) would be missing if sum_strong0 was missing and thus lead to ignoring a valid value DC3strong=1.

 

I used this input dataset HAVE and then PROC COMPARE on the "WANT" datasets to compare the suggested solutions:

data have;
do FM3strong = .,0,1;
do VALUE     = .,0,1;
do LABOR     = .,0,1;
do R1strong  = .,0,1;
do EP1strong = .,0,1;
do PL1strong = .,0,1;
do DC3strong = .,0,1;
do FM1strong = .,0,1;
do FM2strong = .,0,1;
do V3strong  = .,0,1;
do V4strong  = .,0,1;
do EP5med    = .,0,1;
do EP6med    = .,0,1;
do EP7med    = .,0,1;
do FM4med    = .,0,1;
do V8med     = .,0,1;
  output;
end; end; end; end; end; end; end; end; end; end; end; end; end; end; end; end;
run; /* 3**16=43046721 obs. */

 

View solution in original post

8 REPLIES 8
PhilC
Rhodochrosite | Level 12

I see a way to simplify this. (edit: I'm assuming that the "or'd sums" have parentheses.)  This reminds me of my college Discrete Mathematics class, my Algebra classes, and both Boolean algebra and the Distributive property.  I'll leave it to you or someone else to prove:

TIPlifenum = (FM3strong OR sum_strong  OR sum_strong2 ) and (VALUE OR LABOR ) ;

where: 

sum_strong0= sum(R1strong, EP1strong, PL1strong);
sum_strong=
   sum (
     sum_strong0 + DC3strong,
     max ( sum(FM1strong, V3strong), 
           sum(FM1strong, V4strong),
           sum(FM2strong, V3strong),
           sum(FM2strong, V4strong)  )) ge 2;
sum_strong2=
   sum(sum_strong0, FM1strong, FM2strong, V3strong, V4strong) ge 1 and
   sum(EP5med, EP6med, EP7med, FM4med, V8med) ge 3;

 

PhilC
Rhodochrosite | Level 12

Question: you are not running this algorithm on super large datasets are you?  Or IOW, you are not optimizing this code for purposes of increased speed?

PhilC
Rhodochrosite | Level 12

One more thing, Order of Operations, you'll have to take care are using these correctly.  AND is higher on the order of operations so the AND here happens first

 

else if sum(R1strong, EP1strong, PL1strong, DC3strong, FM1strong, V3strong) ge 2 or
sum (R1strong, EP1strong, PL1strong, DC3strong, FM1strong, V4strong) ge 2 or
sum(R1strong, EP1strong, PL1strong, DC3strong, FM2strong, V3strong) ge 2 or
sum(R1strong, EP1strong, PL1strong, DC3strong, FM2strong, V4strong) ge 2
AND VALUE = 1 then TIPlifenum = 1;

 

I think you might have intended for the ORs to happen before the AND.  If so, you'll need parentheses.

 

else if sum(R1strong, EP1strong, PL1strong, DC3strong, FM1strong, V3strong) ge 2 OR
sum (R1strong, EP1strong, PL1strong, DC3strong, FM1strong, V4strong) ge 2 OR
sum(R1strong, EP1strong, PL1strong, DC3strong, FM2strong, V3strong) ge 2 OR
sum(R1strong, EP1strong, PL1strong, DC3strong, FM2strong, V4strong) ge 2
and VALUE = 1 then TIPlifenum = 1;

 

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lepg/p1n8bsqqd03xppn17pgvjpjlbhhs.htm#p1ces81...

 

ab81
Calcite | Level 5
Ah, that is good to know - and what I was concerned about - wondering if the AND canceled out all the OR statements from the prior lines of code.
ab81
Calcite | Level 5
No, the dataset is currently < 450 observations - just figured there was an easier/less verbose way to write this code.
ballardw
Super User

What type of values are in that plethora of variables? Are they all 0/1 coded ?

 

If so, you might consider evaluating some of the repeated things such as
sum (R1strong, EP1strong, PL1strong, DC3strong). If that is ge 2 then the values of the other variables don't matter.

 

With 1/0 coded variables then MAX(of list)=1 tells you that ate least one of the variables is 1.

ab81
Calcite | Level 5
Yes, each of the other type of variables in that gignatic list are coded 0/1 too.
FreelanceReinh
Jade | Level 19

Hello @ab81,

 

In addition to the missing parentheses around the Boolean expressions combined with OR (as pointed out by @PhilC) you missed variable DC3strong in two of the last four sums. It seems that @PhilC has copied this mistake into his code.

 


@ab81 wrote:

(...) see if there is a better way to do this, perhaps using an array of some kind (?)


Yes, you can use arrays to abbreviate repetitive lists of variables:

data want;
set have;
array _a[*] R1strong EP1strong PL1strong DC3strong;
array _b[*] FM1strong FM2strong;
array _c[*] V3strong V4strong;
TIPlifenum=
  (FM3strong = 1 
     | sum(0, of _a[*], max(0, of _b[*]), max(0, of _c[*]))>=2
     | sum(0, of _a[*], of _b[*], of _c[*])>=1 & sum(0, EP5med, EP6med, EP7med, FM4med, V8med)>=3)
  & (VALUE = 1 | LABOR = 1);
run;

This assumes 0-1 coded variables, but also allows for missing values. If you are sure that no missing values occur, you can delete the zeros in the SUM and MAX function arguments. Their only purpose is to avoid notes "Missing value were generated ..." in the log. They do not change the results.

 

In general, missing values can be harmful. For example, the expression

 

sum_strong0 + DC3strong

 

(contained in @PhilC's code) would be missing if sum_strong0 was missing and thus lead to ignoring a valid value DC3strong=1.

 

I used this input dataset HAVE and then PROC COMPARE on the "WANT" datasets to compare the suggested solutions:

data have;
do FM3strong = .,0,1;
do VALUE     = .,0,1;
do LABOR     = .,0,1;
do R1strong  = .,0,1;
do EP1strong = .,0,1;
do PL1strong = .,0,1;
do DC3strong = .,0,1;
do FM1strong = .,0,1;
do FM2strong = .,0,1;
do V3strong  = .,0,1;
do V4strong  = .,0,1;
do EP5med    = .,0,1;
do EP6med    = .,0,1;
do EP7med    = .,0,1;
do FM4med    = .,0,1;
do V8med     = .,0,1;
  output;
end; end; end; end; end; end; end; end; end; end; end; end; end; end; end; end;
run; /* 3**16=43046721 obs. */

 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1439 views
  • 0 likes
  • 4 in conversation