BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Wolverine
Pyrite | Level 9

The fact that I even have to post this tells u what kind of day I'm having 😞 If the HLT_ALZDEM variable doesn't have a valid value of 1 or 2 (yes/no), then the variables that are constructed based on HLT_ALZDEM should be given a value of missing.  However, they sometimes are given a value of 0 instead.

 

Below is a portion of the output file, sorted for rows where HLT_ALZDEM  is missing. Since missing is clearly not 1 or 2, all ADRD_group and non_ADRD_group values should also be missing. But they're not. Obs 9 and 10 are identical except for these 2 variables., when Obs 9 is missing and Obs 10 is 0 for both.

 

The only possible explanation I can think of is that there are non-printable characters for some values of HLT_ALZDEM. But even if there are, those non-printable characters are not 1 or 2. Frequencies for HLT_ALZDEM  only show values of 1, 2, and missing.

IF cohort_flag = 1 AND ADM_MA_FLAG_YR = 3
	THEN MA_group = 1; ELSE MA_group = 0;
IF cohort_flag = 0 THEN MA_group = .;
	
IF cohort_flag = 1 AND ADM_FFS_FLAG_YR = 3 
	THEN TM_group = 1; ELSE TM_group = 0;
IF cohort_flag = 0 THEN TM_group = .;
	
IF cohort_flag = 1 AND HLT_ALZDEM = 1 
	THEN ADRD_group = 1; ELSE ADRD_group = 0;
IF cohort_flag = 0 THEN ADRD_group = .;
IF HLT_ALZDEM ^in (1,2) THEN ADRD_group = .;
	
IF cohort_flag = 1 AND HLT_ALZDEM = 2 
	THEN non_ADRD_group = 1; ELSE non_ADRD_group = 0;
IF cohort_flag = 0 THEN non_ADRD_group = .;
IF HLT_ALZDEM ^in (1,2) THEN non_ADRD_group = .;
Obs HLT_ALZDEM cohort_flag MA_group TM_group ADRD_group non_ADRD_group

1

  0        
2   1 0 1 0 0
3   1 1 0    
4   0        
5   0        
6   1 0 1    
7   1 1 0    
8   1 1 0    
9   1 1 0    
10   1 1 0 0 0
11   1 1 0    
12   1 0 1    
13   1 0 1    
14   1 0 1    
15   1 0 1    
16   1 0 1 0 0
1 ACCEPTED SOLUTION

Accepted Solutions
Wolverine
Pyrite | Level 9

Embarrassing to say, but the solution was in the log the entire time. There was another set of IF-THEN statements for other variables in the same data step. I forgot to add "then" to one of those statements. That caused an error, so the old version of the file (which was based on syntax that did have a logic error) was not overwritten. I fixed the logic error in the syntax, but I kept looking at frequencies based on the old version of the file. I assumed there must still be some logic error, so I kept going down that path instead of checking the basics (like looking for errors in the log!). Psychologists call that "function fixedness".  Friday was not a good day.

View solution in original post

5 REPLIES 5
Patrick
Opal | Level 21

Using the exact data and code you shared I'm getting a different result. Can you please verify and share sample data that matches your code.

You haven't shared SAS log. Are there any notes in it about character to numeric conversion?

 

SAS numerical variables can't store non-printable characters. They could store special missings but none of your tests checks for missings where this could hit you.

And just as a thought: IF there is a custom format applied to your variables that prints certain values as missing then you could be misled. To ensure this is not the case add the following line to your data step.

format HLT_ALZDEM cohort_flag best32.;

 

Here your exact data and code to demonstrate that the outcome is different than what you describe.

data have;
  infile datalines dsd dlm=',' truncover;
  input Obs HLT_ALZDEM cohort_flag src_MA_group src_TM_group src_ADRD_group src_non_ADRD_group;
datalines;
1,,0,,,,
2,,1,0,1,0,0
3,,1,1,0,,
4,,0,,,,
5,,0,,,,
6,,1,0,1,,
7,,1,1,0,,
8,,1,1,0,,
9,,1,1,0,,
10,,1,1,0,0,0
11,,1,1,0,,
12,,1,0,1,,
13,,1,0,1,,
14,,1,0,1,,
15,,1,0,1,,
16,,1,0,1,0,0
;
run;

data want;
  length  Obs HLT_ALZDEM cohort_flag
          src_MA_group MA_group src_TM_group TM_group 
          src_ADRD_group ADRD_group src_non_ADRD_group non_ADRD_group
          8;
  set have;
  IF cohort_flag = 1 AND ADM_MA_FLAG_YR = 3
  	THEN MA_group = 1; ELSE MA_group = 0;
  IF cohort_flag = 0 THEN MA_group = .;
  	
  IF cohort_flag = 1 AND ADM_FFS_FLAG_YR = 3 
  	THEN TM_group = 1; ELSE TM_group = 0;
  IF cohort_flag = 0 THEN TM_group = .;
  	
  IF cohort_flag = 1 AND HLT_ALZDEM = 1 
  	THEN ADRD_group = 1; ELSE ADRD_group = 0;
  IF cohort_flag = 0 THEN ADRD_group = .;
  IF HLT_ALZDEM ^in (1,2) THEN ADRD_group = .;
  	
  IF cohort_flag = 1 AND HLT_ALZDEM = 2 
  	THEN non_ADRD_group = 1; ELSE non_ADRD_group = 0;
  IF cohort_flag = 0 THEN non_ADRD_group = .;
  IF HLT_ALZDEM ^in (1,2) THEN non_ADRD_group = .;
run;

proc print data=want;
run;

Patrick_0-1714786188632.png

 

And below not changing your logic but ordering the conditions a bit differently so they become easier to read and maintain:

data want;
  length  Obs HLT_ALZDEM cohort_flag
          src_MA_group MA_group src_TM_group TM_group 
          src_ADRD_group ADRD_group src_non_ADRD_group non_ADRD_group
          8;
  set have;

  IF cohort_flag = 0 THEN MA_group = .;
  else IF cohort_flag = 1 AND ADM_MA_FLAG_YR = 3 THEN MA_group = 1; 
  ELSE MA_group = 0;
  	
  IF cohort_flag = 0 THEN TM_group = .;
  else IF cohort_flag = 1 AND ADM_FFS_FLAG_YR = 3 THEN TM_group = 1; 
  ELSE TM_group = 0;
  	
  IF HLT_ALZDEM ^in (1,2) THEN ADRD_group = .;
  else IF cohort_flag = 0 THEN ADRD_group = .;
  else IF cohort_flag = 1 AND HLT_ALZDEM = 1 THEN ADRD_group = 1; 
  ELSE ADRD_group = 0;
  	
  IF HLT_ALZDEM ^in (1,2) THEN non_ADRD_group = .;
  else IF cohort_flag = 0 THEN non_ADRD_group = .;
  else IF cohort_flag = 1 AND HLT_ALZDEM = 2 THEN non_ADRD_group = 1; 
  ELSE non_ADRD_group = 0;
run;

 

Wolverine
Pyrite | Level 9

Embarrassing to say, but the solution was in the log the entire time. There was another set of IF-THEN statements for other variables in the same data step. I forgot to add "then" to one of those statements. That caused an error, so the old version of the file (which was based on syntax that did have a logic error) was not overwritten. I fixed the logic error in the syntax, but I kept looking at frequencies based on the old version of the file. I assumed there must still be some logic error, so I kept going down that path instead of checking the basics (like looking for errors in the log!). Psychologists call that "function fixedness".  Friday was not a good day.

mkeintz
PROC Star

@Wolverine 

 

Don't be shy.  Mark your explanation as the solution, so we all know the problem has been solved.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Astounding
PROC Star

The usual place to begin would be to examine the data.  In particular take a look at COHORT_FLAG and see if it contains unusual values.  They don't have to be unprintable characters.  They could be values of 2 or 3 instead of what you expect.  So just run a PROC FREQ to be sure.  

 

Consider how you set up your logic:

 

IF cohort_flag = 1 AND HLT_ALZDEM = 1 
	THEN ADRD_group = 1; ELSE ADRD_group = 0;
IF cohort_flag = 0 THEN ADRD_group = .;
IF HLT_ALZDEM ^in (1,2) THEN ADRD_group = .;

This logic could easily give you a value of 0 for ADRD_group when cohort_flag is 3.  Remember, the ELSE statement applies to either of these conditions:

  • COHORT_FLAG is not 1
  • HLT_ALZDEM is not 1

That could be a logic problem resolved by checking the data.  In fact, a general approach to solving this would be to take some observations that you believe are coming out incorrectly, and examine the incoming data values for that observation.

FreelanceReinh
Jade | Level 19

What I like to do in such cases is to create a list or cross-tabulation of all possible combinations, where the "possibilities" only go as far as the IF/ELSE conditions distinguish them. According to your code, the four new variables are derived independently of each other, each depending on just two existing variables, so cross-tabulations are feasible.

 

Cross-tabulations for MA_group (analogous for TM_group) and ADRD_group (similar for non_ADRD_group):

MA_group    | ADM_MA_FLAG_YR
cohort_flag |   3   |  ne 3
------------+-------+--------
         0  |   .   |    .
------------+-------+--------
         1  |   1   |    0
------------+-------+--------
 ^in (0,1)  |   0   |    0
-----------------------------
ADRD_group  |        HLT_ALZDEM
cohort_flag |   1   |   2   | ^in (1,2)
------------+-------+-------+-----------
         0  |   .   |   .   |     .
------------+-------+-------+-----------
         1  |   1   |   0   |     .
------------+-------+-------+-----------
 ^in (0,1)  |   0   |   0   |     .
----------------------------------------

 

I derived these cross-tabulations from your code. You would better use the specifications as a basis for them and then start writing the code according to the cross-tabulations.

 

If the above cross-tabulations and those for TM_group (with ADM_FFS_FLAG_YR replacing ADM_MA_FLAG_YR) and non_ADRD_group (with "1 | 0" switched to "0 | 1") are in accordance with the specifications, the code could be simplified to:

data want(drop=_c);
set have;
if cohort_flag ne 0 then do;
  _c=(cohort_flag=1);
  MA_group=(_c & ADM_MA_FLAG_YR =3);
  TM_group=(_c & ADM_FFS_FLAG_YR=3);
  if HLT_ALZDEM in (1,2) then do;
        ADRD_group=(_c & HLT_ALZDEM=1);
    non_ADRD_group=(_c & HLT_ALZDEM=2);
  end;
end;
run;

Important: This assumes that the new variables are not contained in dataset HAVE. Otherwise, they must be dropped from HAVE with a DROP= dataset option (or initialized to missing after the SET statement). Also, all variables involved are assumed to be numeric.

 

The derived variables are either set to 1 (for TRUE) or 0 (for FALSE) resulting from a Boolean expression such as _c & HLT_ALZDEM=1 (i.e., cohort_flag=1 & HLT_ALZDEM=1or they are left missing if an IF condition prevents the execution of the assignment statement.

 

If you are in doubt about the content of a numeric variable, use the HEX16. format.

 

Example log:

378   data _null_;
379   have=9.9/3.3;
380   want=3;
381   put (have want)(=hex16./);
382   run;

have=4008000000000001
want=4008000000000000

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 860 views
  • 6 likes
  • 5 in conversation