Hello - I have written the code below for a dataset. I'm creating new variables based on existing variables.
When I run this code, there is one subgroup for whom the walkcensordate variable (a new variable) does not get created properly. Rather, it is assigned another variable value - censor date - which is not in the code below.
When I run the code on the entire dataset, it works perfectly fine except in the problematic subgroup. And if I isolate a single patient from this subgroup, the code also runs fine.
Any ideas on how to troubleshoot? The subgroups to which I refer were created at the same time - not from a merge. And the problematic variable did not exist previously. Is there another way I could approach writing my code?
I feel there is an error on the back end somehow or maybe that I am just going mad. 🙂
Thank you!
Anissa
data check3;
set check2;
if t25confirmedprogress=1 then walkcensor=1;
else walkcensor=0;
if walkcensor=1 then walkcensordate=first_day;
if term=0 then walkcensordate=termdate;
if walkcensordate='.' then walkcensordate=edsslof;
run;
Is your Walkcensordate character or numeric?
This code seldom makes sense:
if walkcensordate='.' then walkcensordate=edsslof;
If the variable is character, then a missing value is not '.'; it is a blank.
If the variable is numeric you likely get a "conversion to numeric" note in the log and may not be comparing the desired value.
If you want to check for missing values then you may want to use the function MISSING, which will work for both numeric and character values.
if missing(walkcensordate) then walkcensordate=edsslof;
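A minimal sketch of that behaviour, using a made-up dataset (the names demo, visitdate and code are hypothetical, not from the original data). MISSING returns 1 for a missing numeric value and for a blank character value, so the same test works for either type.

```sas
/* Hypothetical toy data: one numeric date and one character variable */
data demo;
  input id visitdate :date9. code $;
  format visitdate date9.;
  datalines;
1 01JAN2020 A
2 . B
3 15MAR2021 .
;
run;

data demo_flags;
  set demo;
  date_is_missing = missing(visitdate); /* 1 for a numeric missing value  */
  code_is_missing = missing(code);      /* 1 for a blank character value  */
run;

proc print data=demo_flags;
run;
```

With list input, a lone period in a character field is read as a blank, which is why the same check covers both columns here.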
Other than that, you should provide example values, in the form of a data step, for all the variables shown in the code, including rows from the subgroup where the assignment goes wrong.
It is possible that a logic problem in the step that creates check2 added the variable or value you don't want.
Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the <> icon or attached as text to show exactly what you have and that we can test code against.
Thank you for the note! I'll double check the character format and the other steps you noted in your message. The original dataset came to me from R in csv format and I definitely had some character/numeric formatting issues earlier on (that I thought I had resolved). Should get to this tonight or tomorrow morning. My brain is fried from a few hours of troubleshooting code in general and I'm currently worried that I'll introduce more errors. 🙂 My fingers are crossed!...
Does the original R data frame have missing values? By default R writes missing values to CSV files as the character string NA, which can make SAS read a numeric column as character. You should probably pre-process the CSV files to remove those NA strings so they don't confuse you about the variable types.
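One way to do that clean-up on the SAS side, sketched under the assumption that a date column arrived as text in yyyy-mm-dd form with NA for missing (the names raw and first_day_c are hypothetical):

```sas
/* Hypothetical stand-in for an imported CSV column that came in as character */
data raw;
  input id first_day_c :$10.;
  datalines;
1 2020-01-15
2 NA
3 2021-03-02
;
run;

data fixed;
  set raw;
  /* ?? suppresses the invalid-data messages, so NA becomes a numeric missing */
  first_day = input(first_day_c, ?? yymmdd10.);
  format first_day date9.;
  drop first_day_c;
run;
```

The informat would need to match however the dates were actually written to the CSV file.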
Quick check on your SAS data set:
Proc Contents data=check2;
run;
will tell you the variable types and such.
Proc freq data=check2;
tables t25confirmedprogress;
run;
with a TABLES statement will tell you the values of the variables of interest and how many are missing.
I'm not sure if you realize that your ELSE will also execute when t25confirmedprogress is missing. Since that is the first comparison, it might be part of your issue.
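A sketch of one way to keep missing input from being coded as 0, assuming 0 is the only other valid value of t25confirmedprogress (an assumption, since the actual coding is not shown):

```sas
data check3;
  set check2;
  if t25confirmedprogress=1 then walkcensor=1;
  else if t25confirmedprogress=0 then walkcensor=0;
  else walkcensor=.; /* missing or unexpected input stays missing */
run;
```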
Best practice is to provide an example of data we can use in the form of a data step. It should include some cases that have the "problem" and some that don't.
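Such an example might look like the sketch below. All values are invented for illustration and the dataset name have is hypothetical; only the variable names come from the code in the question.

```sas
/* Invented example rows; dates are read with the DATE9. informat */
data have;
  input id t25confirmedprogress term first_day :date9. termdate :date9. edsslof :date9.;
  format first_day termdate edsslof date9.;
  datalines;
1 1 1 10JAN2020 . 01JUN2021
2 0 0 . 15MAR2020 01JUN2021
3 1 0 05FEB2020 20APR2020 01JUN2021
4 0 1 . . 01JUN2021
;
run;
```

Row 3 has both t25confirmedprogress=1 and term=0, so it is a useful case for checking which assignment ends up in walkcensordate.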
Basic rule: quotes only go with character variable values. If you look at the data (print, view table or similar) and see a dot, that is very likely the missing value indicator for a numeric variable, which could come from a value like NA in a CSV file being read into a numeric variable.
I'm not sure what to mark as the correct solution as I was unable to "fix" the code. To resolve it, I used the original dataset that didn't pull through the one line of code correctly to the subpopulation and wrote a new line of code on a new dataset (below). This brute force approach did pull through all observations, including the subgroup that didn't "take" before. I confirmed that all variables were numerical. And so I am still scratching my head why a certain set of observations would not take the "first_day" value. But moving on...:) Thanks to all for your guidance. Super helpful, as always.
data kaplan8;
set kaplan7;
if t25confirmedprogress=1 then walkcensor=1;
else walkcensor=0;
if walkcensor=1 then walkcensordate=first_day;
if term=0 then walkcensordate=termdate;
if missing (walkcensordate) then walkcensordate=edsslof;
run;
data kaplan100;
set kaplan8;
if t25confirmedprogress=1 then walkcensordate=first_day;
run;
Instead of
if t25confirmedprogress=1 then walkcensor=1;
else walkcensor=0;
if walkcensor=1 then walkcensordate=first_day;
you might try
if t25confirmedprogress=1 then do;
walkcensor=1;
walkcensordate=first_day;
end;
else walkcensor=0;
Since you have multiple independent statements that set the same variable, WALKCENSORDATE, the order in which you run them makes a difference. If two conditions are both true, then the value assigned by the last one "wins". Adding a little spacing as below makes it easier to see which blocks of statements are independent.
If you want to skip creating the extra data step then just add the extra IF to the first data step.
data kaplan100;
set kaplan7;
if t25confirmedprogress=1 then walkcensor=1;
else walkcensor=0;
if walkcensor=1 then walkcensordate=first_day;
if term=0 then walkcensordate=termdate;
if missing (walkcensordate) then walkcensordate=edsslof;
if t25confirmedprogress=1 then walkcensordate=first_day;
run;
If you want to combine those last 4 into one IF/THEN/ELSE IF... sequence, then make sure to reverse the order, because in a chain the first true condition wins rather than the last.
data kaplan100;
set kaplan7;
if t25confirmedprogress=1 then walkcensor=1;
else walkcensor=0;
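if t25confirmedprogress=1 then walkcensordate=first_day;
/* The MISSING fallback goes last: walkcensordate is new, so tested earlier */
/* it would be true for every remaining row and TERM=0 would never be reached. */
else if term=0 and not missing (termdate) then walkcensordate=termdate;
else if missing (walkcensordate) then walkcensordate=edsslof;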
run;
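One way to check that a rewrite gives the same result as the sequential version is to run both on a few test rows and compare the outputs with PROC COMPARE. A minimal sketch, with invented numeric values standing in for dates and hypothetical dataset names (toy, seq, chain); the chain shown is one possible ordering, with the MISSING fallback placed last:

```sas
/* Invented test rows; only the variable names come from the thread */
data toy;
  input id t25confirmedprogress term first_day termdate edsslof;
  datalines;
1 1 1 100 . 300
2 0 0 . 200 300
3 0 1 . . 300
4 1 0 100 200 300
5 . 1 . . 300
;
run;

/* Version A: independent IF statements, the last true assignment wins */
data seq;
  set toy;
  if term=0 then walkcensordate=termdate;
  if missing(walkcensordate) then walkcensordate=edsslof;
  if t25confirmedprogress=1 then walkcensordate=first_day;
run;

/* Version B: one IF/ELSE IF chain, the first true condition wins */
data chain;
  set toy;
  if t25confirmedprogress=1 then walkcensordate=first_day;
  else if term=0 and not missing(termdate) then walkcensordate=termdate;
  else walkcensordate=edsslof;
run;

/* Report any rows where the two versions disagree */
proc compare base=seq compare=chain;
  id id;
  var walkcensordate;
run;
```

Including a row where two conditions are true at once (row 4 here) and a row with a missing progress flag (row 5) exercises the cases where ordering matters.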
BOTH of these solutions worked. I wish I could mark them as solutions!!!
I didn't "see" the flaws in the logic of my original code. So helpful to see these pointed out - and in two different ways. I'm still not sure why exactly my original code would work for 2100 observations but not 300 of them. But perhaps I should leave that for a different day. MUCH appreciation Tom!
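One possible explanation, offered as a guess rather than a diagnosis: if the roughly 300 observations have both t25confirmedprogress=1 and term=0, the later term=0 statement in the original code would overwrite first_day with termdate. A cross-tabulation along these lines would show how many rows fall into that combination:

```sas
/* LIST prints one line per combination; MISSING counts missing values as a level */
proc freq data=kaplan7;
  tables t25confirmedprogress*term / missing list;
run;
```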