anissak1
Obsidian | Level 7

Hello - I have written the code below for a dataset.  I'm creating new variables based on existing variables.

When I run this code, there is one subgroup for whom the walkcensordate variable (a new variable) does not get created properly.  Rather, it is assigned another variable value - censor date - which is not in the code below. 

When I run the code on the entire dataset, it works perfectly fine except for in the problematic subgroup.  And if I isolate a single patient from this subgroup, the code also runs fine.

Any ideas on how to troubleshoot?  The subgroups to which I refer were created at the same time - not from a merge. And the problematic variable did not exist previously. Is there another way I could approach writing my code?

 

I feel there is an error on the back end somehow or maybe that I am just going mad. 🙂

 

Thank you!

Anissa

 


data check3;
set check2;
if t25confirmedprogress=1 then walkcensor=1;
else walkcensor=0;
if walkcensor=1 then walkcensordate=first_day;
if term=0 then walkcensordate=termdate;
if walkcensordate='.' then walkcensordate=edsslof;
run;

Accepted Solution
ballardw
Super User

Is your Walkcensordate character or numeric?

This code seldom makes sense:

if walkcensordate='.' then walkcensordate=edsslof;

If the variable is character, then a missing value is not '.'; it is a blank.

If the variable is numeric you likely get a "conversion to numeric" note in the log and may not be comparing the desired value.

 

If you want to check for missing values then you may want to use the function MISSING, which will work for both numeric and character values.

 

if missing(walkcensordate)  then walkcensordate=edsslof;
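As a small sketch of that point (the data set name, variable names and values here are made up for illustration), MISSING returns 1 for a numeric missing value and for a blank character value alike:

```sas
/* Hypothetical demo data: one numeric and one character variable */
data miss_demo;
   infile datalines dsd truncover;
   input numvar charvar :$8.;
   num_is_missing  = missing(numvar);   /* 1 when numvar is missing (.) */
   char_is_missing = missing(charvar);  /* 1 when charvar is blank */
   datalines;
10,abc
.,
5,xyz
;
run;

proc print data=miss_demo;
run;
```

In the second row both flags should come out as 1, with no quoted '.' needed in either test.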

Other than that, you should provide the values of all the variables you are showing in the code, in the form of a data step, that are not getting assigned correctly for whatever subgroup.

 

It is not impossible that a logic problem when you create check2 added the variable/value you don't want.

 

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the <> icon or attached as text to show exactly what you have and that we can test code against.


anissak1
Obsidian | Level 7

Thank you for the note! I'll double check on character format and the other steps you noted in your message.  The original dataset came to me from R in csv format and I definitely had some character/numeric formatting issues earlier on (that I thought I had resolved).  Should get to this tonight or tomorrow morning.  My brain is fried from a few hours of troubleshooting code in general and I'm currently worried that I'll introduce more errors. 🙂  My fingers are crossed!...

Tom
Super User

Does the original R dataframe have missing values? R has a nasty habit of writing the character string NA into CSV files for missing values. You should probably pre-process the CSV files to remove those NA strings so they don't confuse you about the variable types.
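If editing the CSV is not convenient, a different approach is to translate the NA strings while reading. The following is only a sketch under assumptions: the informat name nafix, the data set name and the inline rows are all hypothetical, and a real file would be read with INFILE pointing at the CSV rather than DATALINES.

```sas
/* Hypothetical informat: read the string NA as numeric missing */
proc format;
   invalue nafix 'NA'  = .           /* R's NA becomes a SAS missing value */
                 other = [best32.];  /* anything else is read as a number  */
run;

/* Made-up rows standing in for the CSV exported from R */
data csv_demo;
   infile datalines dsd truncover;
   input id score :nafix.;
   datalines;
1,3.5
2,NA
3,7
;
run;

proc contents data=csv_demo;
run;
```

Read this way, score stays numeric instead of becoming a character variable because of the NA strings.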

anissak1
Obsidian | Level 7
Yes! That was an original problem I thought I had fixed. But now I see it did not pull through. May be the result of working on too many datasets simultaneously. 😞 I will recheck and then respond to this thread to hopefully close it out.
Thank you!
ballardw
Super User

Quick check on your SAS data set:

 

Proc Contents data=check2;

run;

to tell you the variable types and such.

 

Proc freq data=check2;

 tables t25confirmedprogress;

run;

 

with a tables statement will tell you values of the variables of interest and how many missing.

 

I'm not sure if you realize that your ELSE will execute when t25confirmedprogress is missing. Since that is the first comparison, that might be part of your issue.
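A minimal sketch of that behavior, using made-up values and a hypothetical second flag (walkcensor_strict) that keeps missing inputs missing instead of turning them into 0:

```sas
/* Hypothetical demo: three rows with values 1, 0 and missing */
data else_demo;
   input t25confirmedprogress;

   /* Original pattern: the ELSE also fires when the value is missing */
   if t25confirmedprogress=1 then walkcensor=1;
   else walkcensor=0;

   /* Alternative: leave the flag missing when the input is missing */
   if missing(t25confirmedprogress) then walkcensor_strict=.;
   else walkcensor_strict=(t25confirmedprogress=1);

   datalines;
1
0
.
;
run;

proc print data=else_demo;
run;
```

For the third row walkcensor is 0 while walkcensor_strict stays missing. Which of the two is wanted depends on how missing progression status should be treated in the analysis.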


Best practice is to provide an example of data we can use in the form of a data step. It should include some cases that have the "problem" and some that don't.
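Such an example might look like the sketch below. The values are invented for illustration and are not the poster's data; the variable names come from the code in the question, and id is a hypothetical identifier. Row 4 meets both t25confirmedprogress=1 and term=0, which is the kind of case worth including because more than one IF statement applies to it.

```sas
/* Made-up rows for illustration only */
data check2_demo;
   infile datalines dsd truncover;
   input id t25confirmedprogress term
         first_day :date9. termdate :date9. edsslof :date9.;
   format first_day termdate edsslof date9.;
   datalines;
1,1,1,15JAN2015,,01JUN2016
2,0,0,,20MAR2015,01JUN2016
3,0,1,,,01JUN2016
4,1,0,10FEB2015,30APR2015,01JUN2016
;
run;
```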

 

Basic rule: quotes only go with character variable values. If you look at the data (print, view table or similar) and see a dot, that is very likely the missing value indicator for a numeric variable, which is what you get when a value like NA in a CSV file is read into a numeric variable.

anissak1
Obsidian | Level 7

I'm not sure what to mark as the correct solution as I was unable to "fix" the code.  To resolve it, I used the original dataset that didn't pull through the one line of code correctly to the subpopulation and wrote a new line of code on a new dataset (below).  This brute force approach did pull through all observations, including the subgroup that didn't "take" before.  I confirmed that all variables were numerical.  And so I am still scratching my head why a certain set of observations would not take the "first_day" value.  But moving on...:)  Thanks to all for your guidance.  Super helpful, as always.


data kaplan8;
set kaplan7;
if t25confirmedprogress=1 then walkcensor=1;
else walkcensor=0;
if walkcensor=1 then walkcensordate=first_day;
if term=0 then walkcensordate=termdate;
if missing (walkcensordate) then walkcensordate=edsslof;
run;

 

data kaplan100;
set kaplan8;
if t25confirmedprogress=1 then walkcensordate=first_day;
run;

ballardw
Super User

You might test instead of

if t25confirmedprogress=1 then walkcensor=1;
else walkcensor=0;
if walkcensor=1 then walkcensordate=first_day;

try

if t25confirmedprogress=1 then do;
   walkcensor=1;
   walkcensordate=first_day;
end;  
else walkcensor=0;

Tom
Super User

Since you have multiple independent statements that set the same variable, WALKCENSORDATE, the order in which you run them makes a difference.  If two conditions are both true, then the value assigned by the last one "wins".  Adding a little spacing as below makes it easier to see which blocks of statements are independent.

 

If you want to skip creating the extra data step then just add the extra IF to the first data step.

data kaplan100;
set kaplan7;

if t25confirmedprogress=1 then walkcensor=1;
else walkcensor=0;

if walkcensor=1 then walkcensordate=first_day;

if term=0 then walkcensordate=termdate;

if missing (walkcensordate) then walkcensordate=edsslof;

if t25confirmedprogress=1 then walkcensordate=first_day;
run;

If you want to combine those last 4 into one IF/THEN/ELSE IF... sequence then make sure to reverse the order.

data kaplan100;
set kaplan7;

if t25confirmedprogress=1 then walkcensor=1;
else walkcensor=0;

if t25confirmedprogress=1 then walkcensordate=first_day;
else if missing (walkcensordate) then walkcensordate=edsslof;
else if term=0 then walkcensordate=termdate;
else if walkcensor=1 then walkcensordate=first_day;

run;
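
To see the "last one wins" rule on a few rows, here is a small sketch with made-up data (the data set names, the id variable, seq_date and chain_date are hypothetical). It assumes walkcensordate does not already exist in the input data set. Under that assumption a new variable is missing at the top of each iteration, so this sketch folds the missing check into the final ELSE instead of testing MISSING ahead of the term=0 branch.

```sas
/* Made-up input rows; no date variable for censoring exists yet */
data order_demo_in;
   infile datalines dsd truncover;
   input id t25confirmedprogress term
         first_day :date9. termdate :date9. edsslof :date9.;
   format first_day termdate edsslof date9.;
   datalines;
1,1,0,10FEB2015,30APR2015,01JUN2016
2,0,0,,30APR2015,01JUN2016
3,0,1,,,01JUN2016
;
run;

data order_demo;
   set order_demo_in;
   walkcensor = (t25confirmedprogress=1);   /* 1 if equal to 1, otherwise 0 */

   /* Independent IFs: the last true condition wins */
   if walkcensor=1 then seq_date=first_day;
   if term=0 then seq_date=termdate;        /* overwrites first_day for id 1 */
   if missing(seq_date) then seq_date=edsslof;

   /* One chain: the first true condition wins, so highest priority goes first */
   if t25confirmedprogress=1 then chain_date=first_day;
   else if term=0 and not missing(termdate) then chain_date=termdate;
   else chain_date=edsslof;

   format seq_date chain_date date9.;
run;

proc print data=order_demo;
run;
```

For id 1 the independent IFs end up with termdate, while the chain keeps first_day; the other two rows agree under both approaches.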
anissak1
Obsidian | Level 7

BOTH of these solutions worked.  I wish I could mark them as solutions!!!

I didn't "see" the flaws in the logic of my original code.  So helpful to see these pointed out - and in two different ways.  I'm still not sure why exactly my original code would work for 2100 observations but not 300 of them.  But perhaps I should leave that for a different day.  MUCH appreciation Tom!
