I want to categorize data into two ranges, but either every value comes out missing or the missing data is categorized incorrectly. This is my code:
data new;
set old;
if WHt='.' then Obese1=.;
else if Wht< 30 then Obese1="N";
else Obese1="Y";
run;
This produces a warning over and over again, and every row in the data has '.' under Obese1.
When I rearrange the code:
if Wht < 30 then Obese1="N"; else if Wht='.' then Obese1=.;
else Obese1="Y";
there are no warnings or errors, BUT the missing "Wht" data are categorized as "N" in the Obese1 column.
What is happening here?
@Cooksam13 wrote:
I am wanting to categorize data within 2 ranges, what happens is they all result in missing or the missing data is wrongfully categorized. this is my code:
data new;
set old;
if WHt='.' then Obese1=.;
else if Wht< 30 then Obese1="N";
else Obese1="Y";
run;
results in this warning over and over again and every row in the data under Obese1 resulting with '.'
NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
      103:17 103:34 107:9 108:34 109:18 118:4
NOTE: Variable un is uninitialized.
NOTE: Invalid numeric data, 'N' , at line 108 column 34.
race_eth=Hispanic (any race) ID=2012000176 MotherHeight=05:07 PriorWeight=170 PREV_ALIVE=2 RFDiabGest= EstGest=39 EstGestOb= EstGestClin=True Plurality=1 MatEthnicity=210 MatRace=01 M_AGE=31 GestDiab=N prev_alive2=2 Ht_Inches=67 PRiorweight2=170 BMI=26.698596569 ObesePrior=. un=. Preterm=N _ERROR_=1 _N_=1
NOTE: Invalid numeric data, 'Y' , at line 109 column 18.
race_eth=Hispanic (any race) ID=2012000324 MotherHeight=05:02 PriorWeight=220 PREV_ALIVE=1 RFDiabGest=True EstGest=38 EstGestOb=True EstGestClin= Plurality=1 MatEthnicity=280 MatRace=01 M_AGE=32 GestDiab=Y prev_alive2=1 Ht_Inches=62 PRiorweight2=220 BMI=40.348595213 ObesePrior=. un=. Preterm=N _ERROR_=1 _N_=2
When I rearrange the code:
if Wht < 30 then Obese1="N"; else if Wht='.' then Obese1=.;
else Obese1="Y";
there are no warnings or errors, BUT the missing "Wht" data are categorized as "N" in the Obese1 column.
What is happening here?
Copy the entire log, including the data step code, when you have questions about anything in the log.
In this case, WHT does not exist in your data. I know that because this message:
NOTE: Invalid numeric data, 'N' , at line 108 column 34. race_eth=Hispanic (any race) ID=2012000176 MotherHeight=05:07 PriorWeight=170 PREV_ALIVE=2 RFDiabGest= EstGest=39 EstGestOb= EstGestClin=True Plurality=1 MatEthnicity=210 MatRace=01 M_AGE=31 GestDiab=N prev_alive2=2 Ht_Inches=67 PRiorweight2=170 BMI=26.698596569 ObesePrior=. un=. Preterm=N _ERROR_=1 _N_=1
includes the value of every variable in the data step at the time the error occurs, and WHT is not among them. You also have no variable named Obese1.
Your code does not show a variable UN being used, but the cause of that note would be listing a variable in a statement without ever assigning a value to it.
Do not test a numeric variable for missing with '.', a quoted period. That is a character value. Do not test character variables for missing with '.' either, because a period is an actual value. It is better to use the MISSING function, since it works with both types of variables:
if missing(somevariablename) then do <whatever>;
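As a small illustration of the point above, the following sketch calls MISSING on one numeric and one character variable. The variable names num_var, char_var, and dot are made up for the example and do not come from the poster's data.

```sas
data _null_;
   num_var  = .;     /* numeric missing value */
   char_var = ' ';   /* character missing value is a blank */
   dot      = '.';   /* a quoted period is an ordinary character value */

   if missing(num_var)  then put 'num_var is missing';
   if missing(char_var) then put 'char_var is missing';
   if not missing(dot)  then put 'dot is NOT missing';
run;
```

All three PUT statements should write to the log, which shows why a quoted period is not a reliable test for missing in either type.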
Numeric missing values compare as less than any number. So if Var has a missing value, Var < 30 is true. That is why the rearranged code categorizes missing values as "N".
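A quick way to see this for yourself is the sketch below. The variable x is a made-up name, and the step reads no data, so it can be run on its own.

```sas
data _null_;
   x = .;   /* numeric missing */

   /* A missing value compares as lower than any number, so this is true */
   if x < 30 then put 'x < 30 is TRUE even though x is missing';

   /* Testing for missing first keeps missing values out of the "< 30" group */
   if missing(x) then put 'x is missing';
   else if x < 30 then put 'x is a real value below 30';
run;
```

The order of the tests matters: once the missing case is handled first, the later comparisons only ever see real numbers.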
You cannot assign character values, i.e. 'Y' or 'N', to numeric variables. You can check the type and other characteristics of your variables by running:
Proc contents data=<yourdatasetname>; run;
If this shows the variable as numeric but you see Y and N, then a format has been assigned, and it will show up in the Proc Contents output. You will need to assign the numeric value associated with the format. At a guess, 1 = Y and 0 = N, but other codes may be used.
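To illustrate how a numeric variable can display as Y or N, here is a sketch using a hypothetical format named YN and a made-up dataset DEMO. The 1/0 coding is only the guess mentioned above, not something known about the poster's data.

```sas
proc format;
   value yn 1 = 'Y'    /* hypothetical coding: 1 displays as Y */
            0 = 'N';   /* and 0 displays as N */
run;

data demo;
   flag = 1;           /* stored as the number 1 */
   format flag yn.;    /* displayed through the YN format */
run;

proc print data=demo; run;      /* flag prints as Y */
proc contents data=demo; run;   /* flag is listed as Num with format YN. */
```

In a case like this the assignment would have to be flag = 1 or flag = 0, never flag = 'Y'.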
Sorry for the confusion. I changed "BMI" to Wht and "ObesePrior" to Obese1 in the question to appease my professor by not including the exact code.
I tried your suggestion:
if missing(BMI) then ObesePrior=.;
else if BMI < 30 then ObesePrior="N";
else ObesePrior= 'Y';
and I got the same error.
Your first statement starts you off on the wrong path. Using a . as the value is the correct way to refer to a missing value for a numeric variable. However, since you want ObesePrior to be a character value taking on values like "Y" and "N", change the code to make it a character variable:
if missing(BMI) then ObesePrior=" ";
else if BMI < 30 then ObesePrior="N";
else ObesePrior= 'Y';
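Put together as a complete step, the suggestion might look like the sketch below. The small OLD dataset is invented here so the example runs on its own. The LENGTH statement is an addition that declares ObesePrior as character up front, and it assumes ObesePrior does not already exist as a numeric variable in the input. If it does, something like set old(drop=ObesePrior); would be needed first.

```sas
/* Made-up input data for illustration only */
data old;
   input ID BMI;
   datalines;
1 26.7
2 40.3
3 .
;
run;

data new;
   set old;
   length ObesePrior $ 1;   /* declare as character before any assignment */
   if missing(BMI) then ObesePrior = ' ';   /* blank is the character missing value */
   else if BMI < 30 then ObesePrior = 'N';
   else ObesePrior = 'Y';
run;

proc print data=new; run;
```

With this input, ID 1 should get N, ID 2 should get Y, and ID 3 should be left blank.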
Hello all,
I am new to the SAS community. I am preparing for the SAS Base certification and I came across this question (I am expected to fix the errors in this code and run the program).
I have used the example someone posted in this community, yet I am still getting errors in my code.
This is the code:
data work.lowchol work.highchol;
set sashelp.heart;
if cholesterol lt 200 output work.lowchol;
if cholesterol ge 200 output work.highchol;
if cholesterol is missing output work.misschol;
run;
Probably best to start a new thread. Also, when you get errors in the log, show us the ENTIRE log (not just the error messages).
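In the meantime, here is one possible corrected version as a rough sketch. It assumes the intent is three output datasets (low, high, and missing cholesterol), which the question does not state explicitly. The changes are: work.misschol is added to the DATA statement, each IF gets its THEN, and the MISSING function replaces "is missing". The missing test comes first because a missing value would otherwise satisfy cholesterol lt 200.

```sas
data work.lowchol work.highchol work.misschol;
   set sashelp.heart;
   if missing(cholesterol) then output work.misschol;    /* handle missing first */
   else if cholesterol lt 200 then output work.lowchol;
   else output work.highchol;                            /* cholesterol ge 200 */
run;
```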