I can't seem to figure out the problem w/ the syntax below. I get missing values for all variables included in the IF-THEN-DO. I also get missing values for variables AFTER the END statement, including bline_surv_fcau, which is simply an assigned value (i.e., bline_surv_fcauc = 0.9665;)
DATA work.test;
SET nhanes.data_2009_2018_go (OBS=5);
where age_yrs ne .;
where same and race_all in (3,4);
where same and gender=2;
where same and htn_meds=1;
where same and smk_status=1;
where same and ever_told_dm2=1;
putlog "NOTE: START DATA STEP ITERATION";
putlog
race_all=
gender=
htn_meds=
smk_status=
ever_told_dm2=
age_yrs=
cauc_f_age=
cauc_f_age_sq=
cauc_f_tchol=
cauc_f_age_x_tchol=
cauc_f_hdl=
cauc_f_sbp_tx=
cauc_f_smk=
cauc_f_age_x_smk=
cauc_f_ever_told_dm2=
ind_sum_cauc_f=
mean_cauc_f=
bline_surv_fcauc=
ascvd_score_cauc_f=
;
/***********************************************************************************************************/
/* Female, Caucasian */
/***********************************************************************************************************/
/* 1. Female, Caucasian, Treated, Smoker, Diabetic */
/***************************************************/
if
race_all = 3 and
gender = 2 and
htn_meds = 1 and
smk_status = 1 and
ever_told_dm2 = 1
then do;
cauc_f_age = log(age_yrs)*(-29.799);
cauc_f_age_sq = log(age_yrs)**2;
cauc_f_tchol = log(tchol_mg)*13.540;
/* Interaction - age and total cholesterol */
cauc_f_age_x_tchol = log(age_yrs)*log(tchol_mg)*(-3.114);
cauc_f_hdl = log(hdl_mg)*(-13.578);
/* Interaction - age and HDL */
cauc_f_age_x_hdl = log(age_yrs)*log(hdl_mg)*(49.37);
/* Treated SBP */
cauc_f_sbp_tx = log(mean_sbp)*2.019;
/* Smoker */
cauc_f_smk = 1*(7.574);
/* Interaction - age and smoking */
cauc_f_age_x_smk = log(age_yrs)*cauc_f_smk*(-1.665);
/* Diabetic */
cauc_f_ever_told_dm2 = 1*(0.661);
end;
/***********************************************************************************************************/
/* Calculating final variables */
/***********************************************************************************************************/
ind_sum_cauc_f = sum(of cauc_f:);
mean_cauc_f = mean(of cauc_f:);
bline_surv_fcauc = 0.9665;
ascvd_score_cauc_f = 1 - bline_surv_fcauc**(exp(ind_sum_cauc_f - mean_cauc_f));
run;
I'm not sure it has anything to do w/ the IF-THEN-DO. I get the same result even when I replace the IF-THEN-DO w/ a simple IF-THEN (i.e., IF gender = 3 THEN test = age_yrs+10;)
FYI. The syntax is only a subset of the full syntax for the program.
-------------------------------------------------------------------------------------------------------------------------------------------------------
race_all=3 gender=2 htn_meds=1 smk_status=1 ever_told_dm2=1 age_yrs=40 cauc_f_age=. cauc_f_age_sq=. cauc_f_tchol=.cauc_f_age_x_tchol=. cauc_f_hdl=. cauc_f_sbp_tx=. cauc_f_smk=. cauc_f_age_x_smk=. cauc_f_ever_told_dm2=. ind_sum_cauc_f=.mean_cauc_f=. bline_surv_fcauc=. ascvd_score_cauc_f=.cauc_f_age is set to missing despite that fact that the variable used in its calculation is NOT missing (i.e., age_yrs).
proc format;
value $ missfmt ' '="Missing" other="Not Missing";
value nmissfmt .="Missing" other="Not Missing";
run;
proc freq data=sas_comm;
table _numeric_ / missing;
format _numeric_ nmissfmt.;
run;
You cannot take the LOG() of a value that is less than or equal to zero.
If that is supposed to be example data then 1) time to learn to write data step 2) ONLY include variables that are involved with the code shown for calculations, 3) how about only providing the 5 records that pass the selection criteria, 4) post code into a text or code box 5) post complete code. That mess is missing a Proc SQL; and Quit;
@ballardw Sorry about the mess. I've tried following a few approaches to generating and posting sample data, mostly to no avail. For this particular post, I followed these instructions. The first approach didn't produce the correct results, so I used the second method, which I posted about here in January. Do you have a different approach?
And/or, could you please help me better understand what I need to do differently using the second approach? Include a PROC SQL and QUIT? Where?
Next time I will only include the relevant variables and the observations that meet the selection criteria. My apologies.
Just to be clear, I should copy/paste the text (i.e., the data set) into my post, correct?
Thanks again for your help.
Those complicated instructions are just for situations where you might need to send a complicated dataset. But for most problems you only need to share a handful of observations for a handful of variables.
To create a dataset just write a data step that reads in the data from lines of text. Reading in data should be one of the first things you learn how to do to use SAS. For example to create a dataset with three variables you might write a data step like this:
data have;
input name :$20. age sex :$1. ;
cards;
Fred 12 M
;
So to generate the lines to use in the card images (now called "datalines" in the manuals) just do the reverse.
data _null_;
set sashelp.class;
put name age sex ;
run;
Then you can just copy the lines of data that line wrote to the SAS log into the data step to recreate the data.
data have;
input name :$20. age sex :$1. ;
cards;
Alfred 14 M
Alice 13 F
Barbara 13 F
Carol 14 F
Henry 14 M
James 12 M
Jane 12 F
Janet 15 F
Jeffrey 13 M
John 12 M
Joyce 11 F
Judy 14 F
Louise 12 F
Mary 15 F
Philip 16 M
Robert 12 M
Ronald 15 M
Thomas 11 M
William 15 M
;
There are some things that will make that harder. Empty character values (just fix those by editing the lines copied from the long and insert a period just like the missing numeric values) . Embedded spaces in character values (you could just pick example values to use that don't have the embedded spaces). Variables that require formats so the values print in a way that humans can understand (like DATE, TIME or DATETIME values) which might require that you include a FORMAT statement in the data step to recreate the sample data.
@_maldini_ wrote:
And/or, could you please help me better understand what I need to do differently using the second approach? Include a PROC SQL and QUIT? Where?
SAS Proc SQL uses "standard" ANSI SQL syntax but needs to be told you are using it.
proc sql; Create table sas_comm(seqn float, ever_used float, age_first_used float, monthly_yr float, age_monthly_use float, time_since_monthly_use float, time_since_monthly_use_unit float, times_month float, joints_pipes_day float, last_used float, last_use_unit float, days_past_month float, ever_told_chf float, age_told_chf float, ever_told_chd float, age_told_chd float, ever_told_angina float, age_told_angina float, ever_told_mi float, age_told_mi float, ever_told_stroke float, age_told_stroke float, ever_told_dm2 float, year float, gender float, age_yrs float, age_mos float, race_hisp_origin float, race_all float, educ_6_19 float, educ_gtet_20 float, mar_status float, wtint2yr float, wtmec2yr float, sdmvpsu float, sdmvstra float, hh_income float, family_income float, ever_etoh float, often_etoh float, etoh_days_per_wk float, etoh_use float, smk100_life float, start_smk_age float, smk_currently float, smk100_life_re float, smk_status float, flag_smk_status float, hdl_mg float, hdl_mmol float, tchol_mg float, tchol_mmol float, wtsaf2yr float, trigly_mg float, trigly_mmol float, ldl_mg float, ldl_mmol float, status_code float, weight float, weight_comment float, height float, height_comment float, bmi float, bp_comment float, sbp_1 float, dbp_1 float, sbp_2 float, dbp_2 float, sbp_3 float, dbp_3 float, sbp_4_if float, dbp_4_if float, mean_sbp float, mean_dbp float, htn float, rxddci1a_21 float, rxddci1a_22 float, rxddci1b_21 float, rxddci1b_22 float, ace float, bb float, ccb float, diur float, arbs float, htn_combo float, lipid_meds float, other float, htn_meds float, wtmec2yr_17_18 float, mos_since_last_use float, yrs_since_last_use float, mos_since_monthly_use float, yrs_since_monthly_use float, cann_use_status float, flag_cann_use_status float, cann_use_status2 float, flag_cann_use_status2 float, yrs_monthly_use float, flag_yrs_monthly_use float, pre_50_monthly_use float, age_first_used_cat float, flag_1 float, flag_2 float, flag_3 float, flag_4 float, flag_5 float, flag_6 float, non_hdl float, least_12_yr float, least_12_life float, often_past_yr float, often_past_yr_units float, etoh_days_per_wk_cont float, wtmec2yr_15_16 float, bp_status float, rxddci1a_23 float, rxddci1b_23 float, wtmec2yr_13_14 float, wtmec2yr_11_12 float, wtmec2yr_09_10 float, wtmec10yr_09_18 float); Insert into sas_comm(seqn, ever_used, age_first_used, monthly_yr, age_monthly_use, time_since_monthly_use, time_since_monthly_use_unit, times_month, joints_pipes_day, last_used, last_use_unit, days_past_month, ever_told_chf, age_told_chf, ever_told_chd, age_told_chd, ever_told_angina, age_told_angina, ever_told_mi, age_told_mi, ever_told_stroke, age_told_stroke, ever_told_dm2, year, gender, age_yrs, age_mos, race_hisp_origin, race_all, educ_6_19, educ_gtet_20, mar_status, wtint2yr, wtmec2yr, sdmvpsu, sdmvstra, hh_income, family_income, ever_etoh, often_etoh, etoh_days_per_wk, etoh_use, smk100_life, start_smk_age, smk_currently, smk100_life_re, smk_status, flag_smk_status, hdl_mg, hdl_mmol, tchol_mg, tchol_mmol, wtsaf2yr, trigly_mg, trigly_mmol, ldl_mg, ldl_mmol, status_code, weight, weight_comment, height, height_comment, bmi, bp_comment, sbp_1, dbp_1, sbp_2, dbp_2, sbp_3, dbp_3, sbp_4_if, dbp_4_if, mean_sbp, mean_dbp, htn, rxddci1a_21, rxddci1a_22, rxddci1b_21, rxddci1b_22, ace, bb, ccb, diur, arbs, htn_combo, lipid_meds, other, htn_meds, wtmec2yr_17_18, mos_since_last_use, yrs_since_last_use, mos_since_monthly_use, yrs_since_monthly_use, cann_use_status, flag_cann_use_status, cann_use_status2, flag_cann_use_status2, yrs_monthly_use, flag_yrs_monthly_use, pre_50_monthly_use, age_first_used_cat, flag_1, flag_2, flag_3, flag_4, flag_5, flag_6, non_hdl, least_12_yr, least_12_life, often_past_yr, often_past_yr_units, etoh_days_per_wk_cont, wtmec2yr_15_16, bp_status, rxddci1a_23, rxddci1b_23, wtmec2yr_13_14, wtmec2yr_11_12, wtmec2yr_09_10, wtmec10yr_09_18) Values(51624, 1, 17, 2, ., ., ., ., ., 17, 4, ., 2, ., 2, ., 2, ., 2, ., 2, ., 2, ., ., ., ., ., ., ., ., ., ., ., ., ., ., ., 1, ., 0, 2, 1, 18, 3, 0, 2, 0, 50, 1.29, 135, 3.49, ., ., ., ., ., 1, 87.4, ., 164.7, ., 32.22, ., 114, 88, 114, 88, 112, 82, ., ., 113.333, 86.000, 1, ., ., ., ., 2, 2, 2, 2, 2, 2, 2, 2, 2, ., 204, 17, ., ., 2, 0, 2, 0, ., 1, 0, 0, 1, 0, 0, 0, 0, 1, ., 1, ., 0, ., 0.00000, ., 1, ., ., ., ., ., .); quit;
The Proc, as is typical in SAS comes as the first thing, then statements. The procedure ends with the Quit; statement.
Note the text box for the code. One thing that can be very important with data step or similar code is that the forum software will reformat text pasted into the main message window and sometimes results in code that generates errors when we copy/paste into a SAS session and attempt to run the code.
Reducing the variables to just what is needed might be the solution for the problems you might have encountered with the create data. Since it is creating text you might not see everything clearly when lots of not-needed-for-this-question variables are included.
Note that there is another form of INPUT that is called name input that can take text such as your PUTLOG output, though use of character variables may get tricky:
data example; input race_all= gender= htn_meds= smk_status= ever_told_dm2= age_yrs= ; datalines; race_all=3 gender=2 htn_meds=1 smk_status=1 ever_told_dm2=1 age_yrs=40 race_all=4 gender=2 htn_meds=1 smk_status=1 ever_told_dm2=1 age_yrs=45 ;
The = after the variable name says that SAS expects to see values with the variable name followed by an = sign then the value in the data source.
Hint: when using something like PUT or PUTLOG to examine results it is a good idea to include the dependent variables used in the calculations such as your tchol_mg and hdl_mg variables. If you find that those variables are missing then you should expect to get missing results for the calculated variables.
Also Putting calculated variables such as cauc_f_age before they are calculated is a waste of electrons. Put those variables AFTER the statements that should calculate them.
I don't understand what your code is trying to do but the notes in the LOG are clearly showing where in the code you are trying to do arithmetic on missing values.
NOTE: Missing values were generated as a result of performing an operation on missing values. Each place is given by: (Number of times) at (Line):(Column). 2 at 123:17 2 at 123:30 2 at 126:35 2 at 126:36 2 at 128:15 2 at 128:26 2 at 131:33 2 at 131:34 2 at 151:19 2 at 152:16 2 at 156:25 2 at 156:43 2 at 156:46 2 at 156:65 NOTE: There were 5 observations read from the data set NHANES.DATA_2009_2018_GO.
The first number, which is 2 in every case, is the number of times that missing values were found. So looks like it might be that 2 of the 5 selected observations have some missing values.
The number before the colon is the line number in the SAS log. The number after the colon is the position on that line where the missing value was detected.
So the first one is this line in the LOG.
123 cauc_f_tchol = log(tchol_mg)*13.540;
So you have 2 observations with missing TCHOL_MG values.
@Tom Thanks for this helpful explanation in how to interpret: "2 at 123:17".
Just to be clear, the 123 refers to the line number in the log, not the SAS program, correct?
Also, is 17 is the number of spaces from the first character in the syntax to the first character of the name of the variable that has a missing value?
From the log:
125 cauc_f_tchol = log(tchol_mg)*13.540;
I count 20 spaces from the "c" in "cauc_f_tchol" to the "t" in "tchol_mg"?
Yes the line number is the number in the log. If you are using SAS interactively it could hundreds or thousands of lines different than the line number in the source file or editor window.
It is the LOG() function that is return the missing value. So 17 is the right position.
When in doubt tell SAS to show the ruler line in the program editor is easier than counting.
@Tom How do I tell SAS to show the ruler line in the program editor? I can't find the answer using a Google search...
I'm using SAS On Demand for Academics.
@Reeza wrote:
Studio doesn't have the ruler AFAIK but it also doesn't always exactly refer to the variable anyways. In this case it refers to the LOG function.
In SAS/Studio use the display at the bottom of the screen with the Line and Column of where the cursor is.
But not in the log with line numbers that the log refers to...
@Tom wrote:
@Reeza wrote:
Studio doesn't have the ruler AFAIK but it also doesn't always exactly refer to the variable anyways. In this case it refers to the LOG function.In SAS/Studio use the display at the bottom of the screen with the Line and Column of where the cursor is.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.