@u58469353 wrote:
Thanks for your reply. I am completely new to SAS coding. So honestly I'm not sure how to perform the suggestions you are telling me about. When I used the code I just posted, it does change the variables to numeric and I have no issues with completing to multiple linear regression. At first ,the dataset I imported did contain 3 variables that showed up as characters rather than numeric, and they are causing me problems.
Again, you miss the point. Region is not numeric, it is categorical, and so it is meaningless to convert it to a numeric and run a regression with a numeric region. Converting a categorical variable to numeric makes no sense from a logical, mathematical or statistical understanding. Just because you can convert it to numeric does not mean you SHOULD convert it to numeric, and does not mean you will get useful or correct answers. Don't blame this on being new to SAS, this is not a SAS issue at all, this is an issue where either you understand the data and the needs of the analysis and make correct decisions on how to use it, or you don't understand the data and don't understand the needs of the analysis, and make incorrect decisions on how to use it.
You need to keep region as a category and either create dummy variables for PROC REG, or better yet, go ahead and switch to PROC GLM and put region and gender in a CLASS statement, as stated above.
One more advice: for your future benefit in getting quick and meaningful answers, don't tell us you have a PROC MEANS issue if you are doing a regression. Tell us you are doing a regression. If you had mentioned regression in your very first message, you probably would have gotten the PROC GLM advice in the very first reply.
How would I ensure that the data types are read in correctly when importing the dataset?
/* Generated Code (IMPORT) */ /* Source File: MIS445TelcoExtraCSU-Global.xlsx */ /* Source Path: /home/u58469353/MIS445 */ /* Code generated on: 7/26/21, 1:26 PM */ %web_drop_table(WORK.IMPORT); FILENAME REFFILE '/home/u58469353/MIS445/MIS445TelcoExtraCSU-Global.xlsx'; PROC IMPORT DATAFILE=REFFILE DBMS=XLSX OUT=WORK.IMPORT; GETNAMES=YES; RUN; PROC CONTENTS DATA=WORK.IMPORT; RUN; %web_open_table(WORK.IMPORT);
This is the code I used to import the dataset.
Excel is notorious for creating files that Proc Import does not use as intended because there are no constraints on anything.
You may have better luck opening the XLSX file in spread sheet software and then saving it as Comma Separated Values (CSV) and import that file which will be text. Then the generated data step code in the log can be copied to the editor, cleaned up (remove line numbers), and set informats for specific variables to match expected types. If the generated informat is $xx then you have a character variable. If the value should be numeric change the $xx. to something like f8. ;
Proc Import expects pretty clean data, simple columns of similar values, no more than one header row - the very first row- to create names. XLSX files from way too many sources have multiple header rows with the column headings actually on rows other than 1, pictures, mixes of values (NA for "not available" or similar in a column that should be numeric - common source of character variables) and values from formula cells can sometimes be entertaining.
Sometimes if the structure of the spreadsheet is obnoxious but consistent a data step may be able to parse row values to change how specific rows of the text are handled. Proc Import does not have any way to do so.
@u58469353 wrote:
/* Generated Code (IMPORT) */ /* Source File: MIS445TelcoExtraCSU-Global.xlsx */ /* Source Path: /home/u58469353/MIS445 */ /* Code generated on: 7/26/21, 1:26 PM */ %web_drop_table(WORK.IMPORT); FILENAME REFFILE '/home/u58469353/MIS445/MIS445TelcoExtraCSU-Global.xlsx'; PROC IMPORT DATAFILE=REFFILE DBMS=XLSX OUT=WORK.IMPORT; GETNAMES=YES; RUN; PROC CONTENTS DATA=WORK.IMPORT; RUN; %web_open_table(WORK.IMPORT);This is the code I used to import the dataset.
That isn't the code from the log - that's the code you need.
If you use @Toms code for conversion you're ignoring that fact that Region needs three dummy variables not one. I suppose it's possible your instructor wants you to do that but I highly suspect not. If you did any topics on modeling with categorical predictors review that section.
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
68
69 /* Generated Code (IMPORT) */
70 /* Source File: MIS445TelcoExtraCSU-Global.xlsx */
71 /* Source Path: /home/u58469353/MIS445 */
72 /* Code generated on: 7/26/21, 1:26 PM */
73
74 %web_drop_table(WORK.IMPORT);
75
76
77 FILENAME REFFILE
77 ! '/home/u58469353/MIS445/MIS445TelcoExtraCSU-Global.xlsx';
78
79 PROC IMPORT DATAFILE=REFFILE
80 DBMS=XLSX
81 OUT=WORK.IMPORT;
82 GETNAMES=YES;
83 RUN;
NOTE: One or more variables were converted because the data type is not
supported by the V9 engine. For more details, run with options
MSGLEVEL=I.
NOTE: The import data set has 1000 observations and 34 variables.
NOTE: WORK.IMPORT data set was successfully created.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 0.30 seconds
user cpu time 0.30 seconds
system cpu time 0.00 seconds
memory 3160.87k
OS Memory 28580.00k
Timestamp 07/26/2021 05:26:59 PM
Step Count 121 Switch Count 2
Page Faults 0
Page Reclaims 629
Page Swaps 0
Voluntary Context Switches 18
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 776
84
85 PROC CONTENTS DATA=WORK.IMPORT; RUN;
NOTE: PROCEDURE CONTENTS used (Total process time):
real time 0.06 seconds
user cpu time 0.07 seconds
system cpu time 0.00 seconds
memory 2009.75k
OS Memory 27176.00k
Timestamp 07/26/2021 05:26:59 PM
Step Count 122 Switch Count 0
Page Faults 0
Page Reclaims 141
Page Swaps 0
Voluntary Context Switches 0
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 32
86
87
88 %web_open_table(WORK.IMPORT);
89
90 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
100
Here is the code from the log. Sorry about that. My instructor did mention in email that I need to use dummy variables, but we never learned how to do that in this course.
@u58469353 wrote:
So I need these variables to be in numeric form so that I can perform multiple linear regression using these variables. I'm still encountering the problem when I use this code to change a variable to numeric form.
data new; set work.import; gender_numeric = input(gender,2.); data new; set work.import; region_numeric = input(region,4.); data new; set work.import; marital_numeric = input(marital,2.);
The issue I run into is that only the last variable I enter shows up in the dataset as a numeric variable. In this case, the marital_numeric variable appears. But the region_numeric and the gender_numeric variables are not showing up. However, if I only do one at a time I am able to use them for graphs and regressions. But I need all 3 of them together to perform a multiple linear regression model.
It should be pretty obvious from the SAS log what is wrong with that code. You keep overwriting the dataset NEW with a different version so only the last version is available for further use. Just put all of the assignment statements in one data step. Also get in the habit of explicitly ending your steps. It will reduce your confusion.
data new;
set work.import;
gender_numeric = input(gender,2.);
region_numeric = input(region,4.);
marital_numeric = input(marital,2.);
run;
Thank you so much. I am trying to learn. I appreciate it
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.