BookmarkSubscribeRSS Feed
Reeza
Super User
From my original post: If you used PROC IMPORT see the code that is generated in the log and modify it according or write your own data step to import the data correctly.

I'm assuming you can look at the code and figure out the difference between the character and numeric variables and change it but if you're having difficulties post the code.
PaigeMiller
Diamond | Level 26

@u58469353 wrote:

Thanks for your reply. I am completely new to SAS coding. So honestly I'm not sure how to perform the suggestions you are telling me about. When I used the code I just posted, it does change the variables to numeric and I have no issues with completing to multiple linear regression. At first ,the dataset I imported did contain 3 variables that showed up as characters rather than numeric, and they are causing me problems.


Again, you miss the point. Region is not numeric, it is categorical, and so it is meaningless to convert it to a numeric and run a regression with a numeric region. Converting a categorical variable to numeric makes no sense from a logical, mathematical or statistical understanding. Just because you can convert it to numeric does not mean you SHOULD convert it to numeric, and does not mean you will get useful or correct answers. Don't blame this on being new to SAS, this is not a SAS issue at all, this is an issue where either you understand the data and the needs of the analysis and make correct decisions on how to use it, or you don't understand the data and don't understand the needs of the analysis, and make incorrect decisions on how to use it.

 

You need to keep region as a category and either create dummy variables for PROC REG, or better yet, go ahead and switch to PROC GLM and put region and gender in a CLASS statement, as stated above.


One more advice: for your future benefit in getting quick and meaningful answers, don't tell us you have a PROC MEANS issue if you are doing a regression. Tell us you are doing a regression. If you had mentioned regression in your very first message, you probably would have gotten the PROC GLM advice in the very first reply.

--
Paige Miller
u58469353
Calcite | Level 5

How would I ensure that the data types are read in correctly when importing the dataset?

u58469353
Calcite | Level 5
/* Generated Code (IMPORT) */
/* Source File: MIS445TelcoExtraCSU-Global.xlsx */
/* Source Path: /home/u58469353/MIS445 */
/* Code generated on: 7/26/21, 1:26 PM */

%web_drop_table(WORK.IMPORT);


FILENAME REFFILE '/home/u58469353/MIS445/MIS445TelcoExtraCSU-Global.xlsx';

PROC IMPORT DATAFILE=REFFILE
	DBMS=XLSX
	OUT=WORK.IMPORT;
	GETNAMES=YES;
RUN;

PROC CONTENTS DATA=WORK.IMPORT; RUN;


%web_open_table(WORK.IMPORT);

This is the code I used to import the dataset. 

ballardw
Super User

Excel is notorious for creating files that Proc Import does not use as intended because there are no constraints on anything.

You may have better luck opening the XLSX file in spread sheet software and then saving it as Comma Separated Values (CSV) and import that file which will be text. Then the generated data step code in the log can be copied to the editor, cleaned up (remove line numbers), and set informats for specific variables to match expected types. If the generated informat is $xx then you have a character variable. If the value should be numeric change the $xx. to something like f8. ;

 

Proc Import expects pretty clean data, simple columns of similar values, no more than one header row - the very first row- to create names. XLSX files from way too many sources have multiple header rows with the column headings actually on rows other than 1, pictures, mixes of values (NA for "not available" or similar in a column that should be numeric - common source of character variables) and values from formula cells can sometimes be entertaining.

 

Sometimes if the structure of the spreadsheet is obnoxious but consistent a data step may be able to parse row values to change how specific rows of the text are handled. Proc Import does not have any way to do so.

Reeza
Super User

@u58469353 wrote:
/* Generated Code (IMPORT) */
/* Source File: MIS445TelcoExtraCSU-Global.xlsx */
/* Source Path: /home/u58469353/MIS445 */
/* Code generated on: 7/26/21, 1:26 PM */

%web_drop_table(WORK.IMPORT);


FILENAME REFFILE '/home/u58469353/MIS445/MIS445TelcoExtraCSU-Global.xlsx';

PROC IMPORT DATAFILE=REFFILE
	DBMS=XLSX
	OUT=WORK.IMPORT;
	GETNAMES=YES;
RUN;

PROC CONTENTS DATA=WORK.IMPORT; RUN;


%web_open_table(WORK.IMPORT);

This is the code I used to import the dataset. 


That isn't the code from the log - that's the code you need. 

 

If you use @Toms code for conversion you're ignoring that fact that Region needs three dummy variables not one. I suppose it's possible your instructor wants you to do that but I highly suspect not. If you did any topics on modeling with categorical predictors review that section. 

u58469353
Calcite | Level 5
1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 68         
 69         /* Generated Code (IMPORT) */
 70         /* Source File: MIS445TelcoExtraCSU-Global.xlsx */
 71         /* Source Path: /home/u58469353/MIS445 */
 72         /* Code generated on: 7/26/21, 1:26 PM */
 73         
 74         %web_drop_table(WORK.IMPORT);
 75         
 76         
 77         FILENAME REFFILE
 77       ! '/home/u58469353/MIS445/MIS445TelcoExtraCSU-Global.xlsx';
 78         
 79         PROC IMPORT DATAFILE=REFFILE
 80         DBMS=XLSX
 81         OUT=WORK.IMPORT;
 82         GETNAMES=YES;
 83         RUN;
 
 NOTE: One or more variables were converted because the data type is not 
       supported by the V9 engine. For more details, run with options 
       MSGLEVEL=I.
 NOTE: The import data set has 1000 observations and 34 variables.
 NOTE: WORK.IMPORT data set was successfully created.
 NOTE: PROCEDURE IMPORT used (Total process time):
       real time           0.30 seconds
       user cpu time       0.30 seconds
       system cpu time     0.00 seconds
       memory              3160.87k
       OS Memory           28580.00k
       Timestamp           07/26/2021 05:26:59 PM
       Step Count                        121  Switch Count  2
       Page Faults                       0
       Page Reclaims                     629
       Page Swaps                        0
       Voluntary Context Switches        18
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           776
       
 
 84         
 85         PROC CONTENTS DATA=WORK.IMPORT; RUN;
 
 NOTE: PROCEDURE CONTENTS used (Total process time):
       real time           0.06 seconds
       user cpu time       0.07 seconds
       system cpu time     0.00 seconds
       memory              2009.75k
       OS Memory           27176.00k
       Timestamp           07/26/2021 05:26:59 PM
       Step Count                        122  Switch Count  0
       Page Faults                       0
       Page Reclaims                     141
       Page Swaps                        0
       Voluntary Context Switches        0
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           32
       
 
 86         
 87         
 88         %web_open_table(WORK.IMPORT);
 89         
 90         OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 100        

Here is the code from the log. Sorry about that. My instructor did mention in email that I need to use dummy variables, but we never learned how to do that in this course. 

Tom
Super User Tom
Super User

@u58469353 wrote:

So I need these variables to be in numeric form so that I can perform multiple linear regression using these variables. I'm still encountering the problem when I use this code to change a variable to numeric form.

 

data new;
    set work.import;
    gender_numeric = input(gender,2.);
data new;
	set work.import;
	region_numeric = input(region,4.);
data new;
	set work.import;
	marital_numeric = input(marital,2.);

The issue I run into is that only the last variable I enter shows up in the dataset as a numeric variable. In this case, the marital_numeric variable appears. But the region_numeric and the gender_numeric variables are not showing up. However, if I only do one at a time I am able to use them for graphs and regressions. But I need all 3 of them together to perform a multiple linear regression model. 


It should be pretty obvious from the SAS log what is wrong with that code.  You keep overwriting the dataset NEW with a different version so only the last version is available for further use.  Just put all of the assignment statements in one data step.  Also get in the habit of explicitly ending your steps.  It will reduce your confusion.

data new;
  set work.import;
  gender_numeric = input(gender,2.);
  region_numeric = input(region,4.);
  marital_numeric = input(marital,2.);
run;

 

u58469353
Calcite | Level 5

Thank you so much. I am trying to learn. I appreciate it

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 23 replies
  • 2331 views
  • 1 like
  • 5 in conversation