Nyac122
Calcite | Level 5

I am trying to recode my district variables so I can use PROC REG, and it's not working. The log shows:

WARNING: Limit set by ERRORS= option reached. Further errors of this type will not be printed

data flgrads;
input districts $ 1-9 year $ 10-20 percentage;
datalines;
alachua 2016-2017 82.7
alachua 2017-2018 88
alachua 2018-2019 88.5
alachua 2019-2020 90.4
alachua 2020-2021 86.6
baker 2016-2017 81
baker 2017-2018 75.5
baker 2018-2019 78.8
baker 2019-2020 84.5
baker 2020-2021 85.7
bay 2016-2017 78.0
bay 2017-2018 81.1
bay 2018-2019 82.5
bay 2019-2020 88.5
bay 2020-2021 90.2
bradford 2016-2017 78.9
bradford 2017-2018 89
bradford 2018-2019 87.7
bradford 2019-2020 98.2
bradford 2020-2021 85
brevard 2016-2017 85.9
brevard 2017-2018 88.1
brevard 2018-2019 88.3
brevard 2019-2020 90.3
brevard 2020-2021 90.6
broward 2016-2017 81
broward 2017-2018 84.3
broward 2018-2019 86.2
broward 2019-2020 89.4
broward 2020-2021 89.1
calhoun 2016-2017 80.9
calhoun 2017-2018 86.9
calhoun 2018-2019 87.9
calhoun 2019-2020 89.9
calhoun 2020-2021 93.1
Charlotte 2016-2017 81.0
Charlotte 2017-2018 87.6
Charlotte 2018-2019 86.4
Charlotte 2019-2020 90.4
Charlotte 2020-2021 90.9
Citrus 2016-2017 78.9
Citrus 2017-2018 84.1
Citrus 2018-2019 86.0
Citrus 2019-2020 87.1
Citrus 2020-2021 88.1
Clay 2016-2017 88.4
Clay 2017-2018 91.1
Clay 2018-2019 91.9
Clay 2019-2020 93.4
Clay 2020-2021 92.7
Collier 2016-2017 88.2
Collier 2017-2018 91.9
Collier 2018-2019 91.9
Collier 2019-2020 92.2
Collier 2020-2021 92.7
Columbia 2016-2017 70.7
Columbia 2017-2018 88.4
Columbia 2018-2019 92.4
Columbia 2019-2020 95.4
Columbia 2020-2021 95.6
MiamiDade 2016-2017 80.7
MiamiDade 2017-2018 85.4
MiamiDade 2018-2019 85.6
MiamiDade 2019-2020 89.6
MiamiDade 2020-2021 90.1
DeSoto 2016-2017 63.8
DeSoto 2017-2018 60.9
DeSoto 2018-2019 71.3
DeSoto 2019-2020 84.6
DeSoto 2020-2021 82
Dixie 2016-2017 89.5
Dixie 2017-2018 96.9
Dixie 2018-2019 90.6
Dixie 2019-2020 89.8
Dixie 2020-2021 84
Duval 2016-2017 80.8
Duval 2017-2018 85.1
Duval 2018-2019 86.5
Duval 2019-2020 90.2
Duval 2020-2021 89.6
;
proc print;
run;
data flgrads2;
set flgrads;
Alachua= (district= 0);
Baker= (districts= 1);
Bay = (districts= 2);
Bradford= (districts= 3);
Brevard= (districts= 4);
Broward= (districts= 5);
Calhoun= (districts= 6);
Charlotte= (districts= 7);
Citrus= (districts= 8);
Clay= (districts= 9);
Collier= (districts= 10);
Columbia= (districts= 11);
MiamiDade= (districts= 12);
DeSoto= (districts= 13);
Dixie= (districts= 14);
Duval= (districts= 15);
run;

Astounding
PROC Star
Have you examined the log and the output of the program you posted? It creates a character variable DISTRICTS that does not take on numeric values like 1, 2, and 3. Yet your second DATA step seems to expect those values. So it is impossible to determine what data you have, and what result you would like. Perhaps you need statements that look more like:

bay = (districts = "bay");

That statement would at least match the sample data you posted.
ballardw
Super User

You might show the regression you are attempting. Proc Reg is not really the best option for categorical variables, as it assumes that the distance between numeric values is meaningful, which is generally not the case for category codes.

 

Perhaps you should be looking at Proc GLM and have your districts variable as a CLASS variable, which means it is treated as categorical. It will also handle the creation of the dummy variables for you, if that is what this code is supposed to do.
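
A minimal sketch of that approach, using a few rows of the posted data. The data set name flgrads_demo is only for illustration, and the list-style INPUT statement assumes the values are separated by single spaces, as they appear in the post:

```sas
/* Small subset of the posted data, read with list input.
   Assumption: values are space-separated, as shown in the post. */
data flgrads_demo;
   input districts :$9. year :$9. percentage;
   datalines;
alachua 2016-2017 82.7
alachua 2017-2018 88
baker 2016-2017 81
baker 2017-2018 75.5
bay 2016-2017 78.0
bay 2017-2018 81.1
;
run;

/* CLASS tells PROC GLM to treat districts as categorical,
   so the dummy variables are built internally. */
proc glm data=flgrads_demo;
   class districts;
   model percentage = districts / solution;
run;
quit;
```

The SOLUTION option just asks for the parameter estimates to be printed; drop it if you only want the ANOVA table.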

 

For future questions, any time you have a question or concern about a Warning, Error or Note, you should include the LOG with the entire code and all messages related to the code. Copy the text from the log, open a text box on the forum using the </> icon that appears above the main message box, then paste the copied text.

 

Your code (reduced):

data flgrads2;
set flgrads;
Alachua= (district= 0);
Baker= (districts= 1);
Bay = (districts= 2);

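Has multiple issues. First, District is not a variable in your base data set; Districts is. So District is created as a new numeric variable with missing values, and the first assignment gets values of 0 because missing is not 0.

The rest compare a text value such as 'baker' to 1. SAS will attempt to help you by converting the character value to numeric, since you are comparing it with a number. 'baker' is not a valid number, so the conversion produces a missing value along with an "Invalid numeric data" message in the log, which is likely where the ERRORS= warning comes from. The result is 0 because missing is never equal to 1.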

You likely want to use the equivalent of

Alachua = (districts='alachua');

for each value of Districts in your first data step. Note that you add some complication by starting some of the values with upper case and others with lower case. So if you do not provide exactly the same spelling you will still get 0 for results.
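
One way around the mixed-case problem is to compare a lower-cased copy of the value. This is only a sketch on four of the posted rows; the data set names flgrads_demo and flgrads2_demo are made up for the example, and the list-style INPUT assumes space-separated values:

```sas
/* Four rows from the post, including mixed-case district names. */
data flgrads_demo;
   input districts :$9. year :$9. percentage;
   datalines;
alachua 2016-2017 82.7
baker 2016-2017 81
Charlotte 2016-2017 81.0
MiamiDade 2016-2017 80.7
;
run;

data flgrads2_demo;
   set flgrads_demo;
   /* LOWCASE makes the comparison independent of how the value was typed.
      Each expression evaluates to 1 when true and 0 when false. */
   Alachua   = (lowcase(districts) = 'alachua');
   Baker     = (lowcase(districts) = 'baker');
   Charlotte = (lowcase(districts) = 'charlotte');
   MiamiDade = (lowcase(districts) = 'miamidade');
run;
```

You would need one such statement per district, which is part of why letting a CLASS statement do the work is usually less error-prone.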

 

If you really want the Year to have some meaning, as in a change of a dependent variable from year to year in the model in Proc Reg, you should likely use a numeric value of the first or last year of that hyphenated character value.
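
As a rough sketch of that conversion, assuming Year always looks like "2016-2017": take the first four characters and read them as a number. The names flgrads_year and start_year are hypothetical, and the model below deliberately ignores district, just to show a numeric year in Proc Reg:

```sas
data flgrads_year;
   input districts :$9. year :$9. percentage;
   /* First four characters of e.g. "2016-2017", read as the number 2016. */
   start_year = input(substr(year, 1, 4), 4.);
   datalines;
alachua 2016-2017 82.7
alachua 2017-2018 88
alachua 2018-2019 88.5
alachua 2019-2020 90.4
alachua 2020-2021 86.6
;
run;

/* With a numeric year, a regression on time makes sense. */
proc reg data=flgrads_year;
   model percentage = start_year;
run;
quit;
```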

Personally without that change of year to numeric I don't see anything appropriate to Proc Reg.

PaigeMiller
Diamond | Level 26

Don't create your own dummy variables. Use PROC GLM to do the regression, use the CLASS statement and PROC GLM will properly create the dummy variables behind the scenes, so you don't have to.

--
Paige Miller
Ksharp
Super User

Do you want to create a design matrix?
@Rick_SAS wrote a blog post about it before.
https://blogs.sas.com/content/iml/2020/08/31/best-generate-dummy-variables-sas.html

 

 

proc glmselect data=flgrads outdesign(addinputvars)=want noprint;
class districts;
model percentage=districts/selection=none noint ;
run;
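
If you then want to feed the generated columns to PROC REG, here is a sketch on a few of the posted rows. It relies on my understanding that PROC GLMSELECT stores the names of the design columns in the macro variable _GLSMOD; the data set name flgrads_demo is just for illustration:

```sas
data flgrads_demo;
   input districts :$9. year :$9. percentage;
   datalines;
alachua 2016-2017 82.7
alachua 2017-2018 88
baker 2016-2017 81
baker 2017-2018 75.5
bay 2016-2017 78.0
bay 2017-2018 81.1
;
run;

/* Write the design matrix (one 0/1 column per district) to WANT. */
proc glmselect data=flgrads_demo outdesign(addinputvars)=want noprint;
   class districts;
   model percentage = districts / selection=none noint;
run;

/* &_GLSMOD expands to the design column names created above.
   NOINT is used because one dummy per district plus an intercept
   would be collinear. */
proc reg data=want;
   model percentage = &_GLSMOD / noint;
run;
quit;
```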
PaigeMiller
Diamond | Level 26

Yes, there are cases where the user might want to code their own dummy variables. In general, doing a regression is not such a case, since SAS has already created the mechanism in PROC GLM (and many other modeling PROCs) to create the dummy variables, so the user doesn't have to create their own.

SK_11
Obsidian | Level 7

Create a district code table and then merge or join:
data DistrictCodes;
/* :$9. reads up to 9 characters, so Charlotte and MiamiDade are not truncated to 8 */
Input Districts :$9. DistrictCode ;
datalines;
Alachua 0
Baker 1
Bay 2
Bradford 3
Brevard 4
Broward 5
Calhoun 6
Charlotte 7
Citrus 8
Clay 9
Collier 10
Columbia 11
MiamiDade 12
DeSoto 13
Dixie 14
Duval 15
;
run;

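Proc sql;
/* Write to a new table instead of overwriting flgrads, and take only
   DistrictCode from b to avoid a duplicate Districts column. */
Create table flgrads_coded as
Select a.*, b.DistrictCode
from flgrads as a
left join DistrictCodes as b
on upcase(a.Districts)=upcase(b.Districts)
;
Quit;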

PaigeMiller
Diamond | Level 26

What is your question? How does this relate to your original post to do a regression?

 

Turning character values for District like 'alachua' 'baker' etc. to each have their own unique number doesn't get you a better or different regression fit. The results are exactly the same. All of the manipulations you are talking about are essentially not helpful to a regression. As stated above, you are working very hard to do something that is already programmed by SAS in PROC GLM.

