I am very new to lasso, although I have read a lot of articles on it recently. I am mainly trying to determine whether I am coding lasso correctly, but also whether I am interpreting the results SAS generates correctly. First, a little information about why I am using lasso. The federal government chose 49 control variables for a major project I was assigned to. Initially I used all of them to be consistent with their usage. I am testing the impact of service delivery on income, controlling for these variables, with linear regression. To do this I first look at who was eligible for a service and then determine whether they got at least one service: 1 if they did, 0 if they did not. These are the predictors of income I really care about. The problem is that for some services the total population of those eligible is quite small, as few as 43 cases, so I end up with more predictors than cases. That is why I am using lasso (well, I also think 49 predictors, few of which have known theoretical links to income, is absurd).

The lasso code I run is included below. I have two concerns. First, the results say:
Selection stopped because all candidate effects for entry are linearly dependent on effects in the model.
This occurs every time I run lasso. I am not sure whether it is an issue or not. Nothing I have found in the documentation addresses it. Second, the printout has:
The selected model, based on SBC, is the model at Step 8.
and a list of variables. Are these the variables the lasso selected? I have a set of education predictors in the overall model (education has 8 levels, so there are about 7 dummies). The lasso chose, say, 5 of the 7 dummies. I assume that, so as not to violate the way dummies are normally used (everyone is in some level), I should not omit the levels the lasso leaves out, but as I said I am new to this. To choose the lasso model I am using a larger population that contains the subpopulations I end up analyzing. Is it valid to choose the variables that way and then use them with the smaller subpopulations? Those subpopulations are so tiny that I doubt lasso will run on them if I use only them. And since the large population is related (they are all special service customers), to me there is a link between the large population I used for the lasso and the subpopulations.
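For context, the 0/1 service indicator described above can be sketched roughly as follows. This is a toy illustration only: the data set `work.toy` and the names `eligible`, `n_services`, and `got_service` are hypothetical and are not columns from the actual data.

```sas
/* Toy data; eligible and n_services are hypothetical names */
data work.toy;
    input id eligible n_services;
    datalines;
1 1 0
2 1 3
3 0 2
4 1 1
;
run;

data work.toy_flag;
    set work.toy;
    where eligible = 1;               /* keep only people eligible for the service */
    /* 1 = received at least one service, 0 = none.
       A missing n_services also evaluates to 0 here, because SAS
       treats missing as smaller than any number. */
    got_service = (n_services >= 1);
run;
```

In this toy example ids 1, 2, and 4 are kept, and only ids 2 and 4 get `got_service = 1`. The lasso code itself follows.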
proc sql;
Create table work.SEtest as
Select * from dora.incomerev
where plantype ='4';
quit;
proc glmselect data= work.setest;
CLASS
"Age 25 to 44"n (ref ="0")
"Associate’s degree"n (ref ="0")
"Bachelor’s degree"n (ref ="0")
"Beyond a bachelor’s degree"n (ref ="0")
"High school diploma or equivalen"n (ref ="0")
/*"Individuals has a significant di"n (ref ="0")removed for SE analysis */
"Postsecondary education no degre"n (ref ="0")
"Race: Black"n (ref ="0")
"Race: More than one"n (ref ="0")
"Special education certicate/comp"n (ref ="0")
"Age 19 to 24"n (ref ="0")
"Age 45 to 54"n (ref ="0")
"Age 55 to 59"n (ref ="0")
"Age 60+"n (ref ="0")
'Age 16 to 18'n (ref ="0")
"Race: Asian"n (ref ="0")
"Race: Hawaiian/Pacific Islander"n (ref ="0")
"Race: White"n (ref ="0")
"Foster care youth"n (ref ="0")
"Psychosocial and psychological d"n (ref ="0")
"Intellectual and learning disabi"n (ref ="0")
"Physical disability"n (ref ="0")
"Auditory and communicative disab"n (ref ="0")
Veteran (ref ="0")
"TANF recipient"n (ref ="0")
"Single parent"n (ref ="0")
/*"Received career services"n (ref ="0") */
/*"Received training services"n (ref ="0")*/
/*"Received other services"n (ref ="0")*/
"Received public support at appli"n (ref ="0")
"Employed at application"n (ref ="0")
"Homeless individual, runaway you"n (ref ="0")
"Low-income"n (ref ="0")
"Limited English-language profici"n (ref ="0")
"Migrant and seasonal farmworker"n (ref ="0")
"Long-term unemployed"n (ref ="0")
/* "Individuals is most significant"n (ref ="0")removed for SE analysis */
"Ethnicity-Hispanic Ethnicity"n (ref ="0")
"Ex-offender"n (ref ="0")
"Displaced homemaker"n (ref ="0")
Female (ref ="0")
;
MODEL Qtr2_Wage=
"Age 25 to 44"n
"Associate’s degree"n
"Bachelor’s degree"n
"Beyond a bachelor’s degree"n
"High school diploma or equivalen"n
/*"Individuals has a significant di"n */
"Postsecondary education no degre"n
"Race: Black"n
"Race: More than one"n
"Special education certicate/comp"n
"Age 19 to 24"n
"Age 45 to 54"n
"Age 55 to 59"n
"Age 60+"n
'Age 16 to 18'n
"Race: Asian"n
"Race: Hawaiian/Pacific Islander"n
"Race: White"n
"Foster care youth"n
"Psychosocial and psychological d"n
"Intellectual and learning disabi"n
"Physical disability"n
"Auditory and communicative disab"n
Veteran
"TANF recipient"n
"Single parent"n
/*"Received career services"n
"Received training services"n
"Received other services"n */
"Received public support at appli"n
"Employed at application"n
"Homeless individual, runaway you"n
"Low-income"n
"Limited English-language profici"n
"Migrant and seasonal farmworker"n
"Long-term unemployed"n
/*"Individuals is most significant"n */
"Ethnicity-Hispanic Ethnicity"n
"Ex-offender"n
"Displaced homemaker"n
Female
"Construction Employment"n
"Educational, or Health Care Rela"n
"Financial Services Employment"n
"Information Services Employment"n
"Leisure, Hospitality, or Enterta"n
"Natural Resources Employment"n
"Other Services Employment"n
"Trade and Transportation Employm"n
"Professional and Business Servic"n
"Manufacturing Related Employment"n
"totalgovernment"n
/ selection=lasso(choose=sbc stop=none);
run;
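One check that might help narrow down the linear-dependence message is to see whether a set of indicators already covers everyone. The sketch below sums the six age indicators from the model for each row. It assumes they are coded as numeric 0/1, which the code above does not confirm; `work.age_check` and `age_sum` are names made up for this illustration.

```sas
/* Assumption: the age indicators are numeric 0/1 columns in work.setest */
data work.age_check;
    set work.setest;
    /* Number of age groups each person falls into */
    age_sum = sum('Age 16 to 18'n, "Age 19 to 24"n, "Age 25 to 44"n,
                  "Age 45 to 54"n, "Age 55 to 59"n, "Age 60+"n);
run;

/* If every row shows age_sum = 1, the six indicators add up to the
   intercept column, so they cannot all enter the model together */
proc freq data=work.age_check;
    tables age_sum / missing;
run;
```

If `age_sum` is 1 for every case, the full set of age indicators is linearly dependent on the intercept, which could be one reason effects are reported as dependent. The same check could be repeated for the race and industry indicators. It does not rule out other causes.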