About mh2t

Reeza · 07-15-2020

Did you check the OUTPUT statement in PROC GLM? There are examples towards the bottom of the documentation page. After creating the output data set, run PROC MEANS on it. https://documentation.sas.com/?docsetId=statug&docsetTarget=statug_glm_syntax17.htm&docsetVersion=15.1&locale=en
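A minimal sketch of that suggestion, assuming SASHELP.CLASS as stand-in data. The model, the output data set name (work.pred), and the variable names pred and resid are illustrative choices, not taken from the original thread:

```sas
/* Fit a model and write predicted values and residuals to a data set. */
/* SASHELP.CLASS and the model are placeholders for your own data.     */
proc glm data=sashelp.class;
   model weight = height;
   output out=work.pred predicted=pred residual=resid;
run;
quit;

/* Summarize observed values, predicted values, and residuals together */
proc means data=work.pred n mean std min max;
   var weight pred resid;
run;
```

The OUTPUT statement keeps the input variables, so the observed response (weight) and the predicted value (pred) end up in the same data set.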

BrianGaines · 07-09-2020

Hi @mh2t,

Are you using SAS Studio to develop your code? If so, then I suggest that you take a look at the tasks (specifically, the Partitioning, Gradient Boosting, and Assess tasks) because they can expedite your code development.

As @StatDave mentioned, a convenient way to organize your data is to have one data table with an indicator variable that denotes which partition an observation belongs to. One benefit to this approach is that, when you estimate your model and use the PARTITION statement, some performance metrics for the validation and test partitions are automatically calculated so you don't need to calculate them separately as an additional step.

For example, the following code creates a CAS session, loads SASHELP.CARS as an in-memory table, and partitions that table into three sets (the PROC PARTITION code is from the Partitioning task):

    /* Connect to CAS */
    cas;
    libname mylib cas caslib="casuser";

    /* Load data into memory */
    data mylib.cars;
       set sashelp.cars;
    run;

    /* Partition data set */
    proc partition data=mylib.cars partind samppct=30 samppct2=10;
       output out=mylib.cars;
    run;

Now the data table MYLIB.CARS has a new _PartInd_ column where 0 corresponds to the training set, 1 for validation, and 2 for test.

You can then use this data table with the PARTITION statement in PROC GRADBOOST, as is done with the following code (generated by the Gradient Boosting task):

    proc gradboost data=MYLIB.CARS outmodel=mylib.savedModel;
       partition role=_PartInd_ (validate='1' test='2' train='0');
       target Origin / level=nominal;
       input MSRP EngineSize / level=interval;
       input DriveTrain / level=nominal;
       ods output FitStatistics=work.Gradboost_fit;
       score out=mylib.scored copyvars=(Origin MSRP EngineSize DriveTrain _PartInd_);
    run;

You can see in the results that the procedure automatically calculates fit statistics for all three partitions.

You could also use the saved model (mylib.savedModel) and PROC GRADBOOST to score the validation set, like in the following code:

    proc gradboost data=MYLIB.CARS(where=(_partind_=1)) inmodel=mylib.savedModel;
       output out=mylib.valscored copyvars=(_all_);
    run;

And you can see that the fit statistics match those produced by PROC GRADBOOST for the validation set when you estimated the model (compare with the previous results).

But again, by organizing your data partitions into the same table and by using the PARTITION statement, SAS automatically calculates these fit statistics when you estimate your model. You can also use the scored data table (mylib.scored) with the Assess task for additional model assessment.

Does this help?

-Brian

Ksharp · 06-29-2020

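As Reeza said, PROC FORMAT is also a suitable tool.

    /* Sample data: character variables whose levels will be recoded */
    data first;
    input var1 $ var2 $ var1000 $ Y;
    cards;
    a b c 111
    c n f 111
    d x g 222
    ;
    run;

    /* Lookup table: one row per variable, old level, and new level */
    data second;
    input varName $ Levels $ new_level $;
    cards;
    var1 a 2nd
    var1 c 1st
    var1 d 3rd
    var2 b 1st
    var2 n 3rd
    var2 o 2nd
    var1000 g 5th
    ;
    run;

    /* Turn the lookup table into a CNTLIN data set: one character format per variable */
    data fmt;
    set second;
    type='C';
    varName=cats(varName,'_');
    rename varName=fmtname Levels=start new_level=label;
    run;

    proc format cntlin=fmt;
    run;

    /* Apply each variable's own format; the recoded values go into _var1, _var2, _var1000 */
    data want;
    set first;
    array x{*} $ var1--var1000;
    array _x{3} $40 _var1 _var2 _var1000;
    do i=1 to dim(x);
      _x{i}=putc(x{i},cats(vname(x{i}),'_'));
    end;
    drop i;
    run;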

Ksharp · 06-27-2020

    proc rank data=have groups=4 out=want;
    by varName;
    var value;
    ranks new_level;
    run;

Ksharp · 06-26-2020

    data have;
    input id (var1 var2) ($) y;
    datalines;
    1 a b 1111
    2 c m 2222
    3 a m 4444
    4 d m 7777
    ;

    /* Get the variable names without reading any rows */
    proc transpose data=have(obs=0) out=temp;
    var _all_;
    run;

    /* Start from an empty result table */
    proc delete data=want;run;

    /* For each variable named var*, summarize its levels and append to WANT */
    data _null_;
    set temp(where=(lowcase(_name_) like 'var%'));
    call execute(cat('proc sql;create table x as select "',_name_,'" as vname length=40,',
      _name_,' as levels,count(',_name_,') as freq,mean(y) as avg_y from have group by ',_name_,
      ';quit;proc append base=want data=x force;run;'));
    run;

Rick_SAS · 06-24-2020

You can get the RSquare values for the models by putting

    ods output FitStatistics(PERSIST)=Results;   /* concatenate into data set named 'Results' */

prior to the first PROC GLM call. The mean response does not change from model to model, so that will be a constant column.

Reeza · 06-24-2020

Here's a great tutorial on merging. https://stats.idre.ucla.edu/sas/modules/match-merging-data-files-in-sas/

SteveDenham · 06-23-2020

Well, it looks like the data is correctly shaped (at least to me). The error in the PROC GLM call arises from including outest=PE at this point. This looks like a relic from PROC REG code. In GLM, so far as I can tell, you have to use an ODS OUTPUT statement. To get parameter estimates and R-squared, you would need this statement:

    ODS OUTPUT ParameterEstimates=ParameterEstimates FitStatistics=FitStatistics;

To get the estimates and R-squared into the same data set, you will have to do some shaping and then a many-to-one merge or SQL join, because you will have many rows of parameter estimates for each BY group but only one row in FitStatistics with the R-squared value.

SteveDenham
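The ODS OUTPUT step and the many-to-one join described above can be sketched as follows. This is only an illustration under stated assumptions: SASHELP.CLASS grouped by Sex stands in for the real data, the data set names (work.pe, work.fit, work.pe_fit) are hypothetical, and the column names (Parameter, Estimate, StdErr, Probt, RSquare) are the ones these ODS tables usually carry, so it is worth confirming them with PROC CONTENTS before relying on them.

```sas
/* BY-group processing needs sorted input; SASHELP.CLASS is a stand-in */
proc sort data=sashelp.class out=work.class;
   by sex;
run;

/* Capture both ODS tables from a single PROC GLM run */
ods output ParameterEstimates=work.pe FitStatistics=work.fit;
proc glm data=work.class;
   by sex;
   model weight = height / solution;
run;
quit;

/* Many-to-one join: every parameter row picks up its BY group's R-square */
proc sql;
   create table work.pe_fit as
   select p.sex, p.parameter, p.estimate, p.stderr, p.probt, f.rsquare
   from work.pe as p
        left join work.fit as f
        on p.sex = f.sex;
quit;
```

With several dependent variables per BY group, the join condition would presumably also need to match on the Dependent column.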

Reeza · 06-19-2020

Or import your Excel data (via PROC IMPORT) and it becomes a SAS data set you can query.

