About Pritish

Pritish · ‎04-14-2020

Hi - Can someone help me understand what the default lambda value is for Selection=Lasso in PROC GLMSELECT? I came across a forum discussion in which Rick suggested that a user switch to Selection=GroupLasso in order to set the lambda (L1) parameter. Also, if I use GroupLasso without specifying an L1 value, what will the default value of the L1 hyper-parameter be?

selection=GROUPLASSO(adaptive choose=sbc stop=L1 SHOWSTEPL1);

https://communities.sas.com/t5/Statistical-Procedures/getting-the-selection-details-lambda-fit-statistic-in-proc/td-p/447197

Pritish · ‎04-06-2020

I have not tried reducing the font size. I will give it a try and see if it works.

Pritish · ‎04-06-2020

Below is the code which I am using:

/* x axis column date_mmddyyyy has the format mmddyy10. : for ex: 03/31/2009 */
proc sgplot data=graph;
  xaxis label="YYYYQ" valuesformat=yyq6. interval=quarter valuesrotate=vertical fitpolicy=rotate;
  band x=date_mmddyyyy lower=0 upper=_freq_ / y2axis name="v3";
  series x=date_mmddyyyy y=mean;
  series x=date_mmddyyyy y=median;
run;

Below is the note I get when I execute the program.

NOTE: TICKVALUEFITPOLICY=Rotate is ignored when SPLITTICKVALUE=TRUE. The default THIN policy is used.
NOTE: Some of the tick values have been thinned.
NOTE: TICKVALUEFITPOLICY=Rotate is ignored when SPLITTICKVALUE=TRUE. The default THIN policy is used.
NOTE: Some of the tick values have been thinned.

Pritish · ‎04-02-2020

I tried the interval=quarter option; however, it does not show all the possible values. It skips two or three quarters at a time.

Pritish · ‎04-02-2020

Hi All - I am trying to plot year-quarter values on the x-axis using sgplot and have around 50 labels. When I use SGPlot, it only shows every alternate year-quarter instead of all 50 labels. Is there any way I can show all possible values on the x-axis? I did explore the 'values' option, but it might not be a good option because the year-quarter information will vary based on my data. Any thoughts / guidance would be appreciated.

Pritish · ‎04-01-2020

Thank you very much!

Pritish · ‎04-01-2020

Hi - I am trying to generate graphs for categorical variables that can take the values missing, 0 and 1 (for a numeric variable). Currently, I have code which only displays the non-missing values (0, 1), and I am wondering if sgplot needs any instruction to display the missing bucket. FYI: My list of variables contains both numeric and character variables, so depending on the type, I will have '.' for numeric and '' for character. Below is the code:

DATA cars1;
  infile DATALINES dsd missover;
  INPUT var val;
  CARDS;
,0.2099
0,0.4749
1,0.7827
;
RUN;

proc sgplot data=cars1;
  vbar var / response=val group=var datalabel datalabelattrs=(weight=bold);
  yaxis grid label='Actual Default Rate';
run;

DATA cars2;
  infile DATALINES dsd missover;
  INPUT var $ val;
  CARDS;
,0.2099
A,0.4749
B,0.7827
;
RUN;

proc sgplot data=cars2;
  vbar var / response=val group=var datalabel datalabelattrs=(weight=bold);
  yaxis grid label='Actual Default Rate';
run;

Can you please advise on how to include missing values in the sgplot?
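One option that may be worth trying here is the MISSING option of the VBAR statement, which asks SGPLOT to treat a missing category (or group) value as a valid bar. The following is only a minimal sketch that reuses the cars1 data set from the post above; whether the resulting label for the missing bar is suitable would still need checking on the real data.

```sas
/* Sketch: same plot as in the post, with MISSING added to VBAR  */
/* so that the missing value of var gets its own bar.            */
proc sgplot data=cars1;
  vbar var / response=val group=var missing
             datalabel datalabelattrs=(weight=bold);
  yaxis grid label='Actual Default Rate';
run;
```

The same option should apply unchanged to the character version (cars2), since MISSING is set on the statement rather than per variable type.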

Pritish · ‎03-30-2020

Thank you for the suggestion on SGPlot. I have changed my code and am currently in the process of testing it. I will share an update once I have the output. One related question regarding exporting sgplot output to Excel: is there a way to specify where the output of sgplot should be placed in the Excel file? For example, if I am generating three graphs, I would want graphs one and two side by side and the third graph below graph one.

Pritish · ‎03-29-2020

Hi, I have a dataset with millions of rows and around 500 variables. For each of these 500 variables, I am trying to generate a plot using gplot and save the output to a PDF file. However, the problem is that my code ran for more than 18 hours and had still only processed around 150 variables. I believe it is taking so long primarily because I am saving my output to a PDF file (the forum discussion linked below seems to support my hypothesis). I am wondering if there is a better alternative that I can leverage to:
a. Speed up the process
b. Output graphs for all 500 variables to Excel/XML/PDF/RTF.

https://communities.sas.com/t5/SAS-Enterprise-Guide/PROC-GPLOT-takes-a-LONG-time-to-create-a-PDF/td-p/65299

Pritish · ‎03-21-2016

Sorry for the poor description. Here's what I am trying to achieve: I have a table which contains unique 5-digit numbers and the total record count associated with each of them. The 5-digit numbers are not in sequence; however, they could have the first four, the first three, or the first two digits in common (e.g. 01521, 01529). Using these numbers, I want to roll up the data based on each number. So let's say number 01521 has more than 10 records but 01529 doesn't have 10 records; I would like to create a new column which tells me which numbers were grouped together (for ex: 01521-01529). Here are some sample records:

Column A   # of Records
01111      111
01119      6

Since the first four digits are common for the above numbers, I would like to create a new column with the value '01111-01119' for each row. Similarly, here is the second scenario, where I have exhausted the first-four-digit combinations, so I would go for matching the first three digits:

Column A   # of Records
01255      99
01299      6

Again, I only need to do a grouping if the number of records is less than 10. Here is the code that I have, which is definitely not the best piece of code:

data want;
  /* look ahead one and two rows by merging the table with itself */
  merge test1
        test1 (firstobs=2 rename=(columna=_columna1 count=_count1 flag=_flag1))
        test1 (firstobs=3 rename=(columna=_columna2 count=_count2 flag=_flag2));
  if count < 10 then do;  /* count = # of records */
    if substr(columna,1,4) = substr(_columna1,1,4) then do;
      if count + _count1 >= 10 then
        derived_column = strip(columna) || '-' || strip(_columna1);
      else if substr(columna,1,4) = substr(_columna2,1,4) then do;
        if count + _count1 + _count2 >= 10 then do;
          derived_column = strip(columna) || '-' || strip(_columna2);
        end;
      end;
    end;
  end;
run;

The problem with the above program is that I am not able to derive the new column for records that have a count greater than 10. HTH. Thanks!

Pritish · ‎03-21-2016

I am trying to derive a new column (Column B) based on an existing column (Column A) in my data.

ColumnA   CountofA
12345     15
12346     20
12347     6
12348     20
12349     4
12350     20
21100     6
21111     6
21112     4
21299     2
21399     2

Desired Output

ColumnA   ColumnB
12345     12345
12346     12346-12347   (This will get merged since the first four digits are the same)
12347     12346-12347   (This will get merged since the first four digits are the same)
12348     12348-12349   (This will get merged since the first four digits are the same)
12349     12348-12349   (This will get merged since the first four digits are the same)
12350     12350
21101     21101-21111   (This will get merged since the first 3 digits are the same and their sum total is 12)
21111     21101-21111   (This will get merged since the first 3 digits are the same and their sum total is 12)
21112     21112-21399   (This will get merged since the first 2 digits are the same and their sum total is 8)
21299     21112-21399   (This will get merged since the first 2 digits are the same and their sum total is 8)
21399     21112-21399   (This will get merged since the first 2 digits are the same and their sum total is 8)

The code that I currently have involves a lot of manual checks, so I thought of posting my question here to get guidance on solving the above scenarios. Thanks in advance.

Pritish · ‎10-14-2015

Hi All, I wanted to get some ideas on how I can force the eminer decision tree to iteratively try the variables (300+) in a dataset and recommend / find the optimal split based on user-specified criteria.

Pritish · ‎11-18-2014

Hi All, I am kind of new to PROC TRANSREG. I am wondering if there is a way/option that will allow me to apply a monotonic transformation to the data, i.e. the transformed data values need to keep on increasing or decreasing. Let me know if you need any additional information. Thanks in advance.
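For what it is worth, PROC TRANSREG does list MONOTONE among its transformations, so a first attempt might look like the sketch below. The data set name have and the variables y and x are hypothetical placeholders, not taken from the post, and the choice of IDENTITY for the response is an assumption made only to keep the example small.

```sas
/* Sketch only: "have", "y" and "x" are hypothetical names.        */
/* IDENTITY leaves y unchanged; MONOTONE asks for an order-        */
/* preserving (monotonic) transformation of x.                     */
proc transreg data=have;
  model identity(y) = monotone(x);
  output out=trans;   /* transformed variables are written here */
run;
```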

Pritish · ‎09-15-2014

Thanks! How do you treat those outliers? I don't want to exclude them from my dataset.

Pritish · ‎09-13-2014

Hi, I am trying to understand which techniques are commonly used to cap and floor outliers present in a dataset. Any guidance or link to a research paper would be greatly appreciated.
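As one illustration of what capping and flooring can mean in practice, the sketch below clips a variable at its 1st and 99th percentiles (often called winsorizing). It is only one of several possible approaches, the percentile cutoffs are an arbitrary assumption, and the names have, want, x and x_capped are hypothetical.

```sas
/* Sketch only: "have", "want", "x" and the 1/99 cutoffs are assumptions. */

/* Step 1: compute the 1st and 99th percentiles of x (stored as p1, p99). */
proc univariate data=have noprint;
  var x;
  output out=pct pctlpts=1 99 pctlpre=p;
run;

/* Step 2: floor values below p1 and cap values above p99.        */
/* Missing values of x are left missing rather than replaced.     */
data want;
  if _n_ = 1 then set pct;   /* p1 and p99 are retained for every row */
  set have;
  x_capped = x;
  if not missing(x) then x_capped = min(max(x, p1), p99);
  drop p1 p99;
run;
```

This keeps every row in the dataset, which matches the stated wish not to exclude the outliers.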

Date Last Visited	‎04-09-2021 09:47 PM

GLMSelect - Selection=Lasso | Selection=GroupLasso

Re: Displaying all Labels for X-Axis

Re: Displaying all Labels for X-Axis

Re: Displaying all Labels for X-Axis

Displaying all Labels for X-Axis

Re: SGPlot - Display Missing Category

SGPlot - Display Missing Category

Re: Plotting Data and output as PDF

Plotting Data and output as PDF

Re: Rollup data - conditional grouping

Re: Finding top variables (attributes)

Re: how to convert char var to sas date?

Re: Help with grouping

Re: que regarding proc standard

Re: what is the difference?

How to merge datasets and print a summary?

Re: PROC codes for tables

Rollup data - conditional grouping

eminer decision tree

Proc Transreg

Re: Capping and Flooring outliers - Methods

Capping and Flooring outliers - Methods