About Zerg

Zerg · ‎05-22-2019

Thank you. Your code works.

Zerg · ‎05-22-2019

I have a dataset that looks like this:

    id  date  var1  var2
    1   2000  0     1
    1   2000  0     1
    1   2000  1     1
    1   2001  0     0
    1   2001  0     0
    1   2001  0     0
    2   2001  0     0
    2   2001  0     0
    2   2001  0     0
    2   2001  0     0
    2   2002  0     1
    2   2002  0     1
    2   2002  1     1

I want var2 to equal 1 for every row of an id-date group as long as var1 has a value of 1 somewhere in that group, and 0 otherwise. The result I want is shown in the var2 column of the example data. How can I do this? Thank you.
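One possible way to build such a group-level flag is to take the maximum of var1 within each id-date group. The sketch below assumes var1 is coded strictly as 0/1 with no missing values, and it uses a shortened copy of the example data; the dataset names have and want are placeholders.

```sas
/* Shortened sample: var1 only, var2 is derived below */
data have;
    input id date var1;
    datalines;
1 2000 0
1 2000 0
1 2000 1
1 2001 0
1 2001 0
2 2002 0
2 2002 1
;
run;

/* max(var1) is 1 if any row in the id-date group has var1 = 1.
   PROC SQL remerges the group maximum back onto every row. */
proc sql;
    create table want as
    select id, date, var1, max(var1) as var2
    from have
    group by id, date;
quit;
```

PROC SQL writes a note to the log about remerging summary statistics, which is expected here. The row order within a group is not guaranteed to match the input.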

Zerg · ‎05-20-2019

Thank you. Following your suggestion, I solved the issue with:

    data want;
        set have;
        if not missing(ret);
    run;

Zerg · ‎05-20-2019

I have a dataset downloaded from CRSP that looks like this:

    id  date  returns
    1   xx    B
    1   xx    C
    1   xx    0.2
    2   xx    .
    2   xx    0.2
    2   xx    C
    3   xx    0.5
    3   xx    .
    3   xx    C

Returns is a numeric variable but somehow contains letters. I want to remove observations with letters or missing values in the returns. I can remove the observations with missing values with:

    data want;
        set have;
        if not(ret=.);
    run;

But if I try to remove the observations with letters:

    data t2;
        set t;
        if not(ret='B');
    run;

SAS reports: Invalid numeric data, 'B'

How can I resolve this? Thank you.
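A likely explanation, offered here as an assumption about the CRSP file, is that the letters are SAS special missing values (.B, .C) rather than character data. A numeric variable cannot be compared with the string 'B', which would explain the error. The sketch below recreates a few rows with the MISSING statement and then filters them; ret is the variable name used in the code above.

```sas
/* The MISSING statement lets INPUT read B and C as special missing values */
data have;
    missing B C;
    input id ret;
    datalines;
1 B
1 C
1 0.2
2 .
2 0.2
2 C
;
run;

/* ret = . matches only the ordinary missing value;
   missing(ret) is true for . as well as .B and .C */
data want;
    set have;
    if not missing(ret);
run;
```

To drop only one specific code, the comparison would be written with a leading period, for example if ret ne .B;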

Zerg · ‎05-07-2019

Thank you!

Zerg · ‎05-07-2019

Hello, I have a dataset that looks like this:

    id    date      return  report date  avg. return
    1000  1/1/2000  1.1     1/5/2000     1.7
    1000  1/2/2000  1.2     1/5/2000     1.7
    1000  1/3/2000  1.3     1/5/2000     1.7
    1000  1/4/2000  1.4     1/5/2000     1.7
    1000  1/5/2000  1.5     1/5/2000     1.7
    1000  1/6/2000  1.6     1/5/2000     1.7
    1000  1/7/2000  1.7     1/5/2000     1.7
    1000  1/8/2000  1.8     1/5/2000     1.7
    1001  1/1/2000  1.9     1/3/2000     2.3
    1001  1/2/2000  2       1/3/2000     2.3
    1001  1/3/2000  2.1     1/3/2000     2.3
    1001  1/4/2000  2.2     1/3/2000     2.3
    1001  1/5/2000  2.3     1/3/2000     2.3
    1001  1/6/2000  2.4     1/3/2000     2.3
    1001  1/7/2000  2.5     1/3/2000     2.3
    1002  1/1/2000  2.6     1/4/2000     3.05
    1002  1/2/2000  2.7     1/4/2000     3.05
    1002  1/3/2000  2.8     1/4/2000     3.05
    1002  1/4/2000  2.9     1/4/2000     3.05
    1002  1/5/2000  3       1/4/2000     3.05
    1002  1/6/2000  3.1     1/4/2000     3.05

What I need is to select a number of returns for each id based on the report dates. The returns selected are those within a 3-day window after the report date. After the selection, I want to calculate the average of the selected returns and output the result as avg. return. I also want the SQL query to be flexible enough that I can quickly specify other n-day windows, such as a 20-day window with 10 days before the report date and 10 days after.

I wrote some preliminary code, but it does not seem to work as I intended:

    %let begdate = 0;
    %let enddate = 5;
    proc sql;
        create table want as
        select a.*, mean(b.return) as avgret
        from have as a left join have as b
            on a.id=b.id and a.date=b.date
        where b.date between intnx('WEEKDAY', rptdate, &begdate) and intnx('WEEKDAY', rptdate, &enddate)
        order by a.permno,a.date;
    quit;

I would appreciate your help on how to achieve this. Thank you.
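One possible approach is a correlated subquery that averages, for each row, the returns of the same id whose dates fall inside the window around the report date. Part of the problem with the preliminary query may be that joining on a.date=b.date matches each row only to its own date. The sketch below is a minimal illustration, not a tested solution for the full CRSP data: the variable names ret and rptdate are assumptions, the window is counted in calendar days, and only the rows for id 1000 are included.

```sas
/* Window offsets relative to the report date, in calendar days.
   1 and 3 select the three days after the report date;
   -10 and 10 would give a window of 10 days on each side. */
%let begdate = 1;
%let enddate = 3;

/* A few sample rows for id 1000 (ret and rptdate are assumed names) */
data have;
    input id date :mmddyy10. ret rptdate :mmddyy10.;
    format date rptdate mmddyy10.;
    datalines;
1000 1/4/2000 1.4 1/5/2000
1000 1/5/2000 1.5 1/5/2000
1000 1/6/2000 1.6 1/5/2000
1000 1/7/2000 1.7 1/5/2000
1000 1/8/2000 1.8 1/5/2000
;
run;

proc sql;
    create table want as
    select a.*,
           /* average of the same id's returns inside the window */
           (select mean(b.ret)
              from have as b
             where b.id = a.id
               and b.date between a.rptdate + &begdate
                              and a.rptdate + &enddate) as avgret
    from have as a
    order by a.id, a.date;
quit;
```

With offsets 1 and 3, the window for id 1000 covers 1/6/2000 through 1/8/2000, the rows whose returns average to the 1.7 shown in the example. If the window should count weekdays rather than calendar days, the two bounds could be replaced with intnx('WEEKDAY', a.rptdate, &begdate) and intnx('WEEKDAY', a.rptdate, &enddate), as in the preliminary code.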

Zerg · ‎04-22-2019

Thank you for the clarification on the difference between the d9.4 and 9.4 formats. It is helpful. Interestingly, I checked the ParameterEstimates template for proc reg in Sashelp.Tmplstat_en, which governs the format of the proc reg output. Here is what I found:

    proc template;
    define table Stat.Reg.ParameterEstimates / store = SASHELP.TMPLSTAT_EN;
        notes "Parameter estimate table";
        dynamic dynglue confidence widthMax _hccMethod;
        column Variable Label DF Estimate StdErr tValue Probt BetaWarning HCStdErr HCCMethod HCTValue HCProbt TypeISS TypeIISS StandardizedEst SemiCorrTypeI CumRSquare FValueI ProbFI SeqFValueI SeqProbFI SqPartCorrTypeI SemiCorrTypeII FValueII ProbFII SqPartCorrTypeII Tolerance VarianceInflation LowerCL UpperCL HCLowerCl HCUpperCL;
        header h1 clhead typeIhead typeISeqHead typeIIhead HCHead HCclhead;
        define h1; text "Parameter Estimates"; space = 1; spill_margin; end;
        define clhead; text confidence BEST8. %nrstr("%% Confidence Limits"); end = UpperCL; start = LowerCL; end;
        define HCclhead; text "Heteroscedasticity Consistent " confidence BEST8. %nrstr("%% Confidence Limits"); width = 30; end = HCUpperCL; start = HCLowerCL; spill_margin; end;
        define HCHead; text "Heteroscedasticity Consistent"; expand = "-"; end = HCProbt; start = HCStdErr; end;
        define typeIhead; text "Type I"; expand = "-"; end = ProbFI; start = FValueI; end;
        define typeISeqhead; text "Sequential Type I"; expand = "-"; end = SeqProbFI; start = SeqFValueI; end;
        define typeIIhead; text "Type II"; expand = "-"; end = ProbFII; start = FValueII; end;
        define Variable; header = "Variable"; style = RowHeader; id; end;
        define DF; parent = Common.ParameterEstimates.DF; id; end;
        define Estimate; header = ";Parameter;Estimate"; format = d11.3; parent = Common.ParameterEstimates.Estimate; end;
        define StdErr; header = ";Standard;Error"; format = d11.3; parent = Common.ParameterEstimates.StdErr; end;
        define tValue; parent = Stat.Reg.tValue; end;
        define Probt; glue = dynglue; parent = Stat.REG.Probt; end;
        define BetaWarning; translate _val_=0 into "", _val_=1 into "*"; format = 1.0; pre_merge; end;
        define HCStdErr; header = ";Standard;Error"; format = d11.3; parent = Common.ParameterEstimates.StdErr; end;
        define HCCMethod; header = "HCCMethod"; print = OFF; end;
        define HCTValue; parent = Stat.Reg.tValue; end;
        define HCProbt; glue = dynglue; parent = Stat.REG.Probt; end;
        define TypeISS; header = ";Type I SS"; format = d11.3; end;
        define TypeIISS; header = ";Type II SS"; format = d11.3; end;
        define StandardizedEst; header = ";Standardized;Estimate"; format = d11.3; parent = Common.ParameterEstimates.StandardizedEst; end;
        define SemiCorrTypeI; header = ";Squared;Semi-partial;Corr Type I"; format = d11.3; end;
        define CumRSquare; header = ";Cumulative;R-Square"; format = d11.3; end;
        define FValueI; glue = 10; parent = Stat.REG.FValue; end;
        define ProbFI; parent = Stat.REG.ProbF; end;
        define SeqFValueI; glue = 10; parent = Stat.REG.FValue; end;
        define SeqProbFI; parent = Stat.REG.ProbF; end;
        define SqPartCorrTypeI; header = ";Squared;Partial;Corr Type I"; format = d11.3; end;
        define SemiCorrTypeII; header = ";Squared;Semi-partial;Corr Type II"; format = d11.3; end;
        define FValueII; glue = 10; parent = Stat.REG.FValue; end;
        define ProbFII; parent = Stat.REG.ProbF; end;
        define SqPartCorrTypeII; header = ";Squared;Partial;Corr Type II"; format = d11.3; end;
        define Tolerance; header = "Tolerance"; format = d11.3; end;
        define VarianceInflation; header = ";Variance;Inflation"; format = d11.3; end;
        define LowerCL; format = d11.3; glue = 10; print_headers = OFF; end;
        define UpperCL; format = d11.3; print_headers = OFF; end;
        define HCLowerCL; format = d11.3; glue = 10; print_headers = OFF; end;
        define HCUpperCL; format = d11.3; print_headers = OFF; end;
        define Label; width = widthMax; parent = Common.ParameterEstimates.Label; maximize; end;
        required_space = 5;
        use_name;
    end;
    run;

As you can see, this section

    define Estimate; header = ";Parameter;Estimate"; format = d11.3; parent = Common.ParameterEstimates.Estimate; end;

does specify that the parent of Estimate is Common.ParameterEstimates.Estimate. More interesting is that the format of Estimate is set to d11.3, yet the output I get from proc reg shows parameter estimates rounded to the 5th decimal place. I guess locating the template Common.ParameterEstimates.Estimate may help resolve the issue. By the way, I am using SAS 9.4.

Zerg · ‎04-20-2019

Hello, I want to set the default rounding for parameter estimates and standard errors to the 4th decimal place (in the HTML output) for all SAS regression procedures (such as proc reg, proc glm, and proc surveyreg). I figured out how to do this for p-values (set to the 2nd decimal place):

    proc template;
        define column Common.PValue;
            notes "Default p-value column";
            just = r;
            format = pvalue9.2;
        end;
    run;

But I can't do the same thing for parameter estimates and standard errors with the following code:

    proc template;
        define column Common.ParameterEstimates.Estimate;
            notes "Default estimates column";
            just = r;
            format = d9.4;
        end;
    run;

Is there a way to achieve what I want? I appreciate any help. Thank you.

Zerg · ‎04-18-2019

Works perfectly. Thank you.

Zerg · ‎04-18-2019

Hello, I want to delete all the duplicated observations. Example data:

    obs.  id  year  var1
    1     1   1999  5
    2     2   2000  10
    3     2   2000  8
    4     2   2000  6
    5     3   2001  7
    6     4   2002  12
    7     4   2002  15
    8     5   2001  9
    9     6   2001  4
    10    7   2002  3

A unique observation is determined by id and year, so in the example data the duplicates are obs. 2, 3, 4 and obs. 6, 7. I want to delete all the duplicates without keeping one observation from each duplicated group. So the data I want is:

    obs.  id  year  var1
    1     1   1999  5
    5     3   2001  7
    8     5   2001  9
    9     6   2001  4
    10    7   2002  3

proc sort by id year with the nodupkey option keeps obs. 2 and obs. 6, and the nouniquekey option can output a dataset that contains obs. 2, 3, 4, 6, and 7. Is there an efficient way to get the final data I want? Thank you.
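One way to keep only the keys that occur exactly once is to sort by id and year and then keep the rows that are both the first and the last row of their BY group. The sketch below uses the example data; the intermediate dataset name sorted is a placeholder.

```sas
data have;
    input id year var1;
    datalines;
1 1999 5
2 2000 10
2 2000 8
2 2000 6
3 2001 7
4 2002 12
4 2002 15
5 2001 9
6 2001 4
7 2002 3
;
run;

/* BY-group processing needs the data sorted by the key variables */
proc sort data=have out=sorted;
    by id year;
run;

/* A row that is both first and last in its id-year group
   is the only row with that key, so it is not a duplicate */
data want;
    set sorted;
    by id year;
    if first.year and last.year;
run;
```

Recent SAS releases may also allow this in a single PROC SORT step by adding a UNIQUEOUT= dataset alongside the NOUNIQUEKEY option; that is worth checking against the documentation for the installed version.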

Zerg · ‎03-19-2019

Thank you for the code. I will test it to see if it meets my needs.

Zerg · ‎03-19-2019

Thank you for your suggestion. This looks like a handy procedure that gives me what I am looking for.

Zerg · ‎03-18-2019

Hello all, I would appreciate your help with writing an iterative process. I need to split a sample into three groups using the following procedure:

1. Randomly select 3 seed observations from the main dataset and assign them to three sub datasets, "high", "middle", and "low", based on the values of variable A in the seed observations. Each sub dataset then has one observation to start with.
2. Starting from the main dataset with the 3 seed observations excluded, compute the difference between the value of variable A for each observation and the median value of variable A in each sub dataset. The observation is added to the sub dataset for which the squared difference is smallest.
3. Repeat step 2 until all the observations in the main dataset have been examined.

I have figured out the first step and have the following sample data to start with:

    data have;
        input a;
        cards;
    -1.35
    -1.10
    -1.02
    -0.72
    -0.18
    -0.11
    0.31
    0.58
    0.67
    ;
    run;

    *randomly generate 3 seed observations*;
    proc surveyselect data=have out=rand method=srs sampsize=3 seed=100 noprint;
    run;

    data rand; set rand; n+1; run;

    data t1; set rand; if n=1; run;
    data t1; set t1; drop n; run; *low sub dataset*;
    data t2; set rand; if n=2; run;
    data t2; set t2; drop n; run; *middle sub dataset*;
    data t3; set rand; if n=3; run;
    data t3; set t3; drop n; run; *high sub dataset*;

    *exclude the 3 seed observations from main dataset*;
    proc sql;
        create table data as
        select a.*
        from have a left join rand b on a.a=b.a
        where b.a is null;
    quit;

After running the code above, I have 3 sub datasets, "t1", "t2", and "t3", and a main dataset, "data". How can I code steps 2 and 3 with these datasets? I am open to coding step 1 in a more efficient manner as well. Many thanks!

Zerg · ‎04-07-2018

Thank you. It works perfectly. 🙂

Zerg · ‎04-07-2018

Hello all, I currently have a data file that looks like this:

    ID  Year  Var A
    1   2000  0
    1   2001  0
    1   2002  1
    1   2003  0
    2   2000  0
    2   2001  1
    2   2002  1
    2   2003  1
    2   2004  0
    3   2001  0
    3   2002  0
    3   2003  1
    3   2004  0
    3   2005  1
    3   2006  0

What I need is to exclude, within each ID, the observations with Var A = 0 that come before the first time Var A = 1. So the resulting data needs to look like this:

    ID  Year  Var A
    1   2002  1
    1   2003  0
    2   2001  1
    2   2002  1
    2   2003  1
    2   2004  0
    3   2003  1
    3   2004  0
    3   2005  1
    3   2006  0

How can I code it? Thank you.
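A common pattern for this kind of task is a retained flag that is reset at the start of each ID and switched on the first time Var A equals 1. The sketch below assumes the data are already sorted by ID and Year. The names var_a and seen are illustrative, and only the first two IDs from the example are included.

```sas
data have;
    input id year var_a;
    datalines;
1 2000 0
1 2001 0
1 2002 1
1 2003 0
2 2000 0
2 2001 1
2 2002 1
2 2003 1
2 2004 0
;
run;

data want;
    set have;
    by id;
    retain seen;                   /* carries its value across rows */
    if first.id then seen = 0;     /* reset at the start of each ID */
    if var_a = 1 then seen = 1;    /* switch on at the first 1 */
    if seen;                       /* keep rows from the first 1 onward */
    drop seen;
run;
```

Rows before the first 1 are dropped because seen is still 0 when they are read. Later zeros are kept because the flag stays on until the next ID begins.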

Online Status	Offline
Date Last Visited	‎03-31-2021 12:30 AM

Re: A quick question

A quick question

Re: Remove numeric variable with a character value

Remove numeric variable with a character value

Re: Need help on an SQL query

Need help on an SQL query

Re: Setting default rounding with proc template

Setting default rounding with proc template

Re: Delete all the duplicates

Delete all the duplicates