About Yegen

Yegen · ‎02-25-2018

Similar to the suggestion by @Cynthia_sas, you can use the following trick. (As far as I know, in SAS Studio on WRDS one cannot export directly to the local machine, unless one connects to the WRDS server from the local machine using SAS rather than SAS Studio and uses a program like WinSCP.)

    libname out "/home/.../";
    data out.filename;
        set filename;
    run;

Or you can do the following:

    data '/home/.../filename.sas7bdat';
        set filename;
    run;

Both should work. This export keeps your data in the .sas7bdat format. However, you can also use the following command if you want to export your data in .csv format:

    proc export data=work.filename DBMS=csv outfile="/home/.../filename.csv" replace;
    run;

I hope this helps, @fafrin420.

Yegen · ‎02-14-2018

Thanks for your helpful comments, @Rick_SAS and @SAS_Rob. The lower bound for the number of distinct levels of ID is 15,000 (and the upper bound is around 170,000). I will give the suggestion you have made a try.

I also had a conversation with my co-author and we thought of the following. Since fixed effects just demean the LHS and RHS variables, one can compute the means of the given variables at the distinct ID level. Since I have two different FEs (i.e., ID and year), I also computed the means of the same variables at the year level. Following that, I subtracted both means (i.e., the corresponding ID and year means) from the corresponding variables (e.g., Y_t,i - Y_mean_i - Y_mean_t) and obtained the demeaned variables. Then, I used PROC SURVEYREG with clustering at the id level and voila, I got the results pretty quickly.

PROC SURVEYREG does not seem to like a large number of fixed effects, but handles clustering well (whereas PROC GLM handles fixed effects well, but does not have a clustering option). Thanks again, @Rick_SAS and @SAS_Rob.
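The demeaning steps described above can be sketched roughly as follows. This is only an illustration under assumptions: the toy values in work.have and the names work.demeaned, y_dm, and x_dm are made up for the example, while id, year, dependent_var, and independent_var follow the original question.

```sas
/* Toy panel (hypothetical values) with the variable names from the question */
data work.have;
  input id year dependent_var independent_var;
  datalines;
1 2015 1.0 0.5
1 2016 1.4 0.9
2 2015 2.1 1.0
2 2016 2.0 1.2
3 2015 0.7 0.1
3 2016 1.5 0.8
;
run;

/* Subtract the id-level and year-level means from each variable */
proc sql;
  create table work.demeaned as
  select a.id, a.year,
         a.dependent_var   - b.y_mean_id - c.y_mean_year as y_dm,
         a.independent_var - b.x_mean_id - c.x_mean_year as x_dm
  from work.have as a
  inner join
       (select id,
               mean(dependent_var)   as y_mean_id,
               mean(independent_var) as x_mean_id
        from work.have
        group by id) as b
    on a.id = b.id
  inner join
       (select year,
               mean(dependent_var)   as y_mean_year,
               mean(independent_var) as x_mean_year
        from work.have
        group by year) as c
    on a.year = c.year;
quit;

/* Regress the demeaned variables with standard errors clustered by id */
proc surveyreg data=work.demeaned;
  cluster id;
  model y_dm = x_dm;
run;
```

Two caveats are worth keeping in mind. The textbook two-way within transformation also adds back the overall mean; leaving it out only shifts the intercept, not the slope. And subtracting simple id and year means reproduces the two-way fixed-effects slope exactly only for a balanced panel, and the reported standard errors are not adjusted for the degrees of freedom used up by the absorbed effects.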

Yegen · ‎02-13-2018

Perfect, thanks for this helpful comment @ballardw.

Yegen · ‎02-12-2018

I have a large dataset with over 15,000 fixed effects. When I run the code below, SAS fails to produce any output and indicates that there is not enough memory (I am skipping the other ods output parts to shorten my code).

    proc surveyreg data=have;
        cluster id;
        class year id;
        model dependent_var = independent_var year id;
        ods output ParameterEstimates = OutputStats_1 (where=(Parameter in ('Intercept','independent_var ')));
    run;

If I were not required to cluster the standard errors at the id level, I could have simply used proc glm and absorbed the fixed-effects variables. I was able to get results using the proc glm approach, but not with proc surveyreg. I assume that proc surveyreg is a very slow approach? Is there any way to absorb the fixed effects as in proc glm? I am trying to run a panel data regression with year and id fixed effects and standard errors clustered at the id level.

Yegen · ‎01-23-2018

I see, that makes sense; thanks for these helpful comments. Thanks for pointing that out regarding the SRS method. I was not sure that the selection of records for each new sample is drawn from the entire pool of records. That's exactly what I need to do. The code is running quite well except for one issue. I am getting the following error for some managers (i.e., IDs):

    proc surveyselect data=work.sample_have
        sampsize=work.stratsize
        method=srs
        reps=100
        out=work.sampled
        ;
        stratum id;
    run;
    ERROR: The sample size, 8, is greater than the number of sampling units, 3.
    NOTE: The above message was for the following stratum: id=7.
    ERROR: The sample size, 7, is greater than the number of sampling units, 3.
    NOTE: The above message was for the following stratum: id=8.
    ERROR: The sample size, 5, is greater than the number of sampling units, 3.
    NOTE: The above message was for the following stratum: id=9.

I assume that the stratsize is greater than the actual number of records for these managers, right? It looks like these managers are never included in the sample. Would it make sense to replace the stratsize for these managers with the actual number of records (see below)? As far as I understand it, stratsize is the number of records that need to be randomly drawn from the initial pool that contains all records. Is that fair to say?

    data stratsize(drop=sample_size);
        set stratsize;
        if sample_size<_nsize_ then _nsize_=sample_size;
        else _nsize_=_nsize_;
    run;
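As a possible alternative to capping _nsize_ by hand, PROC SURVEYSELECT has a SELECTALL option that takes every unit in a stratum whose requested sample size exceeds the number of available units. The sketch below uses made-up toy data (the values and the variable record are assumptions); only the dataset and variable names id and _nsize_ follow the post.

```sas
/* Toy data (hypothetical): id 7 has only 3 records */
data work.sample_have;
  input id record $;
  datalines;
7 a
7 b
7 c
8 a
8 b
8 c
8 d
8 e
;
run;

/* Requested sizes: id 7 asks for 8 records, more than the 3 available */
data work.stratsize;
  input id _nsize_;
  datalines;
7 8
8 2
;
run;

/* SELECTALL: take all units of a stratum when _nsize_ exceeds its size */
proc surveyselect data=work.sample_have
    sampsize=work.stratsize
    method=srs
    selectall
    reps=100
    seed=1234567
    out=work.sampled;
  strata id;
run;
```

With this option, a stratum like id=7 would contribute all of its records to every replicate rather than stopping the step with an error. Whether that is appropriate depends on the analysis, since those strata are then not really sampled at random.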

Yegen · ‎01-22-2018

Thanks for being willing to help out, @ballardw. Exactly, each randomly constructed sample will have 2 records for manager 63. The idea of this program is actually very simple. The variable "n_number" is exogenously given. I need to construct "n_number" records for each manager. For example, in the work.have dataset, manager 63 has n_number=2, indicating that 2 records need to be drawn from the work.have dataset for each randomly constructed sample. The two records will always be rows that have manager=63. I will iterate this exercise over all managers (i.e., draw "n_number" rows that have the same manager number).

I have tried to run the code you have posted, but the following error occurs:

    proc surveyselect data=work.have
        sampsize=work.stratsize
        reps=10
        out=work.sampled
        ;
        stratum manager;
    run;
    ERROR: No _NSIZE_ variable is found in the SAMPSIZE= input data set.
    NOTE: The SAS System stopped processing this step because of errors.
    WARNING: The data set WORK.SAMPLED may be incomplete. When this step was stopped there were 0 observations and 0 variables.
    WARNING: Data set WORK.SAMPLED was not replaced because this step was stopped.

I am not sure why you are calculating the maximum n_number for each manager. I think that one could do the same as follows:

    proc sql;
        create table work.stratsize as
        select distinct manager, n_number as _nsize_
        from work.have;
    quit;

    proc surveyselect data=work.have
        sampsize=work.stratsize
        reps=10
        out=work.sampled
        ;
        stratum manager;
    run;

To answer one of the other questions related to the sample size, I do have a large number of observations for each manager (thousands of rows; in total there are about 20 million unique manager-firm pairs). Also, to answer the question about duplicates, each manager-firm pair is unique, so there are no duplicates.

I have a follow-up question though. I understand that the above code randomly selects "_nsize_" (or "n_number") rows for each manager and constructs 10 samples by doing so. However, I am not sure how to change the selection to sampling with replacement (SRS is a without-replacement procedure as far as I recall). Is there a way to change these conditions in proc surveyselect?
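One way to draw with replacement is METHOD=URS (unrestricted random sampling) in PROC SURVEYSELECT; adding OUTHITS writes one output row per selection instead of a single row with a hit count. The sketch below is only an illustration: the shortened toy data, the seed, and the output name work.sampled_wr are assumptions, while the remaining names follow the post.

```sas
/* Shortened toy version of work.have (hypothetical subset of rows) */
data work.have;
  input manager manager_firm_pair :$24. n_number;
  datalines;
63 63|1323A|45C6C 2
63 63|1323A|789A 2
63 63|789A|12SAS 2
89 89|R34S2|SDS23 3
89 89|R34S2|J234S 3
89 89|J234S|S2WAD3 3
;
run;

/* One row per manager with the requested sample size */
proc sql;
  create table work.stratsize as
  select distinct manager, n_number as _nsize_
  from work.have
  order by manager;
quit;

/* URS = sampling with replacement; OUTHITS = one row per hit */
proc surveyselect data=work.have
    sampsize=work.stratsize
    method=urs
    outhits
    reps=10
    seed=1234567
    out=work.sampled_wr;
  strata manager;
run;
```

The input data must be sorted by the stratum variable, as it is here. With replacement, the same manager_firm_pair can appear more than once within a replicate.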

Yegen · ‎01-22-2018

I have the following dataset:

    data work.have;
        infile cards expandtabs truncover;
        input manager manager_firm_pair :$24. n_number;
        cards;
    63 63|1323A|45C6C 2
    63 63|1323A|789A 2
    63 63|1323A|12SAS 2
    63 63|1323A|2DS3D 2
    63 63|789A|12SAS 2
    63 63|789A|2DS3D 2
    63 63|2DS3D|12SAS 2
    89 89|R34S2|SDS23 3
    89 89|R34S2|J234S 3
    89 89|R34S2|S2WAD3 3
    89 89|J234S|S2WAD3 3
    ;
    run;

The variable "n_number" represents the number of random rows the code needs to select. For example, for manager 63, the code needs to randomly select 2 non-identical rows (or manager_firm_pair values) that have the manager number 63. I want to repeat this selection X times (in this example 10 times, but with my real sample 10,000 times). In other words, I want to construct X samples, each time randomly selecting "n_number" rows (or manager_firm_pair values) for each manager. Here are a few sample outputs that I want based on the above data, work.have:

    data work.want_1;
        infile cards expandtabs truncover;
        input manager manager_firm_pair :$24. n_number;
        cards;
    63 63|1323A|789A 2
    63 63|1323A|2DS3D 2
    89 89|R34S2|J234S 3
    89 89|R34S2|S2WAD3 3
    89 89|J234S|S2WAD3 3
    ;
    run;

    data work.want_2;
        infile cards expandtabs truncover;
        input manager manager_firm_pair :$24. n_number;
        cards;
    63 63|1323A|12SAS 2
    63 63|789A|12SAS 2
    89 89|R34S2|J234S 3
    89 89|R34S2|S2WAD3 3
    89 89|J234S|S2WAD3 3
    ;
    run;

I thought of using the following code, but I do not know how to iterate this random selection X times and select "n_number" rows for each manager:

    PROC SURVEYSELECT DATA=work.have OUT=work.try METHOD=SRS SAMPSIZE=10 SEED=1234567;
    RUN;

Yegen · ‎01-09-2018

Thank you for your replies, @snoopy369 and @Astounding. I really liked the array solution that @snoopy369 proposed. I appreciate your suggestions.

Yegen · ‎01-09-2018

I am trying to remove duplicates from my dataset. Two rows are effectively the same when they have the same manager and the same two client numbers, only in reversed order. In other words, if we swapped the client numbers in row 2 of the have dataset in the example below, the uniqueness identifier would be the same as in row 1. My code works well when the client numbers are numeric, but unfortunately, my actual client numbers are strings. Here is an example where my code works well:

    data work.have_example;
        infile cards expandtabs truncover;
        input manager client1 client2;
        cards;
    6363 123 456
    6363 456 123
    ;
    run;

    data work.want_example;
        infile cards expandtabs truncover;
        input manager client1 client2;
        cards;
    6363 123 456
    ;
    run;

    *Here is what I have tried to do;
    data temp1;
        set have;
        uniqueness_identifier1=CATS(manager,client1,client2);
        uniqueness_identifier2=CATS(manager,client2,client1);
    run;

    data temp1;
        set temp1;
        if uniqueness_identifier1>uniqueness_identifier2 then uniqueness_identifier3=uniqueness_identifier1;
        else uniqueness_identifier3=uniqueness_identifier2;
    run;

    proc sort data=temp1 out=temp2 nodupkey;
        by uniqueness_identifier3;
    run;

Below is what my actual data looks like and what I would need. I am stuck on how to remove the duplicates when the client numbers are non-numeric. Any feedback will be greatly appreciated.

    data work.have_actual;
        infile cards expandtabs truncover;
        input manager client1 $ client2 $;
        cards;
    6363 123A 45C6C
    6363 45C6C 123A
    ;
    run;

    data work.want_actual;
        infile cards expandtabs truncover;
        input manager client1 $ client2 $;
        cards;
    6363 123A 45C6C
    ;
    run;
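One possible way to handle character client numbers is to build an order-independent key: copy the two client values into helper variables, sort the helpers alphabetically with CALL SORTC, and deduplicate on the manager plus the sorted pair. This is a sketch under assumptions, not necessarily the solution proposed in the thread; the helper names lo and hi, the dataset work.keyed, and the length of 8 are made up for the example.

```sas
/* Character client numbers read with an explicit length */
data work.have_actual;
  infile cards truncover;
  input manager client1 :$8. client2 :$8.;
  cards;
6363 123A 45C6C
6363 45C6C 123A
;
run;

/* Copy the pair into helpers and sort the helpers alphabetically,
   so that (A,B) and (B,A) produce the same key */
data work.keyed;
  set work.have_actual;
  length lo hi $8;
  lo = client1;
  hi = client2;
  call sortc(lo, hi);
run;

/* Keep the first row for each manager and unordered client pair */
proc sort data=work.keyed out=work.want_actual(drop=lo hi) nodupkey;
  by manager lo hi;
run;
```

Sorting on separate key variables rather than on a concatenated string also avoids accidental collisions, for example manager 63 with client 631 versus manager 636 with client 31.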

Yegen · ‎07-12-2017

Thanks for this helpful explanation, @Shmuel. I see, so EOV notifies SAS that the last observation has been reached and that way, SAS moves to the next dataset, right? Thanks for all of your excellent explanations.

Yegen · ‎07-12-2017

Thank you so much for this really helpful explanation, @Tom. Your explanation makes a lot of sense and I have learned quite a lot from your post. I assume the "\combined*" part does the same as "work.combined:" in the data statement (assuming that all combined files are in the work library), but one needs to use * instead of : since it is an infile statement. What exactly does the "eov" option do? Does it skip the first line of the data (i.e., the variable names)? Also, I could not understand why we need to use @ after input.
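For readers following along, the pattern being asked about can be sketched as below. This is an illustration under assumptions rather than the code from the thread: the two small CSV files, the columns x and y, and the flag name new_file are all made up, and the files are written to the WORK directory only to make the example self-contained.

```sas
/* Write two small CSV files (hypothetical contents) into the WORK folder */
%let dir = %sysfunc(pathname(work));

data _null_;
  file "&dir/combined1.csv";
  put "x,y" / "1,2" / "3,4";
  file "&dir/combined2.csv";
  put "x,y" / "5,6";
run;

/* Read every file matching the wildcard and drop each header line */
data work.all;
  infile "&dir/combined*.csv" dsd truncover lrecl=32767 eov=new_file;
  input @;                          /* load the line and hold it      */
  if _n_ = 1 or new_file then do;   /* first line of each file        */
    new_file = 0;                   /* EOV= flag must be reset by hand */
    delete;                         /* skip the header line           */
  end;
  input x y;                        /* read values from the held line */
run;
```

The EOV= variable is set to 1 when the first record of the next file in the series is read, so it flags header lines of the second and later files, while _n_ = 1 covers the first file. The trailing @ holds the current line so that the flag can be checked before deciding whether to read the values or discard the line.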

Yegen · ‎07-11-2017

This was very clear and helpful. I really appreciate your explanations and help, as always, @Shmuel.

Yegen · ‎07-11-2017

Also, one more quick clarification question. What exactly does the following do in the code? MISSOVER DSD lrecl=32767

Yegen · ‎07-11-2017

Sorry for the confusion, @Astounding. That was a simple typo that I will fix in my original question. The file names start with combined. Thanks for pointing it out.

Yegen · ‎07-11-2017

Thanks very much, @Shmuel. Your suggestion has solved the error. After deleting the !, the macro worked. The reason I included the ! is that when I manually imported one of the samples, the code displayed in the log file had a ! after the file name and before the delimiter.

Date Last Visited	‎06-01-2020 06:53 PM

Re: Rank Transformation with mean=0 and std=1

Rank Transformation with mean=0 and std=1

Re: Error when using &md

Error when using &md

Re: Is it possible to absorb fixed effects when using Proc Surveyreg?

Re: Compress empty space of all variables

Re: Assigning group identifiers after merger

Assigning group identifiers after merger

Re: Using PC SAS/CONNECT for WRDS

Re: Construct random pairs with constraint and within group

Is it possible to absorb fixed effects when using Proc Surveyreg?

Constructing random pairs with constraint and within group

Re: Remove duplicates (based on non-numeric identifier)

Remove duplicates (based on non-numeric identifier)

Re: Simple import macro error