About mspak

mspak · 09-24-2012

Hi Patrick,

Thanks for your suggestion. I have no idea how to code it, as I am not familiar with the hash object. My objective is to match every firm in tem1.tic with ONE matched firm (tic), as best I can. lg_asset is the firm's size, and I tried to match firms with the smallest difference in firm size. The following is the code I ran to produce the sample data I posted previously:

data BENCH1(keep=fyear tic lg_asset sic1)
     BENCH0(keep=fyear tic lg_asset sic1);
    set geodata.exfinance_match;
    if BENCH = 1 then output BENCH1;
    if BENCH = 0 then output BENCH0;
run;

proc sql;
    create table geodata.temp as
    select a.tic as TEM1_tic,
           a.fyear as tem1_fyear,
           b.*,
           abs(sum(a.LG_ASSET, -b.LG_ASSET)) as diffPS,
           min(calculated diffPS) as closest_size
    from BENCH1(where=(LG_ASSET is not missing)) a,
         BENCH0(where=(LG_ASSET is not missing)) b
    group by a.tic, a.fyear
    having diffPS = closest_size and a.sic1 = b.sic1;
quit;

The data that I posted is the result of running the programs above: the tem1.tic firms have been matched with tic based on the condition diffPS = closest_size and a.sic1 = b.sic1, grouped by a.tic and a.fyear.

At this stage I wish to ensure that every tem1.tic firm has a matched tic (for a given year). There must be no duplication, that is, a particular tic cannot be matched with more than one tem1.tic. Firms that have no (or few) alternative matched firms should be given priority and matched earlier.

I hope that my explanation is clear now. Thank you.

Regards,
MSPAK
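One detail of the query above may matter for the priority problem: the minimum is taken over every BENCH1 x BENCH0 pair in the group, and the sic1 condition is applied only afterwards in HAVING, so a firm whose closest-sized peer sits in a different industry ends up with no candidate at all. The following is a minimal sketch, not a definitive fix, that restricts pairs to the same industry before the minimum is computed. The output name work.candidates is hypothetical, and BENCH1 and BENCH0 are assumed to be the datasets created in the post.

```sas
proc sql;
    /* Pair firms only within the same sic1, then keep the closest pair(s). */
    create table work.candidates as
    select a.tic   as tem1_tic,
           a.fyear as tem1_fyear,
           b.tic,
           b.fyear,
           b.lg_asset,
           b.sic1,
           abs(a.lg_asset - b.lg_asset) as diffPS
    from BENCH1(where=(lg_asset is not missing)) as a
         inner join
         BENCH0(where=(lg_asset is not missing)) as b
         on a.sic1 = b.sic1
    group by a.tic, a.fyear
    /* The minimum is now taken over same-industry pairs only. */
    having abs(a.lg_asset - b.lg_asset) = min(abs(a.lg_asset - b.lg_asset));
quit;
```

Joining on sic1 also shrinks the intermediate result, because the full Cartesian product is never built. If matches should additionally be restricted to the same year, a.fyear = b.fyear could be added to the ON clause; the original code does not do this, so it is left out here.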

mspak · 09-24-2012

Dear all,

I matched one set of companies (tem1_tic) with another set of companies (tic) by year, industry (sic) and closest_size (see attached). However, there are many matched firms for every tem1_tic, and I wish to have only one matched firm for each tem1.tic. I would like to select the first matched firm, but I have to take into account that a matched firm might also be the matched firm of another tem1.tic.

My question is: how can I ensure that I have one unique matched firm for every tem1.tic under the following conditions?

1. The matched firm must not be selected as the matched firm for another tem1.tic.
2. If there is only one (1) matched firm for a given tem1.tic, that matched firm must be selected for it, in priority.

I am thinking of solving the problem by sorting the data by the number of matched firms and then selecting firms according to the priority condition, so that firms with the lowest number of matched firms are matched first.

I also wish to identify the firms that end up without any matched firm because their candidate was already taken (in priority) by another firm.

Thank you.

Regards,
Mspak
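The sorting idea described above could be sketched as a single greedy pass. This is only an illustration under stated assumptions: the candidate table is geodata.temp with the columns tem1_tic, tem1_fyear, tic and diffPS; a control firm may be used once per year (the hash key is tem1_fyear plus tic); and the output names work.cand, work.matched and work.unmatched are hypothetical. Priorities are fixed from the initial candidate counts and are not updated as candidates get used up, so the result is not guaranteed to be an optimal assignment.

```sas
/* Count candidates per treated firm-year and put the scarcest first. */
proc sql;
    create table work.cand as
    select tem1_tic, tem1_fyear, tic, diffPS,
           count(*) as n_cand
    from geodata.temp
    group by tem1_tic, tem1_fyear
    order by n_cand, tem1_tic, tem1_fyear, diffPS, tic;
quit;

data work.matched(keep=tem1_tic tem1_fyear tic diffPS)
     work.unmatched(keep=tem1_tic tem1_fyear);
    set work.cand;
    by n_cand tem1_tic tem1_fyear;

    /* Hash of control firms already taken (assumed key: year + tic). */
    if _n_ = 1 then do;
        declare hash used();
        used.defineKey('tem1_fyear', 'tic');
        used.defineData('tic');
        used.defineDone();
    end;

    retain found;
    if first.tem1_fyear then found = 0;

    /* Take the first candidate of this firm-year that is still free. */
    if not found then do;
        if used.check() ne 0 then do;
            rc = used.add();
            found = 1;
            output work.matched;
        end;
    end;

    /* No free candidate left: report the firm-year as unmatched. */
    if last.tem1_fyear and not found then output work.unmatched;
run;
```

work.unmatched then lists the firm-years whose candidates were all taken by higher-priority firms, which is the last item asked about in the post.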

mspak · 09-22-2012

Thank you for your suggestion. It does work. Regards, mspak

mspak · 09-20-2012

Dear Tom Kari, Thanks for your suggestion. I am not going to join with a joining key. I would like to pair each GD1 observation with the GD0 observation that has the closest score (i.e. closest_PS). Regards, MSPAK

mspak · 09-19-2012

Sorry for the late reply. I was having trouble uploading my SAS file. I have updated my query. Thanks for the suggestions. Mspak

mspak · 09-18-2012

Dear all,

I ran the following program (propensity score matching):

proc sort data=test out=full;
    by descending tem;
run;

proc probit data=FULL noprint;
    class TEM;
    model TEM = FREQ SHARES NUMEST BONUS_PERCENT1 BLACK_OPTION1 ROA1 LG_ASSET MB
                IND01-IND10 YeaR03 - YeaR09 / lackfit;
    output out=regdata_out xbeta=gammaw p=pred;
run;

data GD1 GD0;
    set regdata_out;
    if TEM = 1 then output GD1;
    if TEM = 0 then output GD0;
run;

proc sql;
    create table temp as
    select a.tic as TEM1_tic,
           a.fyear as tem1_fyear,
           b.*,
           abs(sum(a.pred, -b.pred)) as diffPS,
           min(calculated diffPS) as closest_PS
    from GD1 a, GD0 b
    group by a.tic, a.fyear
    having diffPS = closest_PS;
quit;

proc sql;
    create table temp_tem1 as
    select a.*
    from GD1 a, temp b
    where a.tic = b.TEM1_tic and a.fyear = b.tem1_fyear;
quit;

data matchpair;
    set temp_tem1 temp(drop = tem1_tic);
run;

However, I could not get the output, because the program consumes a lot of processing time and a large amount of storage. Is there a more efficient way to obtain the same output as the program above?

Thank you.

Regards,
mspak
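Most of the cost in the program above probably comes from the first PROC SQL step, which materialises every GD1 x GD0 pair together with all columns of GD0 before filtering. One possible alternative, shown here only as a sketch, keeps the control scores in memory and scans them for each treated observation, so no intermediate Cartesian table is written. Assumptions: GD0 is an ordinary dataset that supports POINT= access, it is non-empty, and the output name work.nearest is hypothetical. Unlike the SQL version, this keeps only the first control when several are tied at the minimum distance.

```sas
/* Number of control observations, used to size the in-memory array. */
proc sql noprint;
    select count(*) into :n0 from GD0;
quit;
%let n0 = &n0;   /* strip leading blanks from the macro variable */

data work.nearest(drop=c_pred d);
    array p0[&n0] _temporary_;

    /* Load every control propensity score once. */
    if _n_ = 1 then do i = 1 to &n0;
        set GD0(keep=pred rename=(pred=c_pred)) point=i;
        p0[i] = c_pred;
    end;

    /* Read the treated observations one at a time. */
    set GD1(keep=tic fyear pred
            rename=(tic=tem1_tic fyear=tem1_fyear pred=tem1_pred));

    best = .;
    diffPS = .;
    if not missing(tem1_pred) then do i = 1 to &n0;
        if not missing(p0[i]) then do;
            d = abs(tem1_pred - p0[i]);
            if missing(diffPS) or d < diffPS then do;
                diffPS = d;
                best = i;
            end;
        end;
    end;

    /* Fetch the full row of the closest control and write the pair. */
    if not missing(best) then do;
        set GD0 point=best;
        output;
    end;
run;
```

The number of comparisons is still the number of treated observations times the number of controls, but they happen in memory. If that is still too slow, sorting the controls by pred and searching only the neighbourhood of each treated score would be the next step.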

mspak · 04-29-2012

Thank you PG. It is the correct answer. Regards, mspak

mspak · 04-29-2012

Good day to all,

I have a SAS file with the following variables:

hk_id = identification code for each IPO (Initial Public Offering) firm
Industry = industry of each firm
underwriters_id = identification code for each underwriter involved
underwriters = classified as either "MAIN UNDERWRITER" or "CO-UNDERWRITER"

I wish to calculate the following variable:

same_ind = number of IPOs in the same industry, within the dataset, that the underwriter was involved in

My expected output:

hk_id Industry underwriters_id industry same_ind av_same_ind

For example, take the IPO firm with hk_id=2 and Industry=Basic Materials, which involved 3 underwriters: ib002, ib003 and ib005. I would like to know how many times each of these underwriters (ib002, ib003 and ib005) handled IPOs in the "Basic Materials" industry, based on the sample given (same_ind). Then I will output the average number of times that the underwriters of each firm handled IPOs in the same industry (av_same_ind). av_same_ind is calculated as the sum of same_ind over all underwriters of the IPO firm, divided by the number of underwriters involved in the IPO.

Thank you in advance for any advice.

Regards,
mspak
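The two quantities described above could be computed along the following lines. This is a sketch under assumptions: the input dataset name work.ipo is hypothetical, it has one row per hk_id and underwriter, and same_ind counts every IPO the underwriter handled in that industry, including the current one (the post does not say whether the current IPO should be excluded).

```sas
proc sql;
    /* IPOs handled by each underwriter within each industry. */
    create table work.uw_ind as
    select underwriters_id,
           industry,
           count(distinct hk_id) as same_ind
    from work.ipo
    group by underwriters_id, industry;

    /* Attach same_ind to each IPO-underwriter row and average per IPO. */
    create table work.want as
    select a.hk_id,
           a.industry,
           a.underwriters_id,
           b.same_ind,
           mean(b.same_ind) as av_same_ind
    from work.ipo as a
         inner join work.uw_ind as b
         on  a.underwriters_id = b.underwriters_id
         and a.industry        = b.industry
    group by a.hk_id
    order by a.hk_id, a.underwriters_id;
quit;
```

PROC SQL remerges the per-IPO mean back onto the detail rows, so each underwriter row of an IPO carries the same av_same_ind value.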

mspak · 04-27-2012

Hi Patrick and Ksharp, Thank you for letting me know about this special feature of DataFlux. I am not sure whether the University has subscribed to the SAS Data Quality Server. I will check with the local SAS representative on this matter. Perhaps I should submit a proposal for a DataFlux subscription if it is not accessible to staff members here. Regards, mspak

mspak · 04-27-2012

Thank you very much. Today I learned a lot from all of you. "There is no royal road to learning; learning SAS with this discussion forum is extremely useful and fun". Hope everyone enjoys the weekend. Regards, mspak

mspak · 04-27-2012

Hi MikeZdeb, Your code is extremely useful to me. I applied the code you suggested, and I can say the result is excellent. BTW, I found a good article on the COMPGED function (see the PDF). Thank you very much for your help. Regards, mspak

mspak · 04-27-2012

Hi Ksharp, What is the meaning of this code: "where name =* _name"? Thanks. Regards, mspak
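For context, =* in a SAS WHERE expression is the sounds-like operator, which compares values by their Soundex codes rather than character by character. A small illustration follows; the dataset work.names and its values are made up for the example.

```sas
data work.names;
    length name $ 20;
    input name $;
    datalines;
Smith
Smyth
Jones
;
run;

proc sql;
    /* Smith and Smyth share the Soundex code S530, so both satisfy =*. */
    select name
    from work.names
    where name =* 'Smith';
quit;
```

The operator is available in WHERE expressions and PROC SQL, but not in a DATA step IF statement. Because Soundex was designed for English surnames, it can be a coarse filter for company names.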

mspak · 04-26-2012

Dear all,

I downloaded data from 2 different databases that use different identification codes, so it is impossible to match cases by ID. The only way is to match the cases by company name, which is an inexact character variable.

I have 2 different datasets:

A) uw_match, with the following variables:
- underwriters_names
- holding_company
- ipo_date
- others...

B) maluw, with the following variables:
- name (labelled as bank name)
- bs_id_number (Bankscope identification code)
- closdate (company fiscal year end)
- others...

I wish to match-merge these two datasets by the following criteria:

STEP 1: the underwriters (either underwriters_names or their holding_company) in dataset A are compared with the bank name in dataset B. Note: I can match-merge either underwriters_names (in A) with the bank name (in B), or holding_company (in A) with the bank name (in B); whichever pair has the higher matching accuracy will be output/used; AND

STEP 2: the closdate (in B) closest to the ipo_date (in A).

In short, I wish to match all the variables in A and B by:
1. bank name (in B) = IPO underwriter (in A; either underwriters_names or holding_company, whichever gives the highest precision);
2. a similar period (i.e. the bank's fiscal year end closest to the IPO date).

I read an article (see the PDF attached), and I understand that this is possible with SAS, but I have little knowledge of how to apply its examples to my context. Any comments and advice are much appreciated.

Thank you.

Regards,
mspak
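The two steps described above could be sketched with the COMPGED function, which returns a generalized edit distance (lower means more similar). This is a rough illustration, not a tuned solution. Assumptions: ipo_date and closdate are SAS date values, each row of uw_match is identified by underwriters_names plus ipo_date, the cutoff of 300 is arbitrary, and the output names work.pairs, work.best_name and work.best_pair are hypothetical.

```sas
proc sql;
    /* Score every underwriter/bank pair on both candidate names and
       keep the better (lower) of the two costs. */
    create table work.pairs as
    select a.underwriters_names,
           a.holding_company,
           a.ipo_date,
           b.name,
           b.bs_id_number,
           b.closdate,
           min(compged(upcase(a.underwriters_names), upcase(b.name)),
               compged(upcase(a.holding_company),    upcase(b.name)))
               as name_cost,
           abs(a.ipo_date - b.closdate) as day_gap
    from uw_match as a, maluw as b
    where calculated name_cost <= 300;   /* arbitrary cutoff */

    /* STEP 1: keep the best-scoring bank name(s) per underwriter and IPO. */
    create table work.best_name as
    select *
    from work.pairs
    group by underwriters_names, ipo_date
    having name_cost = min(name_cost);

    /* STEP 2: among those, keep the fiscal year end closest to the IPO. */
    create table work.best_pair as
    select *
    from work.best_name
    group by underwriters_names, ipo_date
    having day_gap = min(day_gap);
quit;
```

Pairs that pass the cutoff are still worth reviewing by eye, since a low edit distance does not guarantee that two names refer to the same institution.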

mspak · 04-26-2012

Hi Ksharp, Thank you for your program, which helped to obtain the first non-missing data, and thank you so much for your frequent responses. I found that the missing values are missing for all years (in the database). Therefore, I have to fix all the values manually by reading the annual reports. I am from Malaysia, but my counterpart for this project is from Hong Kong. We hand-collected some data by employing research assistants from both Malaysia and HK. As such, the HK party created its own identification code for easy data consolidation. Regards, mspak

mspak · 04-25-2012

Hi SAS_Bigot, Thank you for your program. It is the first time I have come across the coalesce function in SAS. Regards, mspak
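For readers who have not met it either, COALESCE returns the first non-missing value among its numeric arguments (COALESCEC is the character counterpart). A tiny illustration with made-up data; the dataset work.demo and its variables are hypothetical:

```sas
data work.demo;
    input id x1 x2 x3;
    /* First non-missing of x1, x2, x3; missing if all three are missing. */
    first_nonmiss = coalesce(x1, x2, x3);
    datalines;
1 . 5 7
2 3 . .
3 . . .
;
run;
```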

Date Last Visited	‎03-26-2017 10:35 AM

Re: Select most recent row with value

Select most recent row with value

Re: Optimal Lag length

Optimal Lag length

Creating zipcodes with FIPS codes

Re: Interpolation

Interpolation

Re: Proc Panel Warning message

Re: GMM using Proc Panel

Re: ZIPCODE FOR EACH COUNTY and INTERPOLATION

Re: Linear interpolation

Outlier detection

Re: AR test

Re: Combine Datasets using Inexact Character Variables in SAS

Combine Datasets using Inexact Character Variables in SAS

Re: Comparison between largest and second largest value

Re: Matching without replacement

Matching without replacement

Re: More efficient way

More efficient way

Re: Repeated times of each category

Repeated times of each category

Re: Find an outstanding amount after an event
