About mspak

mspak · ‎03-17-2012

Thanks HaiKuo & Linlin. Sorry, I had replaced the file name wrongly. I have the answer now. mspak

mspak · ‎03-17-2012

Hi Haikuo, I get the same result even after replacing "eq" with ">=". As I have many variables with exactly NMISS=69711, I don't think anything is wrong with "eq". mspak

mspak · ‎03-17-2012

Hi again, I couldn't get the answer. I replaced "WORK" with my libname "GEOG" and the file name "HAVE" with "USA". The criterion was set so that variables with NMISS=69711 are deleted. There must be some error, but there is no error message in the log.

    proc sql noprint;
      select catx(' ','nmiss(',name,') as',name)
        into : list separated by ','
        from dictionary.columns
        where libname='GEOG' and memname='USA';
      select count(*) into : nobs from GEOG.USA;
      create table temp as select &list from GEOG.USA;
    quit;

    data _null_;
      set geog.usa;
      length _list $ 4000;
      array _x{*} _numeric_;
      do i=1 to dim(_x);
        if _x{i} eq 69711 then _list=catx(' ',_list,vname(_x{i}));
      end;
      call symputx('drop',_list);
    run;

    data want;
      set GEOG.USA(drop=&drop);
    run;

    PROC MEANS DATA=WANT NMISS N;
    run;

I checked the result again using PROC MEANS and found that the variables were not deleted. Thank you. mspak
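A possible explanation, offered as a sketch rather than the thread's confirmed fix: the DATA _NULL_ step in this post reads GEOG.USA, so it compares the data values themselves with 69711, whereas the missing counts live in the one-row table TEMP. The later reply in this thread mentions a wrongly replaced file name, which is consistent with that reading. Assuming this is the cause, and using the 69710 cutoff from the original question (the macro variable name cutoff is made up for illustration), the steps might look like this:

```sas
/* Assumptions: library GEOG and dataset USA as in the post;
   "cutoff" is a hypothetical macro variable, 69710 is the example threshold. */
%let cutoff = 69710;

proc sql noprint;
  /* Build "nmiss( var ) as var" for every column of GEOG.USA */
  select catx(' ', 'nmiss(', name, ') as', name)
    into :list separated by ','
    from dictionary.columns
    where libname = 'GEOG' and memname = 'USA';

  /* One-row table: each column holds the missing count of that variable */
  create table temp as
    select &list from GEOG.USA;
quit;

data _null_;
  set temp;                      /* read the counts, not the raw data */
  length _list $ 32767;
  array _x{*} _numeric_;         /* every count column in TEMP is numeric */
  do i = 1 to dim(_x);
    if _x{i} > &cutoff then _list = catx(' ', _list, vname(_x{i}));
  end;
  call symputx('drop', _list);   /* space-separated names to drop */
run;

data want;
  set GEOG.USA(drop=&drop);
run;
```

If no variable exceeds the cutoff, &drop is empty and the DROP= option fails, so a guard would be needed for that case.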

mspak · ‎03-17-2012

Thanks Linlin, I believe this is the way, but it seems more complicated than obtaining the list from Excel. If there is a macro function that works through these steps, that would be great! mspak

mspak · ‎03-17-2012

I did it by copying and pasting the NMISS report into Excel and filtering it according to my criteria. Lastly, I copied the list of variables to be deleted and removed them using "drop". It works, but if there is an easier way via a SAS program, that would be great! Thanks. Regards, mspak

mspak · ‎03-17-2012

Hi all, I understand how to delete variables using "drop". However, I have a large data set with many variables, and dropping them one by one is time-consuming. Merely copying and pasting the variable names takes an hour. I used PROC MEANS with the NMISS option to view the number of missing values for every variable. Some variables have a very large number of missing values (some are even 100% missing), and I wish to delete these variables. I would like opinions on how to write a program that drops variables whose NMISS is more than a certain number, for example 69710. Thank you. Regards, mspak

mspak · ‎03-17-2012

Hi RobertAllison, Thanks. I should eliminate all the zip codes that are not included in sashelp.zipcode to avoid getting the same error messages. The zipcitydistance function works ONLY for the zip codes in the dataset downloaded from SAS. If I have a list of zip codes outside the USA with their longitude and latitude, can I use the same function for distance calculations? Regards, mspak
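On the coordinates question: ZIPCITYDISTANCE takes two ZIP codes and looks them up in SASHELP.ZIPCODE, so it does not accept latitude and longitude directly. One option for coordinate pairs is the GEODIST function. A minimal sketch, with made-up variable names and illustrative coordinates in decimal degrees:

```sas
/* Hypothetical points; lat1/long1/lat2/long2 are illustrative names and values */
data _null_;
  lat1 = 51.5074;  long1 = -0.1278;   /* first point  */
  lat2 = 48.8566;  long2 =  2.3522;   /* second point */

  /* 'D' = inputs in degrees, 'K' = result in kilometers, 'M' = result in miles */
  dist_km = geodist(lat1, long1, lat2, long2, 'DK');
  dist_mi = geodist(lat1, long1, lat2, long2, 'DM');

  put dist_km= dist_mi=;
run;
```

In a real dataset the four arguments would be the latitude and longitude columns of the two locations being compared.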

mspak · ‎03-16-2012

Thanks again FriedEgg,

From the first part of your SQL:

    proc sql;
      create table foo as
      select ticker, year,
             sum(missing(postcode)) as total_miss,
             count(iddir) as total_dir,
             count(distinct iddir) as total_same,
             (calculated total_miss/calculated total_dir) as percent_miss
      from in.misingcode
      group by ticker, year;

count(iddir) as total_dir includes duplicate IDDIRs, whereas count(distinct iddir) as total_same eliminates the duplicates and gives the total number of directors. I think your total_same should be total_dir by my definition, as I wish to eliminate duplicate IDDIRs. My definition of total_same is the number of directors who have the same zip code for a given firm-year, excluding those with missing zip codes. In other words, directors with missing postcodes are not counted as having the same postcode.

change_following seems incorrect: change_dir for year 2007 = change_following for year 2006. I would like to see the number of changes for the current year compared to the following year. Perhaps I can present this information as time-series data for years 2001 - 2010 in terms of the total number of unique directors (by IDDIR) reported. Then I could detect the significant changes for a given year. The year with the most significant change should be identified, and the number of changes reported only for that year.

Is there any quick way or special function that deals with missing values by grouping them by category, such as firm-year in my case?

In the dataset that I provided, incomplete data should be more common in earlier years than in later years. Therefore, my purpose in detecting a significant change in the total number of directors is to show that, within the same firm, the amount of incomplete data might be large in the years before the significant change in the number of directors.

I am digesting the second part of your SQL, in which 3 tables were used as source tables. I think in SQL, the lagged and leading figures can be written as year+1 and year-1. With the DATA step approach, I used PROC EXPAND to do this. I am imagining the tables after the left joins by the criteria that you set. SQL does need creativity in designing the program. Thank you for your help.

Note: How long did it take you to master this skill? I am just 3 months old in terms of SAS programming, having attended the SAS Prog1 and Prog2 trainings. Regards, mspak
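A minimal sketch of the year-1 / year+1 idea mentioned in this post, not FriedEgg's actual second query (which is not reproduced here). It assumes the table foo from the quoted SQL has one row per ticker and year, and that total_dir already holds the director count wanted; the table name changes and the aliases cur, prev and nxt are made up for illustration:

```sas
proc sql;
  create table changes as
  select cur.ticker,
         cur.year,
         cur.total_dir,
         /* missing when the firm has no row for the prior year */
         cur.total_dir - prev.total_dir            as change_dir,
         calculated change_dir / prev.total_dir    as percent_change,
         /* next year's percent change, read from the year+1 row */
         (nxt.total_dir - cur.total_dir) / cur.total_dir as change_following
  from foo as cur
       left join foo as prev
         on cur.ticker = prev.ticker and prev.year = cur.year - 1
       left join foo as nxt
         on cur.ticker = nxt.ticker and nxt.year = cur.year + 1
  order by cur.ticker, cur.year;
quit;
```

The left joins keep every firm-year; where the neighbouring year is absent the derived columns come out missing, which matches the "missing" category asked for in the original question.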

mspak · ‎03-16-2012

Dear all,

I have SAS data including the following variables:

Ticker (identification code for company)
IDDIR (directors' identification number)
Postcode (directors' residential postal code)
Year (financial accounting year)
miss_code (dummy variable = 1 if Postcode is missing; else = 0)

I wish to conduct an analysis of the reliability of the data. I might have to drop my analysis (or part of it) if the following conditions are fulfilled:

1. Too many missing values on "postcode" for a given firm-year (MISSING VALUES).
2. Postal codes are the same for all/most of the directors in the same firm-year (UNREASONABLE VALUES).
3. The number of directors in a given year for a firm is significantly lower than in later years. For example, a firm might have 10 directors in year 2010 but is reported as having 1 director in year 2003. This indicates that the dataset is not complete.

Therefore, I would like to generate an output table with the following columns:

Ticker Year total_miss total_dir total_same percent_miss percent_same change_dir percent_change change_following

total_miss = total number of directors with missing postcode for a given firm-year
total_dir = total number of directors for a given firm-year
total_same = total number of directors with the same (non-missing) postcode for a given firm-year
percent_miss = total_miss/total_dir for a given firm-year
percent_same = total_same/total_dir for a given firm-year
change_dir(t) = increase or decrease of total_dir compared to the prior year = total_dir(t) - total_dir(t-1); t = time period
percent_change(t) = change_dir(t)/total_dir(t-1) = (total_dir(t) - total_dir(t-1))/total_dir(t-1) for a given firm-year
change_following(t) = percent_change(t+1) for the same ticker in the following year

For example, if the total directors in 2006 = 6 and in 2007 = 10, then change_dir for year 2007 = 4 (or +4/6 = +0.67). "Change in following year" for 2006 is then an increase of 4 (or 0.67).

In the end, I wish to generate a report/table that shows the number/percent of observations (firm-years) by year. For each year (from 2001 - 2010):

1. Number (and percent = number in each decile / total firms for the given year) of firms with percent_miss ranging from 0% to 100% in 10% intervals (deciles).
2. Number (and percent = number in each decile / total firms for the given year) of firms with percent_same ranging from 0% to 100% in 10% intervals (deciles).
3. Number (and percent) of firms with change_following ranging from 0% to 100% in 10% intervals (deciles), plus an additional category named "missing" (if there is no prior or following year data to facilitate the calculation).

I understand it is indeed difficult and am seeking any advice and help. Thank you very much in advance. Regards, mspak

mspak · ‎03-16-2012

Thanks FriedEgg for the confirmation. I am new to SAS and have used the program for merely 2 - 3 months. I have another question on the WHERE clause: do you mean that if the condition "a.tic ne c.tic" is left out, then the firms (identified by tic) will be matched with themselves? If I instead wrote "a.tic=c.tic" in the WHERE clause, would this generate the same result? In the following WHERE clause: a.zip=b.zip and c.zip=b.zip, you never state that a.zip=c.zip. Does this indirectly indicate a.zip ne c.zip? In short, my question is whether there is any difference between having "ne" in the WHERE clause and having no such condition. Once again, thank you very much for your help. Regards, mspak
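The effect of the "ne" condition can be seen on a tiny self-join. This is an illustrative sketch only: the three-row table comp and its values are made up, with column names taken from the post. Note that a.zip=b.zip together with c.zip=b.zip implies a.zip=c.zip, so the sketch compares a.zip with c.zip directly.

```sas
/* Made-up sample: two firms share a ZIP code, one does not */
data comp;
  input tic $ sic3 zip;
  datalines;
AAA 283 10001
BBB 283 10001
CCC 283 90210
;
run;

proc sql;
  /* Without the NE condition every firm also pairs with itself */
  create table pairs_all as
  select a.tic as tic, c.tic as peer
  from comp as a, comp as c
  where a.sic3 = c.sic3 and a.zip = c.zip;

  /* With the NE condition only pairs of different firms remain */
  create table pairs_other as
  select a.tic as tic, c.tic as peer
  from comp as a, comp as c
  where a.sic3 = c.sic3 and a.zip = c.zip and a.tic ne c.tic;
quit;
```

With this sample, pairs_all has 5 rows (AAA-AAA, AAA-BBB, BBB-AAA, BBB-BBB, CCC-CCC) and pairs_other has 2 (AAA-BBB, BBB-AAA). Writing a.tic=c.tic instead would keep only the three self-matches, so the three variants give different results.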

mspak · ‎03-14-2012

Thanks Ksharp, I need time to understand the program. I know that you used the SAS hash object, but it is new to me. I will post an update. Regards, mspak

mspak · ‎03-14-2012

Thank you for the suggestion. I did download the SAS zipcode dataset. Regards, mspak

mspak · ‎03-14-2012

Thanks FriedEgg, I am wondering what the purpose is of using the source table "comp" twice (i.e. comp a and comp c). Is it for matching purposes? I can see that you set the WHERE clause to a.tic ne c.tic & a.sic3=c.sic3. Does "a.tic ne c.tic" indirectly treat "comp a" as the list of firms that need matching and "comp c" as the firms to be matched? Then the matching criterion is given by the condition "a.sic3=c.sic3". Processing time can be saved if we set the criteria in the SQL wisely. The SAS hash object is also an advanced technique in SAS programming. Thanks for sharing. I learnt something from you too. Regards, MSPAK

mspak · ‎03-13-2012

Hi Hai.Kuo, Thanks, I will try again. I just downloaded a few articles relating to this topic so that I can make use of this efficient function in future. Have a nice day! Time now at my place: 7.19pm. Regards, MEI SEN

mspak · ‎03-13-2012

Thank you very much

Re: Select most recent row with value

Select most recent row with value

Re: Optimal Lag length

Optimal Lag length

Creating zipcodes with FIPS codes

Re: Interpolation

Interpolation

Re: Proc Panel Warning message

Re: GMM using Proc Panel

Re: ZIPCODE FOR EACH COUNTY and INTERPOLATION

Re: Linear interpolation

Outlier detection

Re: AR test

Re: Combine Datasets using Inexact Character Variables in SAS

Combine Datasets using Inexact Character Variables in SAS

Re: Comparison between largest and second largest value

How to delete variable with large missing values

Re: How to delete variable with large missing values

Invalid argument to function ZIPCITYDISTANCE Invalid argument to funct...

Re: Missing Value, Unreasonable Value, Incomplete Data Analysis

Missing Value, Unreasonable Value, Incomplete Data Analysis

Density of industrial firms

Need help to calculate a variable

Re: Density of industrial firms

Re: PROC SQL PROCESSING