So here so further explanation of what I did: the comma in the from statement of sql means an INNER JOIN of the two datasets listed. Inner join return the variables youve asked for in the select statement (a. and b. indicate which dataset that piece comes from), where the two datasets match on the where criteria specified (in this case they have to match on industry year and within the roa range specified). If there are no matches then there will not be a row returned from the inner join (you could change to a LEFT JOIN (switch the "where" to an "on" and the comma to "left join") which would return at least one row for every observation in sample and multiple rows if there are multiple matches from totalcontrol (the single no match row would have the information from sample but empty fields for the totalcontrol variables). To answer the question about deleting observations is harder not knowing your data well. If they are exact matches then you might have duplicates in one of the source files based on the three criteria given. You can add the "distinct" keyword right after the select which should get rid of exact duplicates. I imagine most are not exact duplicates because I am assuming the range on the roa is allowing multiple matches. Hope that helps! EJ
... View more