Hi my dataset looks like this .
| Header 1 | Header 2 | Header 2 | 
|---|---|---|
| Firm name | Date | Market cap | 
| A | June-12 | |
| A | July-12 | 53 | 
| B | June-12 | 12 | 
| B | July-12 | 42 | 
| B | Aug-12 | 
I would like to tabulate like this using proc tabulate
| Header 1 | Header 2 | Header 2 | 
|---|---|---|
| Firm name | Total Obs in Market cap column | Missing Obs in Market cap column | 
| A | ||
| B | 
In the first table I have done some data manipulation using IF Then to avoid the missing values and have assigned the missing values as ' 0 ' . Please advise me whether I am better off undoing the manipulation and then use a " Proc Tabulate Missing " or I can do post the data manipulation.
Thanks in advance
If "Total Obs in Market cap column" is supposed to be a count of the non-missing before your manipulation you probably didn't need to. You don't give SAS variable names so it's hard to provide exact tabulate syntax by it looks like you may want something like:
Proc tabulate data=<your dataset name here>;
class FirmName;
class MarketCap / missing;
table FirmName, MarketCap=""*(n="Total Obs" nmiss="Missing Obs");
run;
From your description, I would guess that FirmName is a classification variable, and MarketCap an analysis variable.
If that's the case, you first have to understand what the MISSING option does. It only affects classification variables, and would have nothing to do with a missing MarketCap. By default, PROC TABULATE removes from the entire set of tables any observation having a missing value for a classification variable. Adding the MISSING option says to do the opposite, and keep missing values of a classification variable as legitimate observations to analyze. So if FirmName is your only classification variable, then the only impact of the MISSING option would be to restore rows that have a missing value for FirmName.
If you change missing values to 0, they are no longer missing. That would change table you produce, and your final column would always be 0. So you definitely have to "undo" your data manipulation to produce the final table. But why would you want to change missing to 0 anyway? Would you like to report a MarketCap of 0 when you have no information?
Good luck.
Thanks a ton Astounding , Yes FirmName is my only classification variable , as you said i will add MISSING option . Yes you are correct I will have to undo data manipulation before I tabulate. Actually the reason for manipulation was to help me for final analysis. The original idea behind my data manipulation was I had variables FirmName , MarketCapDate , MonthlyMarketCap, LaunchDate my logic was to know whether there are any firms whose Market Cap date > Launchdate and still MarketCap is missing so I wrote an IF THEN function
if MktCapDate >= LaunchDate then MktCap=MonthlyMktCap;
else if MktCapDate < LaunchDate then MktCap=0;
I guessed that if MktCapDate >= LaunchDate but if there are no values in MonthlyMktCap then these will be counted when I do MISSING option and on the contrary if MktCapDate < LaunchDate then those values would be ' 0 ' which anyway does not concern me because I wont bother about the MktCap value even before the firm is launched so these will get eliminated in my final analysis.
Please advise me whether my logic makes sense
It sounds like it would be more straightforward to ask directly for what you would like. You might want more than these two categories:
if MktCapDate > LaunchDate and MonthlyMktCap=. then result='? ';
else result='OK';
Then it would be easy to subset using a WHERE statement and RESULT.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
