I have a CRSP dataset and one of the variables (RET) is RETURN which has a numeric format 11.6. However, there are observations that has character values( B and C character) for this variable ( which does not make sense). I cannot query to choose or remove these observations. I have tried " if Ret = 'C' then .... " and if Ret = C then ....." and don't. Is there a way to work around this?
Thanks,
1. Make sure it was imported correctly - so it would need to be a character variable. If it's not, then you've already lost that information.
If it is numeric and is showing values of B/C that likely means you're using a format on that variable or special missing values.
if this is the case, then try if Ret = .C or if missing(RET)
Check out the properties for the dataset, is it telling you that variable is really numeric? If so then:
where not missing(ret)...
Should work. If it is character, then what you have should work. If its got a format applied then check out what the format is. Really hard for me to guess without any information. Use this post to show some test data:
1. Make sure it was imported correctly - so it would need to be a character variable. If it's not, then you've already lost that information.
If it is numeric and is showing values of B/C that likely means you're using a format on that variable or special missing values.
if this is the case, then try if Ret = .C or if missing(RET)
This works. thanks
Check the data set properties to see what the actual format assigned is. If it is actually numeric someone may have assigned SAS special missing values. In some views they special value shows but is excluded for calculations.
data example; input numeric; datalines; 1 2 . .B ; run; Proc freq;run; proc print;run;
Have you done any research into why you see a "B" or "C" in a supposedly numeric field? That might be important to know.
At any rate, you will likely need RET to be numeric. If you can see a "B" or a "C", it's likely character. (If you are seeing "B" or "C" after using PROC IMPORT, a common tool with CRSP data, then it's definitely character. But PROC CONTENTS will confirm that.) So you will need to do something like this:
data want;
set have;
ret_numeric = input(ret, ??10.);
if ret_numeric = . then delete;
drop ret;
rename ret_numeric = ret;
run;
i used to download CRSP data in CSV format and then import it. But this time, I chose the dataset tobe in SAS dataset format and hence the issue. It turns out missing value can take other values too, not just "." so in this case it is .B and .C
So these are the special missing values I mentioned earlier, you will likely find some notes in the documentation on what they mean specifically, ie .B may mean no answer and .C may mean not asked (I made those up, check the docs!).
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
