07-16-2017 11:41 AM
In the dataset I'm using, missing observations are coded as -1.
I am running a format step and am not sure how to recode the -1.
I currently have low-<1='missing', but I am not sure if this is the right/best way to do this.
Any other tips?
Also, what exactly does the low-<1 do?
Thank you all in advance.
07-16-2017 01:34 PM
That depends on what you're trying to do, which you didn't actually state in your question.
Formats are used to control the display of a number. But 'Missing' is actually a character string, so which of these are you trying to map?
-1 -> . (numeric missing, displayed as a period)
-1 -> Missing (a word that will appear in a report)
Because formats only control the display, the value is not actually recoded, but you usually don't need that. If you do need it recoded, use PUT() to convert it to a character value.
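A minimal sketch of the PUT() approach, in case it helps. The dataset names have and want and the variable var1 are assumptions (they match the names used later in this thread), and var1_char and the format name neg1miss are only illustrative:

```sas
proc format;
  /* default=12 keeps the width wide enough for the nested BEST12. output */
  value neg1miss (default=12)
    -1 = 'Missing'
    other = [best12.];
run;

data want;
  set have;
  length var1_char $12;
  /* PUT() applies the format and returns a character value;   */
  /* STRIP() removes the leading blanks that BEST12. produces  */
  var1_char = strip(put(var1, neg1miss.));
run;
```

var1 itself is left untouched here, so it still holds -1 for anything numeric you do with it.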
07-16-2017 01:46 PM
It really depends on what you want to do. If you want to create a format for displaying the value then I am not sure why a range is needed.
proc format;
  value neg1miss
    -1 = 'Missing'
    other = [best12.];
run;
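As for what low-<1 does: it is a range that covers every nonmissing value from the lowest one up to, but not including, 1. So it would label 0, 0.5 and -37 as well as -1, which is probably more than you want. A quick way to check this yourself (the format name lowdemo and the test values are made up for illustration):

```sas
proc format;
  /* LOW = lowest nonmissing value; the < before 1 excludes 1 itself */
  value lowdemo low-<1 = 'missing';
run;

data _null_;
  /* try a missing value, values below 1, 1 itself, and a value above 1 */
  do x = ., -37, -1, 0, 0.5, 1, 2;
    shown = put(x, lowdemo.);
    put x= shown=;
  end;
run;
```

In the log, -37, -1, 0 and 0.5 should come out as 'missing', while 1 and 2 keep their numeric values. A true missing value (.) is not part of the range, because LOW does not include missing values for numeric formats.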
If the values are numbers (like a score or a measure) that you would want to use directly, say to report the mean score, then you should RECODE the data and not just use a FORMAT. That way SAS will ignore the missing values.
data want;
  set have;
  if var1 = -1 then var1 = .;
run;
You could create an INFORMAT to transform the data as it is read from a source text file.
proc format;
  invalue neg1miss
    '-1' = .
    other = [32.];
run;

data want;
  input (var1-var2) (:neg1miss.);
cards;
1 1
-1 2
3 -1
;

proc freq data=want;
  tables var1-var2;
run;
07-16-2017 02:25 PM
I would suggest changing the data:
if var1=-1 then var1=.M;
if var2=-1 then var2=.M;
If you were to print the data you would see "M" instead of "missing", but that might be acceptable. But there are other advantages besides printing. Suppose you try to calculate statistics. Before you change the data:
proc means data=have;
  where var1 ne -1;
  var var1;
run;
The WHERE statement is required so that the -1 values in VAR1 (which really mean missing) don't get included in the calculations. But that means you can only calculate statistics for one variable at a time. If you included VAR2 in the VAR statement, the WHERE statement would throw out some nonmissing values for VAR2. So changing the data (-1 becoming .M) lets you calculate statistics for all variables at once:
proc means data=have;
  var var1 var2;
run;
A WHERE statement is not needed, because PROC MEANS automatically tosses missing values from the calculations and keeps all valid (nonmissing) values.
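If there are more than a couple of variables, an array keeps the recoding short. This is only a sketch, assuming the data look like the small made-up HAVE dataset below; the array name v and the output name WANT are illustrative:

```sas
/* made-up sample data: -1 stands for missing */
data have;
  input var1 var2;
cards;
1 1
-1 2
3 -1
;

data want;
  set have;
  array v{*} var1 var2;
  do i = 1 to dim(v);
    /* .M is a special missing value; SAS treats it as missing */
    if v{i} = -1 then v{i} = .M;
  end;
  drop i;
run;

/* no WHERE needed: missing values are excluded variable by variable */
proc means data=want n nmiss mean;
  var var1 var2;
run;
```

With this sample, each variable keeps its two valid values, so N should be 2 and NMISS should be 1 for both.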