## Transforming missing values

Frequent Contributor
Posts: 124

Hello,

In a dataset (which I didn't create myself), there is a variable "VarC" that is stored as character, because most values are codes such as "AA", "AB", "BB", etc. However, the missing values are coded inconsistently: some are numbers, some are a dot (.), and some are blank. I would like to create a numeric variable VarA in which all of those missing values are set to a dot.

I tried this

if VarC=VarB then VarA=0;

if VarC ne VarB then VarA=1;

if VarC in ('99', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '.', ' ') then VarA=.;

However, this is the result. Some of the '99' values have not been transformed and I can't figure out why. Also, I have no idea how to select the dot values of VarC. I tried '.' but it didn't pick them up.

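Table of COUNTR1Ycar by migration

| VarC | VarA = . | VarA = 0 | VarA = 1 |
|---|---|---|---|
| (blank) | 29997.7 | 0 | 0 |
| . | 0 | 0 | 4556.41 |
| 99 | 0 | 0 | 58.4293 |
| 10 | 4.40122 | 0 | 0 |
| 11 | 6.82265 | 0 | 0 |
| 12 | 3.34725 | 0 | 0 |
| 13 | 6.78349 | 0 | 0 |
| 14 | 5.68027 | 0 | 0 |
| 5 | 4.65671 | 0 | 0 |
| 6 | 12.3363 | 0 | 0 |
| 7 | 9.62901 | 0 | 0 |
| 8 | 4.04315 | 0 | 0 |
| 9 | 4.61319 | 0 | 0 |
| 99 | 4394.16 | 0 | 0 |
| AD | 0 | 0 | 2.3538 |
| AE | 0 | 0 | 1.16401 |
| … | … | … | … |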

Accepted Solutions
Solution
‎12-21-2016 09:56 AM
Super User
Posts: 9,599

## Re: Transforming missing values

It's a good idea to post your test data in the form of a data step so we don't have to guess at formats and lengths. I have guessed below. The compress() function takes an optional third argument of modifiers (see the SAS docs), and two of those modifiers are:

K = keep

D = digits

So in my example I keep only the digits and the .:

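```
data have;
/* Without this, the first assignment fixes the length at 1 and "99" is truncated to "9" */
length varc $2;
varc=".";output;
varc="";output;
varc="99"; output;
run;

data want;
set have;
/* "kd" = keep digits; the "." listed in the second argument is kept as well */
vara=input(compress(varc,".","kd"),best.);
if vara=. then vara=0;
run;
```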

I added the if statement because you seem to want 0 rather than missing, but it is hard to say, as your test data doesn't match your logic: there is no varb, for instance.

All Replies

Frequent Contributor
Posts: 124

## Re: Transforming missing values

I'm not really sure how I can show you the data. The dataset has 4.5 million rows.

I want VarA=. where VarC=99 (58.4293 in the table) and where VarC=. (4556.41). Some of the VarC=99 rows have been transformed correctly.

Super User
Posts: 9,599

## Re: Transforming missing values

I don't need to see all your data. I need to see example data, in the form of a data step, which demonstrates exactly what you have, and also example output showing what you want. As @draycut and I have both mentioned, we see varb in your logic, but it is never described in your post.

It may be something simple. If varc is numeric, then use int(), as sometimes there is a very small fraction hanging on that you can't see. If it is character, as you state, then make sure you use strip():

if strip(VarC) in ('99', '5', '6', '7...

As there could be spaces.  As stated, we are guessing what your data looks like, we can't tell structure from what you have posted.
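The effect of stray blanks can be checked on a few made-up rows. The sketch below is only an illustration: the dataset names `have` and `check`, the flag variables `raw_match` and `strip_match`, and the sample values (including the leading blank in `' 99'`) are assumptions, not taken from the real data. SAS ignores trailing blanks when comparing character values but not leading ones, so a value with a leading blank fails the plain `in` test and passes once `strip()` is applied.

```sas
/* Hypothetical sample rows; the leading blank in ' 99' is an assumption */
data have;
  length VarC $4;
  VarC = '99';  output;
  VarC = ' 99'; output;  /* leading blank  */
  VarC = '99 '; output;  /* trailing blank */
  VarC = '.';   output;
  VarC = ' ';   output;
run;

data check;
  set have;
  /* 1 if the value is in the list, 0 otherwise */
  raw_match   = (VarC in ('99', '.', ' '));
  strip_match = (strip(VarC) in ('99', '.', ' '));
  /* $hex8. writes each of the 4 bytes as hex, so blanks show up as 20 */
  put VarC= $hex8. raw_match= strip_match=;
run;
```

Printing the value with `$hex8.` is a quick way to see invisible characters in the log; on the real data the same check could be run on a handful of the '99' rows that were not converted.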

Frequent Contributor
Posts: 124

## Re: Transforming missing values

Thanks, compress function worked.

PROC Star
Posts: 1,269

## Re: Transforming missing values

What is VarB in this context?

Frequent Contributor
Posts: 124

## Re: Transforming missing values

For context, VarB is the actual country. VarC is the country one year before. VarA indicates whether the country has changed or not.

Posts: 1,837

## Re: Transforming missing values

Is VarC only 2 characters long?

Is length(VarC) = length(VarB)?

Are both variables in the same case, i.e. uppercase or lowercase?

Try the following code:

data want;

set have;

length VarA 3;  /* IE numeric - minimum length */

if compress(VarC) = compress(VarB)  then VarA=0;   else VarA=1;

if compress(VarC) in ('99', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '.', ' ') then VarA=.;

run;

You can also try replacing the compress function with the strip function.

You can even try:

varA = input(VarC,?? 2.) ;

if varA=99 or (5 le varA le 14) then VarA = .;

/* if calculated VarA is already missing no need to assign . to it */

Question: if VarC = VarB and both are 99, would you like VarA=0 or VarA=. ?

☑ This topic is solved.