Transforming missing values

Frequent Contributor
Posts: 111
Hello,

In a dataset that I didn't create myself, there is a variable "VarC" that is stored as character because most of its values look like "AA", "AB", "BB", etc. However, missing values are coded inconsistently: some are numbers, some are a dot (.), and some are blank. I would like to create a numeric variable VarA in which all of those missing values are set to a dot.

 

I tried this

 

if VarC=VarB then VarA=0;

if VarC ne VarB then VarA=1;

if VarC in ('99', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '.', ' ') then VarA=.;

 

However, this is the result. Some of the '99' values have not been transformed and I can't figure out why. I also have no idea how to select the dot values of VarC. I tried '.', but it didn't pick them up.

 

Table of COUNTR1Ycar by migration

VarC       VarA=.     VarA=0     VarA=1
(blank)    29997.7    0          0
.          0          0          4556.41
99         0          0          58.4293
10         4.40122    0          0
11         6.82265    0          0
12         3.34725    0          0
13         6.78349    0          0
14         5.68027    0          0
5          4.65671    0          0
6          12.3363    0          0
7          9.62901    0          0
8          4.04315    0          0
9          4.61319    0          0
99         4394.16    0          0
AD         0          0          2.3538
AE         0          0          1.16401
..

Accepted Solutions
Solution
‎12-21-2016 09:56 AM
Super User
Posts: 7,942

Re: Transforming missing values

Posted in reply to Demographer

It's a good idea to post your test data in the form of a data step so we don't have to guess at formats and lengths. I have guessed below. The COMPRESS() function takes an optional third argument of modifiers (see the SAS docs), and two of those modifiers are:

K = keep the listed characters instead of removing them

D = add digits to the list of characters

So in my example I keep only the digits and the period:

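data have;
  length varc $2;  /* otherwise varc gets length 1 from the first assignment and "99" is truncated to "9" */
  varc=".";  output;
  varc="";   output;
  varc="99"; output;
  varc="AD"; output;
run;

data want;
  set have;
  /* keep only digits and periods, then convert the result to numeric */
  vara=input(compress(varc,".","kd"),best.);
  /* replace missing with 0; drop this line if you want missing kept as . */
  if vara=. then vara=0;
run;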

I added the IF statement because you seem to want 0 rather than missing. It is hard to say, though, because your test data doesn't match your logic: there is no VarB in it, for instance.
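To see what the "kd" modifiers do to each kind of value, a small check like the one below can help. It is only a sketch: the sample values (including the one with a leading blank) and the variable name digits are assumptions made for illustration, not taken from the real data.

```sas
data _null_;
  length varc $4 digits $4;
  /* illustrative values only: dot, blank, a code, a code with a leading blank, a country */
  do varc = ".", " ", "99", " 99", "AD";
    /* "k" keeps the listed characters, "d" adds the digits to that list */
    digits = compress(varc, ".", "kd");
    /* ?? suppresses invalid-data notes if the text is not a valid number */
    vara = input(digits, ?? best12.);
    /* $quote. makes leading and trailing blanks visible in the log */
    put varc= $quote8. digits= $quote8. vara=;
  end;
run;
```

The log then shows, for each input, the text that survives COMPRESS() and the numeric value INPUT() derives from it, which makes it easy to confirm that letters and blanks are dropped before the conversion.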


All Replies
Frequent Contributor
Posts: 111

Re: Transforming missing values

I'm not really sure how I can show you the data. The dataset has 4.5 million rows.

 

I want VarA=. where VarC=99 (the 58.4293 in the table) and where VarC=. (the 4556.41). Some of the VarC=99 values have been transformed correctly.

 
 
Super User
Posts: 7,942

Re: Transforming missing values

Posted in reply to Demographer

I don't need to see all your data. I need to see example data, in the form of a data step, that demonstrates exactly what you have, along with example output showing what you want. As @draycut and I have both mentioned, we see VarB in your logic, but it is never described in your post.

 

It may be something simple. If VarC is numeric, then apply int(), as sometimes there is a very small fraction hanging on that you can't see. If it is character, as you state, then make sure you strip it first:

if strip(VarC) in ('99', '5', '6', '7...

There could be leading spaces in the values. As stated, we are guessing what your data looks like; we can't tell its structure from what you have posted.
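A leading blank would explain why only some of the '99' values match, and the second 99 row in the table sorting ahead of 10 is at least consistent with that, although it is only a guess without the real data. The sketch below uses a made-up value ' 99' in a $3 variable to show the difference. SAS ignores trailing blanks when comparing character values, but not leading ones.

```sas
data _null_;
  length VarC $3;
  VarC = ' 99';   /* hypothetical value with a leading blank */
  /* direct comparison: ' 99' is not equal to '99' */
  raw_match   = (VarC in ('99', '.', ' '));
  /* strip() removes leading and trailing blanks before the comparison */
  strip_match = (strip(VarC) in ('99', '.', ' '));
  put raw_match= strip_match=;   /* writes raw_match=0 strip_match=1 */
run;
```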

Frequent Contributor
Posts: 111

Re: Transforming missing values

Thanks, compress function worked.

PROC Star
Posts: 735

Re: Transforming missing values

Posted in reply to Demographer

What is VarB in this context? :)

Frequent Contributor
Posts: 111

Re: Transforming missing values

For context, VarB is the current country. VarC is the country one year earlier. VarA indicates whether the country has changed or not.

Trusted Advisor
Posts: 1,555

Re: Transforming missing values

Posted in reply to Demographer

Is VarC only 2 characters long?

Is length(VarC) = length(VarB)?

Are both variables in the same case, i.e. both uppercase or both lowercase?

Try the following code:

 

data want;
  set have;
  length VarA 3;  /* i.e. numeric, minimum length */
  if compress(VarC) = compress(VarB) then VarA=0;
  else VarA=1;
  if compress(VarC) in ('99', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '.', ' ') then VarA=.;
run;

 

You can also try replacing the compress function with the strip function.

You can even try:

   varA = input(VarC, ?? 2.);
   if varA=99 or (5 le varA le 14) then VarA = .;
   /* if VarA is already missing after input(), there is no need to assign . to it */

 

Question: if VarC = VarB and both are 99, would you like VarA=0 or VarA=.?
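One possible way to put the pieces together is sketched below. It assumes that any VarC that is blank, a dot, or made up only of digits should give VarA=. (so a 99 in both variables yields missing, which is only one possible answer to the question above), and that real country codes never consist solely of digits. The sample rows and the $3 length are invented for illustration.

```sas
data have;
  length VarB VarC $3;
  /* invented sample rows: same country, changed country, codes, dot, blank */
  VarB = 'AD'; VarC = 'AD';  output;
  VarB = 'AD'; VarC = 'AE';  output;
  VarB = 'AD'; VarC = ' 99'; output;
  VarB = '99'; VarC = '99';  output;
  VarB = 'AD'; VarC = '.';   output;
  VarB = 'AD'; VarC = ' ';   output;
run;

data want;
  set have;
  /* missing if VarC is blank, a lone dot, or all digits after stripping */
  if missing(VarC) or strip(VarC) = '.' or notdigit(strip(VarC)) = 0 then VarA = .;
  /* otherwise 0 when the country is unchanged, 1 when it differs */
  else VarA = (strip(VarC) ne strip(VarB));
run;

proc print data=want;
run;
```

Checking the digit pattern with notdigit() avoids listing every numeric code by hand, at the cost of treating any all-digit value as missing. If only specific codes should count as missing, the explicit IN list from the code above is the safer choice.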

☑ This topic is solved.
