Demographer
Pyrite | Level 9

Hello,

In a dataset (which I didn't create myself), there is a variable "VarC" that is stored as character, because most values look like "AA", "AB", "BB", etc. However, the missing values are coded inconsistently: some are numbers, some are a dot (.), and some are empty. I would like to create a numeric variable VarA in which all those missing values are set to a dot.

 

I tried this

 

if VarC=VarB then VarA=0;

if VarC ne VarB then VarA=1;

if VarC in ('99', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '.', ' ') then VarA=.;

 

However, this is the result. Some of the '99' values have not been converted, and I can't figure out why. Also, I have no idea how to select the dot values of VarC. I tried '.', but it didn't pick them up.

 

Table of COUNTR1Ycar by migration
(rows: VarC, columns: VarA)

VarC       VarA=.     VarA=0   VarA=1
(blank)    29997.7    0        0
.          0          0        4556.41
99         0          0        58.4293
10         4.40122    0        0
11         6.82265    0        0
12         3.34725    0        0
13         6.78349    0        0
14         5.68027    0        0
5          4.65671    0        0
6          12.3363    0        0
7          9.62901    0        0
8          4.04315    0        0
9          4.61319    0        0
99         4394.16    0        0
AD         0          0        2.3538
AE         0          0        1.16401
..
Accepted Solution

RW9
Diamond | Level 26

It's a good idea to post your test data in the form of a data step so we don't have to try to figure out formats and lengths. I have guessed below. The COMPRESS() function takes an optional modifiers argument (see the SAS docs), and two of those modifiers are:

K = keep

D = digits

So in my example I keep only the digits and the .:

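data have;
  length varc $2;  /* without this, varc gets length 1 from the first assignment and "99" is truncated */
  varc=".";  output;
  varc="";   output;
  varc="99"; output;
  varc="AD"; output;
run;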

data want;
  set have;
  vara=input(compress(varc,".","kd"),best.);
  if vara=. then vara=0;
run;

I added the IF statement because you seem to want 0 rather than missing. It is hard to say, though, as your test data doesn't match your logic: there is no VarB, for instance.
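To see what the COMPRESS call above returns for each kind of value, here is a minimal sketch. The sample values, including the ones with a leading blank, are assumptions for illustration and are not taken from the poster's data.

```sas
/* Hypothetical sample values; the leading blanks are an assumption */
data demo_compress;
  length varc $4;
  do varc = '99', ' 99', '.', ' .', ' ', 'AD';
    digits = compress(varc, '.', 'kd');  /* k = keep, d = digits: keep only digits and periods */
    vara   = input(digits, ?? best.);    /* text that is not a number becomes missing */
    output;
  end;
run;

proc print data=demo_compress;
run;
```

Note that the 'kd' modifiers also drop letters, so a country code such as 'AD' ends up as a missing VarA here, just like the dot and the blank.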

Demographer
Pyrite | Level 9

I'm not really sure how I can show you the data. The dataset has 4.5 million rows.

 

I want VarA=. when VarC = 99 (the 58.4293 in the table) and when VarC=. (the 4556.41). Some of the VarC=99 values have been converted correctly.

 
 
RW9
Diamond | Level 26

I don't need to see all your data. I need to see example data, in the form of a data step, which demonstrates exactly what you have, along with example output showing what you want. As @PeterClemmensen and I have both mentioned, we see VarB in your logic, but it is never described in your post.

 

It may be something simple. If VarC is numeric, then apply int(), as sometimes there is a very small fractional part hanging on that you can't see. If it is character, as you state, then make sure you use:

if strip(VarC) in ('99', '5', '6', '7...

This is because there could be spaces in the values. As stated, we are guessing what your data looks like; we can't tell its structure from what you have posted.
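As a small illustration of why STRIP() can matter in the IN comparison, here is a sketch with made-up values. The leading blanks are an assumption about what the data might contain, not something confirmed in the thread.

```sas
/* Hypothetical values: the second and third carry a leading blank */
data demo_strip;
  length varc $4;
  do varc = '99', ' 99', ' .';
    raw_match   = (varc in ('99', '.'));         /* 1 only when there is no leading blank */
    strip_match = (strip(varc) in ('99', '.'));  /* leading blanks removed before comparing */
    output;
  end;
run;

proc print data=demo_strip;
run;
```

SAS pads the shorter operand with trailing blanks when comparing character values, so trailing blanks are harmless, but a leading blank makes ' 99' differ from '99'.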

Demographer
Pyrite | Level 9

Thanks, the compress function worked.

PeterClemmensen
Tourmaline | Level 20

What is VarB in this context? 🙂

Demographer
Pyrite | Level 9

For context, VarB is the current country. VarC is the country one year earlier. VarA indicates whether the country has changed or not.

Shmuel
Garnet | Level 18

Is VarC only 2 characters long?

Is length(VarC) = length(VarB)?

Are both variables in the same case, i.e. uppercase or lowercase?

 

Try the following code:

 

data want;
  set have;
  length VarA 3;  /* i.e. numeric - minimum length */
  if compress(VarC) = compress(VarB) then VarA=0;
  else VarA=1;
  if compress(VarC) in ('99', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '.', ' ') then VarA=.;
run;

 

You can try replacing the compress function with the strip function.

You can even try:

   varA = input(VarC,?? 2.) ;

   if varA=99 or (5 le varA le 14) then VarA = .;

 /* if calculated VarA is already missing no need to assign . to it */

 

Question: if VarC = VarB and both are 99 - would you like VarA=0 or VarA=. ?
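Putting the suggestions in this thread together, one possible sketch is shown below. The dataset names and sample rows are hypothetical, and two choices are assumptions rather than anything the poster confirmed: every value of VarC that reads as a number is treated as missing (not only the listed codes), and such a value yields VarA=. even when it equals VarB.

```sas
/* Hypothetical sample rows; the leading blanks are an assumption */
data have_demo;
  length VarB VarC $4;
  VarB = 'AD'; VarC = 'AD';  output;  /* same country        */
  VarB = 'AD'; VarC = 'AE';  output;  /* country changed     */
  VarB = 'AD'; VarC = ' 99'; output;  /* numeric missing code */
  VarB = 'AD'; VarC = ' .';  output;  /* dot                 */
  VarB = 'AD'; VarC = ' ';   output;  /* blank               */
run;

data want_demo;
  set have_demo;
  c = strip(VarC);                       /* drop leading blanks */
  if c in (' ', '.') or not missing(input(c, ?? best.)) then VarA = .;
  else VarA = (c ne strip(VarB));        /* 0 = same country, 1 = changed */
  drop c;
run;

proc print data=want_demo;
run;
```

The ?? modifier suppresses the invalid-data notes that INPUT would otherwise write to the log for values such as 'AD'.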
