Hello,
In a dataset (which I didn't create myself), there is a variable "VarC" stored as character, because most values look like "AA", "AB", "BB", etc. However, missing values are coded inconsistently: some are numbers, some are a dot (.) and some are blank. I would like to create a numeric variable VarA in which all of those missing values are set to a dot.
I tried this
if VarC=VarB then VarA=0;
if VarC ne VarB then VarA=1;
if VarC in ('99', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '.', ' ') then VarA=.;
However, this is the result. Some of the '99' values have not been converted and I can't figure out why. Also, I have no idea how to select the dot values of VarC. I tried '.' but it didn't pick them up.
Table of COUNTR1Ycar by migration
VarC | VarA = . | VarA = 0 | VarA = 1 |
(blank) | 29997.7 | 0 | 0 |
. | 0 | 0 | 4556.41 |
99 | 0 | 0 | 58.4293 |
10 | 4.40122 | 0 | 0 |
11 | 6.82265 | 0 | 0 |
12 | 3.34725 | 0 | 0 |
13 | 6.78349 | 0 | 0 |
14 | 5.68027 | 0 | 0 |
5 | 4.65671 | 0 | 0 |
6 | 12.3363 | 0 | 0 |
7 | 9.62901 | 0 | 0 |
8 | 4.04315 | 0 | 0 |
9 | 4.61319 | 0 | 0 |
99 | 4394.16 | 0 | 0 |
AD | 0 | 0 | 2.3538 |
AE | 0 | 0 | 1.16401 |
… | .. | … | … |
It's a good idea to post your test data in the form of a data step so we don't have to figure out formats and lengths. I have guessed below. The COMPRESS() function takes an optional third argument of modifiers (see the SAS docs), and two of those modifiers are:
K = keep
D = digits
So in my example I keep only the digits and the .:
data have;
  length varc $2; /* without this, varc="." fixes the length at 1 and "99" is truncated to "9" */
  varc="."; output;
  varc=""; output;
  varc="99"; output;
  varc="AD"; output;
run;
data want;
  set have;
  vara=input(compress(varc,".","kd"),best.);
  if vara=. then vara=0;
run;
I added the IF because you seem to want 0 rather than missing; however, it's hard to say, as your test data doesn't match your logic (there is no VarB, for instance).
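To make the effect of the "kd" modifiers easier to see, here is a minimal sketch that prints what COMPRESS() returns for a few values. The sample values (including one with a leading blank) and the names demo and kept are made up for illustration; the real contents and length of VarC are not known from the post. Unlike the step above, this sketch leaves VarA missing instead of resetting it to 0.

```sas
/* Hypothetical sample values; the real VarC contents and length are assumptions */
data demo;
  length varc $4 kept $4;
  do varc = '.', ' ', '99', ' 99', 'AD', '10';
    /* k = keep, d = digits: keep only digits plus the listed '.' */
    kept = compress(varc, '.', 'kd');
    /* ?? suppresses invalid-data notes; a lone '.' or a blank reads as missing */
    vara = input(kept, ?? 12.);
    output;
  end;
run;

proc print data=demo;
run;
```

Letters are dropped entirely, so 'AD' ends up as a blank string and therefore as a missing VarA, while ' 99' and '99' both become 99.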
I'm not really sure how I can show you the data. The dataset has 4.5 million rows.
I want VarA=. for VarC=99 (58.4293 in the table) and for VarC=. (4556.41). Some of the VarC=99 values have been correctly converted.
I don't need to see all your data. I need to see example data, in the form of a data step, which demonstrates exactly what you have, and also example output of what you want. As @PeterClemmensen has mentioned as well as me, we both see VarB in your logic, but it is never described in your post.
It may be something simple. If VarC is numeric, apply int(), as sometimes there is a very small fraction hanging on that you can't see. If it is character as you state, then make sure you strip it first:
if strip(VarC) in ('99', '5', '6', '7...
As there could be spaces. As stated, we are guessing what your data looks like, we can't tell structure from what you have posted.
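The two separate "99" rows in the frequency table would be consistent with one of them carrying a leading blank, although that is only a guess without seeing the data. SAS ignores trailing blanks in character comparisons but not leading ones, which the following sketch illustrates with made-up values (demo, raw_match and strip_match are hypothetical names):

```sas
/* Hypothetical values: '99' and ' 99' print almost identically but compare differently */
data demo;
  length varc $4;
  do varc = '99', ' 99';
    raw_match   = (varc in ('99'));         /* 1 for '99', 0 for ' 99' */
    strip_match = (strip(varc) in ('99'));  /* 1 for both values */
    output;
  end;
run;

proc print data=demo;
run;
```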
Thanks, compress function worked.
What is VarB in this context? 🙂
For context, VarB is the current country. VarC is the country one year before. VarA indicates whether the country has changed or not.
Is VarC only 2 characters long?
Is length(VarC) = length(VarB)?
Are both variables in the same case, i.e. both uppercase or both lowercase?
Try the following code:
data want;
set have;
length VarA 3; /* IE numeric - minimum length */
if compress(VarC) = compress(VarB) then VarA=0; else VarA=1;
if compress(VarC) in ('99', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '.', ' ') then VarA=.;
run;
You can also try replacing the COMPRESS function with the STRIP function.
You could even try:
varA = input(VarC,?? 2.) ;
if varA=99 or (5 le varA le 14) then VarA = .;
/* if calculated VarA is already missing no need to assign . to it */
Question: if VarC = VarB and both are 99 - would you like VarA=0 or VarA=. ?
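For what it's worth, here is one way the pieces could fit together if the answer to that question is "VarA=.", i.e. the missing codes take precedence over the comparison. That precedence is an assumption, as are the sample rows in have and the helper variable code.

```sas
/* Hypothetical sample rows; real lengths and values are assumptions */
data have;
  length VarB VarC $4;
  VarB = 'AD'; VarC = 'AD';  output;
  VarB = 'AD'; VarC = 'AE';  output;
  VarB = 'AD'; VarC = ' 99'; output;
  VarB = '99'; VarC = '99';  output;
  VarB = 'AD'; VarC = '.';   output;
  VarB = 'AD'; VarC = ' ';   output;
run;

data want;
  set have;
  length VarA 3;
  /* numeric view of VarC; ?? suppresses notes for values such as 'AD' */
  code = input(strip(VarC), ?? 8.);
  /* assumed precedence: missing codes win over the VarB comparison */
  if missing(VarC) or strip(VarC) = '.' or code = 99 or (5 <= code <= 14) then VarA = .;
  else if upcase(strip(VarC)) = upcase(strip(VarB)) then VarA = 0;
  else VarA = 1;
  drop code;
run;
```

Checking the missing codes first in a single IF/ELSE chain avoids relying on a later statement to overwrite an earlier 0 or 1.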