Hello,
In a dataset (which I didn't create myself), there is a variable "VarC" that is stored as character, because most values look like "AA", "AB", "BB", etc. However, the missing values are coded inconsistently: some are numbers, some are a dot (.), and some are empty. I would like to create a numeric variable VarA in which all of those missing values are set to a dot.
I tried this:
if VarC=VarB then VarA=0;
if VarC ne VarB then VarA=1;
if VarC in ('99', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '.', ' ') then VarA=.;
However, the result is shown below. Some of the '99' values were not converted, and I can't figure out why. Also, I have no idea how to select the dot values of VarC. I tried '.' but it didn't pick them up.
Table of COUNTR1Ycar by migration
VarC | VarA=. | VarA=0 | VarA=1
(blank) | 29997.7 | 0 | 0
. | 0 | 0 | 4556.41
99 | 0 | 0 | 58.4293 |
10 | 4.40122 | 0 | 0 |
11 | 6.82265 | 0 | 0 |
12 | 3.34725 | 0 | 0 |
13 | 6.78349 | 0 | 0 |
14 | 5.68027 | 0 | 0 |
5 | 4.65671 | 0 | 0 |
6 | 12.3363 | 0 | 0 |
7 | 9.62901 | 0 | 0 |
8 | 4.04315 | 0 | 0 |
9 | 4.61319 | 0 | 0 |
99 | 4394.16 | 0 | 0 |
AD | 0 | 0 | 2.3538 |
AE | 0 | 0 | 1.16401 |
… | .. | … | … |
It's a good idea to post your test data in the form of a data step, so we don't have to guess formats and lengths. I have guessed below. The COMPRESS() function takes an optional third argument of modifiers (see the SAS docs), and two of those modifiers are:
K = keep
D = digits
So in my example I keep only the digits and the dot:
data have;
  length varc $2;  /* otherwise varc gets length 1 from the first assignment and "99" is truncated */
  varc="."; output;
  varc=""; output;
  varc="99"; output;
  varc="AD"; output;
run;
data want;
  set have;
  /* keep only digits and the dot, then convert to numeric */
  vara=input(compress(varc,".","kd"),best.);
  if vara=. then vara=0;
run;
I added the IF because you seem to want 0 rather than missing. It is hard to say, though, as your test data doesn't match your logic: there is no varb, for instance.
I'm not really sure how I can show you the data. The dataset has 4.5 million rows.
I want VarA=. for VarC = 99 (58.4293 in the table) and for VarC=. (4556.41). Some of the VarC=99 values have been converted correctly.
I don't need to see all your data. I need to see example data, in the form of a data step, which demonstrates exactly what you have, and also example output of what you want. As @PeterClemmensen has mentioned as well, we both see varb in your logic, but it is never described in your post.
It may be something simple. If varc is numeric, then use int(), as sometimes there is a very small fraction hanging on that you can't see. If it is character, as you state, then make sure you use:
if strip(VarC) in ('99', '5', '6', '7...
This is because there could be leading or trailing spaces. As stated, we are guessing what your data looks like; we can't tell its structure from what you have posted.
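To illustrate the point about spaces, here is a minimal sketch. The sample rows and the variable length ($3) are assumptions made up for the illustration, not the poster's real data; the list of codes is the one from the original question. A value such as ' 99' with a leading blank does not equal '99' in a plain comparison, but it does after strip():

```sas
/* Hypothetical sample data: values and lengths are assumptions */
data have;
   length VarC VarB $3;
   VarC = ' 99'; VarB = 'AD'; output;  /* leading blank: a plain IN test misses this row */
   VarC = '99';  VarB = 'AD'; output;
   VarC = '.';   VarB = 'AD'; output;
   VarC = ' ';   VarB = 'AD'; output;
   VarC = 'AD';  VarB = 'AD'; output;
   VarC = 'AE';  VarB = 'AD'; output;
run;

data want;
   set have;
   if strip(VarC) = strip(VarB) then VarA = 0;
   else VarA = 1;
   /* strip() removes leading and trailing blanks before the comparison */
   if strip(VarC) in ('99', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '.', ' ')
      then VarA = .;
run;

proc print data=want;
run;
```

If the real data contains characters other than ordinary blanks (tabs, for example), strip() alone would not be enough, and compress() with suitable modifiers may be the better choice.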
Thanks, compress function worked.
What is VarB in this context? 🙂
For context, VarB is the actual (current) country. VarC is the country one year earlier. VarA indicates whether the country has changed or not.
Is VarC only 2 characters long?
Is length(VarC) = length(VarB)?
Are both variables in the same case, i.e. both uppercase or both lowercase?
Try the following code:
data want;
set have;
length VarA 3; /* IE numeric - minimum length */
if compress(VarC) = compress(VarB) then VarA=0; else VarA=1;
if compress(VarC) in ('99', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '.', ' ') then VarA=.;
run;
You can also try replacing the compress function with the strip function.
You can even try:
varA = input(VarC,?? 2.) ;
if varA=99 or (5 le varA le 14) then VarA = .;
/* if calculated VarA is already missing no need to assign . to it */
Question: if VarC = VarB and both are 99 - would you like VarA=0 or VarA=. ?
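As a rough sketch of how the numeric conversion above could be combined with the VarB comparison in one step: the sample rows, the $2 lengths, and the helper variable code are all hypothetical, and the sketch assumes that a code of 99 should always give VarA=., which is exactly the open question above.

```sas
/* Hypothetical sample data: values and lengths are assumptions */
data have;
   length VarC VarB $2;
   VarC = '99'; VarB = 'AD'; output;
   VarC = '7';  VarB = 'AD'; output;
   VarC = '.';  VarB = 'AD'; output;
   VarC = ' ';  VarB = 'AD'; output;
   VarC = 'AD'; VarB = 'AD'; output;
   VarC = 'AE'; VarB = 'AD'; output;
run;

data want;
   set have;
   /* ?? suppresses the invalid-data notes for non-numeric values such as 'AD' */
   code = input(VarC, ?? 2.);
   /* Assumption: numeric codes, dots and blanks all mean "missing" */
   if code = 99 or (5 le code le 14) or strip(VarC) in ('.', ' ') then VarA = .;
   else if strip(VarC) = strip(VarB) then VarA = 0;
   else VarA = 1;
   drop code;  /* code is only a temporary helper */
run;
```

Testing the missing codes first means a row is never flagged as "changed" just because its previous-year value is a placeholder.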