Hi,
I am a complete novice to SAS and programming. I used the following code to create a new variable in a dataset based on the value of another variable in the same dataset, and it worked:
libname lib "Mylib";
data lib.mydata;
set lib.mydata;
if WEEK<=1968 then GRP=1;
else GRP=2;
run;
proc print data=lib.mydata;
run;
Here, based on the values of the numeric variable WEEK, I created a new numeric variable GRP with only two values, 1 or 2.
This worked and gave me the desired column. But when I tried to apply similar logic to another dataset, it gives different output:
libname lib "mylib";
data lib.mydata2;
set lib.mydata2;
if VAR2="Tier 1" then VAR3=1;
else if VAR2="Tier 2" then VAR3=2;
else VAR3=3;
run;
proc print data=lib.mydata2;
run;
I even tried changing the code to:
libname lib "mylib";
data lib.mydata2;
set lib.mydata2;
if VAR2="Tier 1" then VAR3=1;
else if VAR2="Tier 2" then VAR3=2;
else if VAR3= "Control" then VAR3=3;
run;
proc print data=lib.mydata2;
run;
Here I am trying to create VAR3 (1, 2, 3), a numeric variable, based on the character variable VAR2 (Tier 1, Tier 2 and Control), such that:
VAR2 | VAR3 |
Tier 1 | 1 |
Tier 2 | 2 |
Control | 3 |
But the output shows VAR3=3 for every observation, and the VAR3 value is shifted onto the next line for each observation.
The output of this code (.lst file) looks like the data in the text file attached here.
Please have a look and guide me as to why this is not working. Kindly help me with the right code.
First, the last condition should test VAR2:
else if VAR2="Control" then VAR3=3;
not VAR3:
else if VAR3="Control" then VAR3=3;
Writing over your input dataset is a big source of confusion, especially for novice users. Doing that makes it impossible to change your logic and re-run the program because the input data is now modified.
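A minimal sketch of that advice, assuming the library path from the question. The output name work.mydata2_grp is made up for illustration. The final ELSE resets VAR3, so a value left over from an earlier run cannot survive:

```sas
libname lib "mylib";

/* Read lib.mydata2 but write to a separate dataset, so the input */
/* is never modified and the step can be re-run after any change. */
/* work.mydata2_grp is a hypothetical name.                       */
data work.mydata2_grp;
  set lib.mydata2;
  if VAR2 = "Tier 1" then VAR3 = 1;
  else if VAR2 = "Tier 2" then VAR3 = 2;
  else if VAR2 = "Control" then VAR3 = 3;
  else VAR3 = .;   /* unmatched rows become missing, not a stale value */
run;

proc print data=work.mydata2_grp;
run;
```

Rows that come out with a missing VAR3 are the ones whose VAR2 value does not match any of the three literals exactly.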
I'm not sure what you are trying to show with the text file you posted. (Why did you post a text file instead of pasting the text into the body of the question using the Insert Code icon?)
Since you are overwriting your inputs, probably what happened is that you ran it once and set VAR3 to 3 for every observation. Now when you run the code you posted, none of the IF conditions are met, so VAR3 keeps its current value of 3.
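This can be reproduced with a small made-up example (work.demo and its values are hypothetical, not your data). The input already carries VAR3=3 and a VAR2 value with a leading blank, so no condition matches and the old value is carried through:

```sas
/* Hypothetical input: VAR3 is already 3 from an earlier run, */
/* and VAR2 has a leading blank.                              */
data work.demo;
  length VAR2 $8;
  VAR2 = " Tier 1";
  VAR3 = 3;
run;

/* No condition matches " Tier 1", so nothing assigns VAR3 and */
/* the value read by SET (3) is written out unchanged.         */
data work.demo_out;
  set work.demo;
  if VAR2 = "Tier 1" then VAR3 = 1;
  else if VAR2 = "Tier 2" then VAR3 = 2;
  else if VAR2 = "Control" then VAR3 = 3;
run;

proc print data=work.demo_out;
run;
```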
Make sure that the values of VAR2 actually match the values in your IF conditions. Is the case the same? Do the values in the dataset contain leading spaces? Or other invisible characters?
One of the mysteries you will need to solve is what is actually contained in VAR2? It may contain characters that you are not accounting for. Some of the many possibilities:
Here's a quick way to verify. Begin with:
data _null_;
tier_1_in_hex = put("Tier 1", $hex12.);
tier_2_in_hex = put("Tier 2", $hex12.);
tier_3_in_hex = put("Tier 3", $hex12.);
put 'Tier 1 should look like this: ' tier_1_in_hex;
put 'Tier 2 should look like this: ' tier_2_in_hex;
put 'Tier 3 should look like this: ' tier_3_in_hex;
run;
That tells you what the characters you expect would look like, if expressed in hex format.
Then get a table of what is actually there, also in hex format:
proc freq data=lib.mydata2;
tables var2;
format var2 $hex12.;
run;
Compare the results to verify whether they are the same or not. Let us know what you find.
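If the table is long, a rough alternative is to write only the non-matching rows to the log. This is a sketch that assumes the three expected values are exactly Tier 1, Tier 2 and Control, as described in the question:

```sas
/* List every observation whose VAR2 is not one of the expected */
/* literals, showing the value in hex so that leading blanks    */
/* (20) or other invisible characters become visible.           */
data _null_;
  set lib.mydata2;
  if VAR2 not in ("Tier 1", "Tier 2", "Control") then
    put _n_= VAR2= $hex24.;
run;
```

If nothing is written to the log, the values match and the problem lies elsewhere, for example in a VAR3 left over from an earlier run.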
hi,
There might be leading/trailing spaces or a difference in case in the VAR2 values, which would explain why you are getting VAR3 = 3 for every observation.
Try running this code; it might help:
libname lib "mylib";
data lib.mydata2;
set lib.mydata2;
if strip(upcase(VAR2))="TIER 1" then VAR3=1;
else if strip(upcase(VAR2))="TIER 2" then VAR3=2;
else var3 = 3;
run;
regards
manoj.
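If the hex output shows characters that STRIP does not remove (a tab, for example), one possible variation is to keep only letters and digits with COMPRESS before comparing. This is only a sketch: the output dataset name and the helper variable var2_key are made up, and unmatched values are set to missing rather than 3 so that they stand out:

```sas
data work.mydata2_clean;          /* hypothetical output name */
  set lib.mydata2;
  length var2_key $12;            /* hypothetical helper variable */
  /* Modifiers: k = keep the listed classes, a = letters, d = digits. */
  /* " Tier 1" becomes "Tier1", then UPCASE gives "TIER1".            */
  var2_key = upcase(compress(VAR2, , 'kad'));
  if var2_key = "TIER1" then VAR3 = 1;
  else if var2_key = "TIER2" then VAR3 = 2;
  else if var2_key = "CONTROL" then VAR3 = 3;
  else VAR3 = .;
  drop var2_key;
run;
```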
Hi @Ronin,
Your DATA steps are syntactically correct and, somewhat ironically, even the one with the typo in the variable name would produce correct results (!) if the input dataset met certain plausible assumptions.
However, these assumptions must be violated to obtain the odd PROC PRINT output you posted.
SAS programmers around the globe are now curiously waiting for you to post (using the {i} button) the PROC FREQ output that @Astounding asked you to produce. Or, if you're more comfortable with PROC PRINT, please show us the output of this step (two observations):
proc print data=lib.mydata2(firstobs=30 obs=31);
format _character_ $hex24.;
run;
Then I'm sure the issue will be resolved very soon.
As long as you haven't actually done the work yet, here is a preliminary step you can take to make sure we are pursuing the right path here.
data temp;
set have;
len = length(var2);
run;
proc freq data=temp;
tables var2 * len / missing list;
run;
This step will confirm whether the problem lies in the data or was introduced by many failed experiments in trying to get the program to work.