I am creating variables using variable X with a few if then statements.
My purpose with this is to have variables A B C D instead of A1 A2 A3 etc B1 B2 B3 etc. and while this works beautifully, I am losing data for the first record of each ID.
I am not sure how to prevent this from happening.
if X = 1 then do;
A= oldA1;
B = oldB1;
C = oldC1;
D = oldD1;
end;
else if X = 2 then do;
A= oldA2;
B = oldB2;
C = oldC2;
D = oldD2;
end;
else if X = 3 then do;
A= oldA3;
B = oldB3;
C = oldC3;
D = oldD3;
end;
this goes to 17. I am using only 3 to make the example easy.
First record is not necessarily X = 1 it can be any key. I could do this with proc transpose but I have a few groups of variables and when I tried with proc transpose it made it worse. How to tell sas to not drop the first record? I can’t seem to find what that is.
Example of the raw data I have:
ID X oldA1 oldB1 oldC1 oldD1 oldA2 oldB2 oldC2 oldD2 oldA3 oldB3 oldC3 oldD3
1 2 . . . . date 0 1 0
1 3 . . . . . . . . date 1 1 1
1 1 date 0 1 1 . . . . . . . .
1 3 . . . . . . . . date 0 0 0
2 3 . . . . . . . . date 0 0 1
2 1 date 1 1 1 . . . . . . . .
I needed:
ID X A B C D
1 2 date 0 1 0
1 3 date 1 1 1
1 1 date 0 1 1
1 3 date 0 0 0
2 3 date 0 0 1
2 1 date 1 1 1
I am getting instead
ID X A B C D
1 3 date 1 1 1
1 1 date 0 1 1
1 3 date 0 0 0
2 1 date 1 1 1
Thanks!
It's not 100% clear how you want to handle it when X is missing. But this seems like a good approach, possibly a solution:
data want;
set have;
A = coalescec(oldA1, oldA2, oldA3);
B = max(oldB1, oldB2, oldB3);
C = max(oldC1, oldC2, oldC3);
D = max(oldD1, oldD2, oldD3);
drop old: ;
run;
Don't forget the colon in the DROP statement.
The snippet of code you posted in not deleting any observations. And nothing in your posted code is dropping any variables.
I checked my raw data file and noticed that all of the ids have missing at least once before the first record for variable X. This is because the file is really big and it has a lot of other variables. So it is skipping the first one immediately after the missing. Not sure what to do in this case.
> I checked my raw data file and noticed that all of the ids have missing at least once
I don't see missing IDs in the data you show.
.> Dropping first record
I can't see how records could be dropped when the code you show contains no OUTPUT or DELETE statements.
It's not 100% clear how you want to handle it when X is missing. But this seems like a good approach, possibly a solution:
data want;
set have;
A = coalescec(oldA1, oldA2, oldA3);
B = max(oldB1, oldB2, oldB3);
C = max(oldC1, oldC2, oldC3);
D = max(oldD1, oldD2, oldD3);
drop old: ;
run;
Don't forget the colon in the DROP statement.
Another way:
A = sum( (X=1)*OLDA1 , (X=2)*OLDA2) , (X=3)*OLDA3 );
Thank you. This one still drops the first record. I will see what others suggested. I have another question as I like this better than my "if then statements". My date variable is missing. Do I need to do something else in order to prevent this from happening? I am getting this warning for var A which is date in the code: NOTE: Numeric values have been converted to character values at the places given by...
Thanks again.
@Mscarboncopy wrote:
Thank you. This one still drops the first record. I will see what others suggested. I have another question as I like this better than my "if then statements". My date variable is missing. Do I need to do something else in order to prevent this from happening? I am getting this warning for var A which is date in the code: NOTE: Numeric values have been converted to character values at the places given by...
Thanks again.
You have not shown us your code so we really cannot give anything other than best practice suggestions.
Copy the text from the SAS log for the steps that are not working and paste it into your message using the Insert Code icon (looks like < / > ) to get a pop-up window. Using the Insert Code window will preserve the formatting of the text.
Understood. I saved the instructions you gave for next time. I was able to find the issue in the previous code thanks to your suggestions. And I learned a more elegant way to do what I was doing with "if then statements", which always helps me with my long codes. However, I am not sure why the dates (var A) are missing when I use the first suggestion though (from Astounding). I added format A mmddyy10 at the beginning but it did not help. This part of the code is straightforward I am doing what Astounding suggested. My dates are formatted as dates in the raw file and the errors for only the date variables are: Numeric values have been converted to character values for the majority and Character values have been converted to numeric values. I wonder if this is an issue with the raw data file? it can't be though because when I use "if then" statements I have no issue. So maybe I should just stick with those ? Thank you!
@Mscarboncopy wrote:
Understood. I saved the instructions you gave for next time. I was able to find the issue in the previous code thanks to your suggestions. And I learned a more elegant way to do what I was doing with "if then statements", which always helps me with my long codes. However, I am not sure why the dates (var A) are missing when I use the first suggestion though (from Astounding). I added format A mmddyy10 at the beginning but it did not help. What can I do? Thank you!
Context matters.
Are you talking about this line of code?
A = coalescec(oldA1, oldA2, oldA3);
If OLDA1 etc are numeric (SAS stores dates a number of days since 1960) instead of character then you will get both the automatic conversion notices and potentially missing values you talk about. Change it to treat the A variables as numbers like it is doing with the B and C variables.
Perfect! I will accept his suggestion as solution, keeping in mind that I changed that part.
Thank you so much.
Is there a similar function I could use to create a variable when the A's are char variables and when more than one can be selected? This is another variable I am working on.
If I do it this way, it overwrites. VARWANTED = max (A1, A2, A3); I used coalescec but it did not work either. "If then statements" give me the same issue. Arrays also don't work.
array _A {*} A: ;
do VARWANTED =1 to dim(_A);
if _A{VARWANTED}>0 then output;
end;
Thank you.
Raw Data file looks like .
ID A1 A2 A3
1 70 100 5
2 . 3 .
3 16 59 30
4 18 . .
if I use the max function:
ID VARWANTED
1 70
2 3
3 16
4 18
I needed to have
ID VARWANTED
1 70
1 100
1 5
2 3
3 16
3 59
3 30
4 18
So you DON"T want to do the same thing as the MAX() function was doing.
Looks like you want to transpose your multiple variables into multiple observations. PROC TRANSPOSE is designed to let you do that.
But you could also use your ARRAY to do that.
data want;
set have;
array _A A: ;
do VARWANTED =1 to dim(_A);
if _A[VARWANTED]>0 then do;
A=_A[VARWANTED];
output;
end;
end;
run;
Thank you! The array worked perfectly.
Consider:
data want;
set have;
array olda {17};
array oldb {17};
array oldc {17};
array oldd {17};
if (1<=x<=17) then do;
a=olda{x};
b=oldb{x};
c=oldc{x};
d=oldd{x};
end;
drop old: ;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.