BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Mscarboncopy
Pyrite | Level 9

I am creating variables using variable X with a few if then statements.

My purpose with this is to have variables A B C D instead of A1 A2 A3 etc  B1 B2 B3  etc. and while this works beautifully, I am losing data for the first record of each ID.

I am not sure how to prevent this from happening.

if X = 1 then do;

A= oldA1;

B = oldB1;

C = oldC1;

D = oldD1;

end;

 else if X = 2 then do;

A= oldA2;

B = oldB2;

C = oldC2;

D = oldD2;

end;

else if X = 3 then do;

A= oldA3;

B = oldB3;

C = oldC3;

D = oldD3;

end;

 

this goes to 17. I am using only 3 to make the example easy.

 First record is not necessarily X = 1 it can be any key. I could do this with proc transpose but I have a few groups of variables and when I tried with proc transpose it made it worse.  How to tell sas to not drop the first record? I can’t seem to find what that is.

Example of the raw data I have:

 

ID    X   oldA1 oldB1 oldC1 oldD1     oldA2 oldB2 oldC2 oldD2           oldA3 oldB3 oldC3 oldD3

1     2        .         .         .          .          date        0          1       0

1     3      .           .         .          .             .            .          .         .            date           1      1         1

1     1      date     0        1         1            .            .          .        .                    .            .        .          .

1     3       .             .        .            .           .             .           .         .         date           0     0          0

2     3       .             .         .          .            .              .         .        .           date            0      0        1

2     1     date     1         1         1           .               .         .        .                 .               .       .         .

 

I needed:

 

ID   X         A           B   C  D

1       2    date         0    1  0

1       3    date         1   1    1

1       1    date         0   1    1

1       3     date       0    0   0

2       3    date        0     0    1

2        1    date       1    1    1

I am getting instead

ID   X         A           B   C  D

1       3     date        1   1    1

1       1     date        0   1    1

1       3     date        0    0   0

2       1     date        1    1    1

 

Thanks!

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

It's not 100% clear how you want to handle it when X is missing.  But this seems like a good approach, possibly a solution:

 

data want;
   set have;
   A = coalescec(oldA1, oldA2, oldA3);
   B = max(oldB1, oldB2, oldB3);
   C = max(oldC1, oldC2, oldC3);
   D = max(oldD1, oldD2, oldD3);
   drop old: ;
run;

Don't forget the colon in the DROP statement.

View solution in original post

14 REPLIES 14
Tom
Super User Tom
Super User

The snippet of code you posted in not deleting any observations.  And nothing in your posted code is dropping any variables.

Mscarboncopy
Pyrite | Level 9

I checked my raw data file and noticed that all of the ids have missing at least once before the first record for variable X. This is because the file is really big and it has a lot of other variables. So it is skipping the first one immediately after the missing. Not sure what to do in this case. 

ChrisNZ
Tourmaline | Level 20

> I checked my raw data file and noticed that all of the ids have missing at least once

I don't see missing IDs in the data you show.

 

.> Dropping first record 

I can't see how records could be dropped when the code you show contains no OUTPUT or DELETE statements.

Astounding
PROC Star

It's not 100% clear how you want to handle it when X is missing.  But this seems like a good approach, possibly a solution:

 

data want;
   set have;
   A = coalescec(oldA1, oldA2, oldA3);
   B = max(oldB1, oldB2, oldB3);
   C = max(oldC1, oldC2, oldC3);
   D = max(oldD1, oldD2, oldD3);
   drop old: ;
run;

Don't forget the colon in the DROP statement.

ChrisNZ
Tourmaline | Level 20

Another way:

 

A = sum( (X=1)*OLDA1 , (X=2)*OLDA2) , (X=3)*OLDA3 );

 

Mscarboncopy
Pyrite | Level 9

Thank you. This one still drops the first record. I will see what others suggested. I have another question as I like this better than my "if then statements". My date variable is missing. Do I need to do something else in order to prevent this from happening? I am getting this warning for var A which is date in the code: NOTE: Numeric values have been converted to character values at the places given by...

Thanks again.

Tom
Super User Tom
Super User

@Mscarboncopy wrote:

Thank you. This one still drops the first record. I will see what others suggested. I have another question as I like this better than my "if then statements". My date variable is missing. Do I need to do something else in order to prevent this from happening? I am getting this warning for var A which is date in the code: NOTE: Numeric values have been converted to character values at the places given by...

Thanks again.


You have not shown us your code so we really cannot give anything other than best practice suggestions.

Copy the text from the SAS log for the steps that are not working and paste it into your message using the Insert Code icon (looks like < / > ) to get a pop-up window.  Using the Insert Code window will preserve the formatting of the text.

Mscarboncopy
Pyrite | Level 9

Understood. I saved the instructions you gave for next time. I was able to find the issue in the previous code thanks to your suggestions. And I learned a more elegant way to do what I was doing with "if then statements", which always helps me with my long codes. However, I am not sure why the dates (var A)  are missing when I use the first suggestion though (from Astounding).  I added format A mmddyy10 at the beginning but it did not help. This part of the code is straightforward I am doing what Astounding suggested. My dates are formatted as dates in the raw file and the errors for only the date variables are: Numeric values have been converted to character values for the majority and Character values have been converted to numeric values.  I wonder if this is an issue with the raw data file? it can't be though because when I use "if then" statements I have no issue. So maybe I should just stick with those ? Thank you!

Tom
Super User Tom
Super User

@Mscarboncopy wrote:

Understood. I saved the instructions you gave for next time. I was able to find the issue in the previous code thanks to your suggestions. And I learned a more elegant way to do what I was doing with "if then statements", which always helps me with my long codes. However, I am not sure why the dates (var A)  are missing when I use the first suggestion though (from Astounding).  I added format A mmddyy10 at the beginning but it did not help.  What can I do? Thank you!


Context matters.

Are you talking about this line of code?

A = coalescec(oldA1, oldA2, oldA3);

If OLDA1 etc are numeric (SAS stores dates a number of days since 1960) instead of character then you will get both the automatic conversion notices and potentially missing values you talk about.  Change it to treat the A variables as numbers like it is doing with the B and C variables.

Mscarboncopy
Pyrite | Level 9

Perfect! I will accept his suggestion as solution, keeping in mind that I changed that part.

Thank you so much.  

Mscarboncopy
Pyrite | Level 9

Is there a similar function I could use to create a variable when the A's are char variables and when more than one can be selected?  This is another variable I am working on. 

If I do it this way, it overwrites.  VARWANTED = max (A1, A2, A3); I used coalescec but it did not work either.  "If then statements" give me the same issue.  Arrays also don't work.

array _A {*} A: ;
do VARWANTED =1 to dim(_A);
if _A{VARWANTED}>0 then output;
end;

Thank you.

Raw Data file looks like .

 

ID    A1        A2     A3

1     70       100      5

2     .            3        .

3    16        59       30

4    18           .             .

 

if I use the max function:

 

ID    VARWANTED

1          70

2           3

3          16

4          18

 

I needed to have

 

ID      VARWANTED

1              70

1              100             

1              5

2              3

3              16

3              59

3              30

4              18

 

 

 

 

Tom
Super User Tom
Super User

So you DON"T want to do the same thing as the MAX() function was doing.

Looks like you want to transpose your multiple variables into multiple observations.  PROC TRANSPOSE is designed to let you do that.

 

But you could also use your ARRAY to do that.

data want;
  set have;
  array _A A: ;
  do VARWANTED =1 to dim(_A);
     if _A[VARWANTED]>0 then do;
       A=_A[VARWANTED];
       output;
    end;
  end;
run;
Mscarboncopy
Pyrite | Level 9

Thank you! The array worked perfectly. 

 

mkeintz
PROC Star

Consider:

data want;
  set have;
  array  olda {17};
  array  oldb {17};
  array  oldc {17};
  array  oldd {17};
  if (1<=x<=17) then do;
    a=olda{x};
    b=oldb{x};
    c=oldc{x};
    d=oldd{x};
  end;
  drop old: ;
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 14 replies
  • 1707 views
  • 7 likes
  • 5 in conversation