I have a file in which rows represent individual IDs and columns hold genotype data for a given genetic variant. Below is a sample of the structure; the full data set is ~300K IDs x ~400 variants.

I need to recode the numeric categories 0, 1, 2 for each variant (column) into the corresponding genotypes. For example, I need to convert the 0, 1, 2 values of the column "rs12044597_G" into AA, GA, GG, where 0 = AA, 1 = GA and 2 = GG.

ID  rs12044597_G  rs12711521_A
1   1             2
2   0             2
3   2             2

I have tried the following:

    data chr1_recode;
      set chr1;
      rs12044597 = .;
      if (rs12044597_G =0) then rs12044597 = "AA";
      if (rs12044597_G =1) then rs12044597 = "GA";
      if (rs12044597_G =2) then rs12044597 = "GG";
    RUN;

This does not work and produces a large number of log messages like these:

NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
      76:56   76:101   76:146
NOTE: Invalid numeric data, 'GA' , at line 76 column 101.

I realize that I am trying to assign character values to a numeric variable, which is likely causing the issue. I'm familiar with the PUT function for converting single values from numeric to character, but how can I incorporate it into an IF-THEN statement to get the output I need, which should look like this:

ID  rs12044597_G  rs12711521_A
1   GA            AA
2   AA            AA
3   GG            AA
4   GG            AA

I'm using SAS University Edition. Thanks!
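A minimal sketch of two ways this could be handled, assuming the three sample rows shown in the post and the coding 0 = AA, 1 = GA, 2 = GG for rs12044597_G. The likely cause of the notes is the line `rs12044597 = .;`: it is the first reference to rs12044597 in the step, so SAS defines that variable as numeric, and every later assignment of "AA"/"GA"/"GG" triggers a character-to-numeric conversion. Declaring the variable as character with a LENGTH statement first avoids that. The data set names chr1 and chr1_recode come from the post; the format name ga_geno and the data set chr1_recode2 are hypothetical.

```sas
/* Sample input: the three rows from the post (ID 4 is not shown there) */
data chr1;
  input ID rs12044597_G rs12711521_A;
  datalines;
1 1 2
2 0 2
3 2 2
;
run;

/* Option 1: keep the IF-THEN logic, but declare the new variable as   */
/* character ($2) before any assignment, instead of "rs12044597 = .;". */
data chr1_recode;
  set chr1;
  length rs12044597 $2;
  if rs12044597_G = 0 then rs12044597 = "AA";
  else if rs12044597_G = 1 then rs12044597 = "GA";
  else if rs12044597_G = 2 then rs12044597 = "GG";
run;

/* Option 2: PUT function with a user-defined format (hypothetical name). */
/* A variable's type cannot be changed, so this writes a new character    */
/* variable, drops the numeric one, and renames the new one to the        */
/* original column name.                                                  */
proc format;
  value ga_geno
    0 = "AA"
    1 = "GA"
    2 = "GG";
run;

data chr1_recode2;
  set chr1;
  length rs12044597 $2;
  rs12044597 = put(rs12044597_G, ga_geno.);
  drop rs12044597_G;
  rename rs12044597 = rs12044597_G;
run;
```

Because each column's coding depends on its own alleles (G/A for rs12044597_G), applying this to ~400 variants would likely need one format per allele pair, or a lookup of the alleles for each variant; the post does not say where that information is stored.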