Hello,

I am relatively new to SAS and I am working with a fairly large microarray SNP data set. I need to reshape my data from long to wide. I tried the code below on a smaller data set and it works well. An example data set is shown below:

Data MAlong

ID     SNPs    Genotype
23456  rs1234  CC
23456  rs1235  CC
23456  rs1236  TT
23456  rs1237  AA
23456  rs1238  TT
23456  rs1239  GG
23456  rs1240  GG
23456  rs1241  TT
23456  rs1242  CC
23456  rs1243  AA
17235  rs1234  TT
17235  rs1235  GG
17235  rs1236  TT
17235  rs1237  CC
17235  rs1238  AA
17235  rs1239  AA
17235  rs1240  AG
17235  rs1241  GG
17235  rs1242  GG
17235  rs1243  TC
25342  rs1234  AA
25342  rs1235  AG

SAS code:

PROC TRANSPOSE data=MAlong out=MAWide;
  by ID notsorted;
  var Genotype;
  id SNPs;
run;

Resulting data set:

Data MAwide

ID     rs1234  rs1235  rs1236  rs1237  rs1238  rs1239  rs1240  rs1241  rs1242  rs1243
23456  CC      CC      TT      AA      TT      GG      GG      TT      CC      AA
17235  TT      GG      TT      CC      AA      AA      AG      GG      GG      TC
25342  AA      AG      AA      AG      TT      CC      --      GG      GG      GG

For the larger data set, I am getting a duplicate error for the ID variable (SNPs) that I am using. I know the SNP names are not duplicates; they differ slightly at the tail end of the name, but it seems SAS decides they are duplicates before it reaches that part. How can I make my script ignore duplicates? Using nodupkeys does not work, as the SNP names are reused for every sample. This is data from an Illumina Infinium array, and their report lists genotypes per sample in long format, one sample at a time, and then starts again.

Thanks,
Joy
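
A quick check that might help narrow this down. It assumes the error comes from SNP names colliding once they are shortened to the 32-character limit on SAS variable names, which would fit names that differ only at the tail end but is not confirmed by anything shown above. The table name snp_collisions and the column names snp32, n_rows and n_distinct_names are made up for illustration; MAlong, ID and SNPs are the names from the example.

```sas
/* Hypothetical diagnostic, not a fix: list SNP names that become identical */
/* within one sample after keeping only the first 32 characters.            */
/* Assumption: SNPs is a character variable in MAlong.                      */
proc sql;
  create table snp_collisions as
  select ID,
         upcase(substrn(SNPs, 1, 32)) as snp32,   /* name as a 32-char variable name */
         count(*)                     as n_rows,  /* rows sharing that shortened name */
         count(distinct SNPs)         as n_distinct_names
  from MAlong
  group by ID, calculated snp32
  having count(*) > 1;
quit;
```

If rows come back with n_distinct_names greater than 1, different SNP names are being collapsed into the same variable name. If n_distinct_names is 1, the same SNP name really does occur more than once for that sample.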