
recode genotypes 0, 1, 2 to space delimited SNPs A A A B B B

Contributor
Posts: 40


Dear friends,

 

I need your help with my problem.

I have a genotype data set that looks like this:

 

animal_id;snp_1;snp_2;snp_3 ... snp_600000

animal1;0;2;1...2

animal2;1;2;0...0

 

I need to recode the values under snp_1...snp_600000, which are now 0, 1 or 2, to AA, AB and BB. That part is not an issue, but I then have to export a txt file that looks like this:

 

animal1;A;A;B;B;A;B...B;B

animal2;A;B;B;B;A;A...A;A

 

In other words, from 600000 variables per animal I must get 1200000 (I know it is big...).

 

I guess that two arrays would solve the problem, but my files are huge, so a more elegant solution would be nice.

 

Thanks 

 

An example of the input and the desired output is attached.

 

 

Super User
Posts: 17,819

Re: recode genotypes 0, 1, 2 to space delimited SNPs A A A B B B


What about a format? 

 

proc format;
  value gene_fmt
    0 = 'A;A'
    1 = 'A;B'
    2 = 'B;B';
run;

 

You can apply the format in one statement:

 

Format snp_1-snp_60000 gene_fmt.;

 

When you export your data with ; as the delimiter, each formatted value is written as two delimited alleles, so the output has the layout you requested.
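As a minimal sketch of the whole idea on a small example: the three-SNP data set have, the output file name genotypes.txt and the LRECL value below are assumptions for illustration, not part of the original post. The file is written with a PUT statement and DLM=';' but without DSD, because DSD would put quotation marks around values such as A;A that contain the delimiter.

```sas
/* Hypothetical three-SNP sample standing in for the real data */
data have;
  input animal_id :$20. snp_1-snp_3;
  datalines;
animal1 0 2 1
animal2 1 2 0
;
run;

proc format;
  value gene_fmt
    0 = 'A;A'
    1 = 'A;B'
    2 = 'B;B';
run;

data _null_;
  set have;
  /* Associate the format so list output writes A;A, A;B or B;B */
  format snp_1-snp_3 gene_fmt.;
  /* Assumed file name; LRECL must exceed the longest output line */
  file "genotypes.txt" dlm=';' lrecl=3000000;
  put animal_id snp_1-snp_3;
run;
```

With the full data each line holds roughly 2.4 million characters, well above the default LRECL of 32767, so the LRECL= value has to be raised accordingly (check the limit for your SAS version and platform). This approach never creates the 1200000 new variables; the recoding happens only while the file is written.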

 

edit: added missing period in format statement.

Contributor
Posts: 40

Re: recode genotypes 0, 1, 2 to space delimited SNPs A A A B B B

I will give it a try and let you know.
Contributor
Posts: 40

Re: recode genotypes 0, 1, 2 to space delimited SNPs A A A B B B

It works nicely on a 10-variable example. I will now try to run the real thing.
PROC Star
Posts: 1,091

Re: recode genotypes 0, 1, 2 to space delimited SNPs A A A B B B

Here's an option that transposes the data. You'll actually get 120,000 variables in your result data set, which you can export using any of the SAS tools.

 

proc sort data=have;
  by animal_id;
run;

/* Long format: one row per animal and SNP (_NAME_ = SNP name, COL1 = genotype) */
proc transpose data=have out=transposed_have;
  var snp_1-snp_60000;
  by animal_id;
run;

/* Write two allele rows per genotype, named <snp>a and <snp>b */
data transposed_have2;
  set transposed_have;
  length newcol $1 newname $32;
  if COL1 = 0 then do;
    newcol = "A";
    newname = cats(_NAME_, "a");
    output;
    newcol = "A";
    newname = cats(_NAME_, "b");
    output;
  end;
  else if COL1 = 1 then do;
    newcol = "A";
    newname = cats(_NAME_, "a");
    output;
    newcol = "B";
    newname = cats(_NAME_, "b");
    output;
  end;
  else if COL1 = 2 then do;
    newcol = "B";
    newname = cats(_NAME_, "a");
    output;
    newcol = "B";
    newname = cats(_NAME_, "b");
    output;
  end;
  drop COL1 _NAME_;
run;

/* Back to wide format: one character column per allele */
proc transpose data=transposed_have2 out=want(drop=_NAME_);
  var newcol;
  by animal_id;
  id newname;
run;
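
One possible way to export the result, as a sketch only: the output file name genotypes.txt is an assumption, and with this many columns it is worth checking the written file on a small subset first, since very long lines can run into record-length limits.

```sas
/* Write WANT as a semicolon-delimited text file without a header row */
proc export data=want
    outfile="genotypes.txt"   /* assumed file name */
    dbms=dlm
    replace;
  delimiter=';';
  putnames=no;
run;
```

PUTNAMES=NO leaves out the line of variable names, so the file starts directly with animal1 as in the requested layout.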

Contributor
Posts: 40

Re: recode genotypes 0, 1, 2 to space delimited SNPs A A A B B B

Thanks for the tip. Well, it looks like I missed one zero. It is 600000 at the beginning.