
recode genotypes 0, 1, 2 to space delimited SNPs A A A B B B

Contributor
Posts: 40


Dear friends,

 

I need your help with my problem.

I have a genotype data set that looks like this:

 

animal_id;snp_1;snp_2;snp_3 ... snp_600000

animal1;0;2;1...2

animal2;1;2;0...0

 

I need to recode the values under snp_1...snp_600000, which are now 0, 1 or 2, to AA, AB and BB. That part is not an issue, but then I have to export a txt file that looks like this:

 

animal1;A;A;B;B;A;B...B;B

animal2;A;B;B;B;A;A...A;A

 

In other words, from 600000 variables per animal I must get 1200000 (I know it is big...).

 

I guess that two arrays would fix the problem, but my files are huge and a more elegant solution would be nice.

 

Thanks 

 

An example of the input and the desired output is attached.

 

 

Super User
Posts: 19,851

Re: recode genotypes 0, 1, 2 to space delimited SNPs A A A B B B

[ Edited ]
Posted in reply to MajaFerencakovic

What about a format? 

 

proc format;
  value gene_fmt
    0 = 'A;A'
    1 = 'A;B'
    2 = 'B;B';
run;

 

You can apply format in one line:

 

Format snp_1-snp_60000 gene_fmt.;

 

When you export your data specifying a delimiter of ;, the data will be output as you requested. 

 

edit: added missing period in format statement.
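
If the export step ends up wrapping values that contain the delimiter in quotes (DSD-style output does this), one alternative is to write the file from a DATA step with list-style PUT and no DSD option. The following is only a minimal sketch: the data set name `have`, the output path `genotypes_out.txt`, and the LRECL value are assumptions, and the upper bound of the variable list should match the real number of SNP columns.

```sas
proc format;
  value gene_fmt
    0 = 'A;A'
    1 = 'A;B'
    2 = 'B;B';
run;

data _null_;
  set have;                        /* assumed input data set name */
  /* Hypothetical output path. LRECL must be larger than the longest
     record (about 4 characters per SNP plus the animal id). */
  file "genotypes_out.txt" dlm=';' lrecl=3000000;
  /* List output writes each value with its associated format */
  format snp_1-snp_600000 gene_fmt.;
  put animal_id snp_1-snp_600000;
run;
```

Because the formatted value already contains a semicolon, each SNP becomes two delimited fields in the text file without creating any extra variables. The maximum LRECL allowed depends on the host, so it is worth checking on a small subset first.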

Contributor
Posts: 40

Re: recode genotypes 0, 1, 2 to space delimited SNPs A A A B B B

I will give it a try and let you know.
Contributor
Posts: 40

Re: recode genotypes 0, 1, 2 to space delimited SNPs A A A B B B

It works nicely on a 10-variable example. I will now try to run the real thing.
PROC Star
Posts: 1,167

Re: recode genotypes 0, 1, 2 to space delimited SNPs A A A B B B

Posted in reply to MajaFerencakovic

Here's an option with transposing the data. You'll actually get 120,000 variables in your result dataset, which you can export using any of the SAS tools.

 

proc sort data=have;
  by animal_id;
run;

/* Long format: one row per animal and SNP (_NAME_ = SNP name, COL1 = genotype) */
proc transpose data=have out=transposed_have;
  var snp_1-snp_60000;   /* adjust the upper bound to the real number of SNP columns */
  by animal_id;
run;

/* Write two rows per SNP, one for each allele */
data transposed_have2;
  set transposed_have;
  length newcol $1 newname $32;
  if COL1 = 0 then do;
    newcol = "A";
    newname = cats(_NAME_, "a");
    output;
    newcol = "A";
    newname = cats(_NAME_, "b");
    output;
  end;
  else if COL1 = 1 then do;
    newcol = "A";
    newname = cats(_NAME_, "a");
    output;
    newcol = "B";
    newname = cats(_NAME_, "b");
    output;
  end;
  else if COL1 = 2 then do;
    newcol = "B";
    newname = cats(_NAME_, "a");
    output;
    newcol = "B";
    newname = cats(_NAME_, "b");
    output;
  end;
  drop COL1 _NAME_;
run;

/* Back to wide format: one column per allele, named by newname */
proc transpose data=transposed_have2 out=want(drop=_NAME_);
  var newcol;
  by animal_id;
  id newname;
run;
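
For comparison, the two-array idea mentioned in the question could be sketched roughly as follows. This is an illustration on a 3-SNP toy data set, not a tuned solution for 600000 columns; the names `allele_1`-`allele_6` are hypothetical, and genotypes other than 0, 1 or 2 are assumed to stay blank.

```sas
/* Tiny stand-in for the real data: 3 SNPs instead of 600000 */
data have;
  infile datalines dlm=';';
  input animal_id :$20. snp_1-snp_3;
  datalines4;
animal1;0;2;1
animal2;1;2;0
;;;;

data want;
  set have;
  array snp{*} snp_1-snp_3;
  array allele{6} $1 allele_1-allele_6;   /* two allele columns per SNP */
  do i = 1 to dim(snp);
    if snp{i} in (0, 1, 2) then do;
      /* first allele is B only for genotype 2 */
      allele{2*i - 1} = ifc(snp{i} = 2, 'B', 'A');
      /* second allele is A only for genotype 0 */
      allele{2*i}     = ifc(snp{i} = 0, 'A', 'B');
    end;
  end;
  drop i snp_1-snp_3;
run;
```

This does the recoding in a single pass without sorting or transposing, at the cost of holding both the SNP and the allele variables in the data step at once.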

Contributor
Posts: 40

Re: recode genotypes 0, 1, 2 to space delimited SNPs A A A B B B

Thanks for the tip. Well, it looks like I missed one zero. It is 600000 at the beginning.