MajaFerencakovic
Fluorite | Level 6

Dear friends,

I need your help with a problem.

I have a genotype data set that looks like this:

animal_id;snp_1;snp_2;snp_3 ... snp_600000
animal1;0;2;1...2
animal2;1;2;0...0

I need to recode the values under snp_1...snp_600000, which are now 0, 1 or 2, to AA, AB and BB. That is not an issue, but then I have to export a txt file that looks like this:

animal1;A;A;B;B;A;B...B;B
animal2;A;B;B;B;A;A...A;A

In other words, from 600000 variables per animal I must get 1200000 (I know it is big...).

I guess that two arrays would solve the problem, but my files are huge and a more elegant solution would be nice.

Thanks

An example of the input and the desired output is attached.

5 REPLIES
Reeza
Super User

What about a format?

proc format;
  value gene_fmt
    0 = 'A;A'
    1 = 'A;B'
    2 = 'B;B';
run;

You can apply the format in one line:

format snp_1-snp_600000 gene_fmt.;

When you export your data with ; as the delimiter, the data will be output as you requested.

edit: added missing period in format statement.
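
For the export step itself, here is a minimal sketch using a DATA _NULL_ step with the format above. It rests on several assumptions: the data set name have and the output path genotypes.txt are placeholders, the variables are named snp_1 through snp_600000, and the LRECL value is only a rough size for 1,200,000 alleles plus delimiters (the default record length would be far too short). The FILE statement deliberately leaves out DSD, because with DSD a value that contains the delimiter, such as 'A;B', is written in quotation marks.

```sas
proc format;
  value gene_fmt
    0 = 'A;A'
    1 = 'A;B'
    2 = 'B;B';
run;

data _null_;
  set have;                                  /* 'have' is an assumed data set name */
  /* ';' as delimiter, no DSD, so the ';' inside 'A;B' is not quoted. */
  /* lrecl=3000000 is an assumed size, not a tuned value.             */
  file 'genotypes.txt' dlm=';' lrecl=3000000;
  /* List output writes each SNP through its associated format. */
  format snp_1-snp_600000 gene_fmt.;
  put animal_id snp_1-snp_600000;
run;
```

Note that gene_fmt does not cover missing genotypes, so a missing value would be written as a single period rather than as two alleles. If that can occur in the data, the format would need an extra range for it.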

MajaFerencakovic
Fluorite | Level 6
I will give it a try and let you know.
MajaFerencakovic
Fluorite | Level 6
It works nicely on a 10-variable example. I will now try to run the real thing.
TomKari
Onyx | Level 15

Here's an option that transposes the data. You'll actually get 1,200,000 variables in your result data set, which you can export using any of the SAS tools.

proc sort data=have;
  by animal_id;
run;

/* One row per animal and SNP: _NAME_ holds the SNP name, COL1 the 0/1/2 code */
proc transpose data=have out=transposed_have;
  var snp_1-snp_600000;
  by animal_id;
run;

/* Write two rows per SNP, one for each allele */
data transposed_have2;
  set transposed_have;
  length newcol $1 newname $32;
  if COL1 = 0 then do;            /* 0 -> A, A */
    newcol = "A";
    newname = cats(_NAME_, "a");
    output;
    newcol = "A";
    newname = cats(_NAME_, "b");
    output;
  end;
  else if COL1 = 1 then do;       /* 1 -> A, B */
    newcol = "A";
    newname = cats(_NAME_, "a");
    output;
    newcol = "B";
    newname = cats(_NAME_, "b");
    output;
  end;
  else if COL1 = 2 then do;       /* 2 -> B, B */
    newcol = "B";
    newname = cats(_NAME_, "a");
    output;
    newcol = "B";
    newname = cats(_NAME_, "b");
    output;
  end;
  drop COL1 _NAME_;
run;

/* Back to one row per animal, with two allele columns per SNP */
proc transpose data=transposed_have2 out=want(drop=_NAME_);
  var newcol;
  by animal_id;
  id newname;
run;

MajaFerencakovic
Fluorite | Level 6
Thanks for the tip. Well, it looks like I missed one zero. It is 600000 at the beginning.
