Thank you. I'd like to summarize.

Here are example datasets:

subjects.sample (header, second record, and then 4 subject records; my real data had more subjects)

ID_1 ID_2 missing sex status
0 0 0 D B
A123E123 A123E123 0 0 -9
A123F456 A123F456 0 0 -9
A123G789 A123G789 0 0 -9
A456G123 A456G123 0 0 -9

genotype.gen (single record; this is a truncated example, my real data had a record length of 30,000 characters with many more triplets)

19 rs123456 12345678 T G 0 0 1 0.873 0.127 0.002 0.252 0.746 0 0 1 0

My solution was to edit the above to get simple triplets:

editedgenotype.gen

0 0 1 0.873 0.127 0.002 0.252 0.746 0 0 1 0

data test0;
/* firstobs=3 skips the header and the second record */
infile 'Q:\USERS\MEJones\temp\subjects.sample' dlm=' ' firstobs=3;
input ID_1 $ ID_2 $ missing $ sex $ status $;
run;
data test1;
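/* lrecl=43 is the length of the edited example record */
infile 'Q:\USERS\MEJones\temp\editedgenotype.gen' dlm=' ' lrecl=43;
/* @@ holds the record across iterations, so each pass reads the next triplet */
input P_AA P_AB P_BB @@;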
run;
data all;
merge test0 test1;
run;
proc print data=all;
run;

My output

The SAS System

Obs  ID_1      ID_2      missing  sex  status  P_AA   P_AB   P_BB
  1  A123E123  A123E123  0        0    -9      0.000  0.000  1.000
  2  A123F456  A123F456  0        0    -9      0.873  0.127  0.002
  3  A123G789  A123G789  0        0    -9      0.252  0.746  0.000
  4  A456G123  A456G123  0        0    -9      0.000  1.000  0.000

The solution from FreelanceReinhard

data test2;
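/* eof=finish branches to the FINISH label when INPUT runs into the end of the file */
infile 'Q:\USERS\MEJones\temp\genotype.gen' dlm=' ' lrecl=69 eof=finish ;
/* first iteration only: move to column 26, past the five leading fields, and hold the record */
if _n_=1 then input @26 @;
/* read one triplet; @@ holds the record across iterations */
input P_AA P_AB P_BB @@;
return;        /* write the observation and start the next iteration */
finish: stop;  /* end of file: stop the step without writing another observation */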
run;
data new;
merge test0 test2;
run;
proc print data=new;
run;

This reads the original dataset with no need to edit it 😁 and produces

The SAS System

Obs  ID_1      ID_2      missing  sex  status  P_AA   P_AB   P_BB
  1  A123E123  A123E123  0        0    -9      0.000  0.000  1.000
  2  A123F456  A123F456  0        0    -9      0.873  0.127  0.002
  3  A123G789  A123G789  0        0    -9      0.252  0.746  0.000
  4  A456G123  A456G123  0        0    -9      0.000  1.000  0.000

The SAS log from the run:

NOTE: Copyright (c) 2002-2012 by SAS Institute Inc., Cary, NC, USA.
NOTE: SAS (r) Proprietary Software 9.4 (TS1M3)
Licensed to INSTITUTE OF CANCER RESEARCH, Site 70092393.
NOTE: This session is executing on the W32_8PRO platform.
NOTE: Updated analytical products:
SAS/STAT 14.1
SAS/ETS 14.1
SAS/OR 14.1
SAS/IML 14.1
SAS/QC 14.1
NOTE: Additional host information:
W32_8PRO WIN 6.2.9200 Workstation
NOTE: SAS initialization used:
real time 2.06 seconds
cpu time 0.71 seconds
1 data test0;
2 infile 'Q:\USERS\MEJones\temp\subjects.sample' dlm=' ' firstobs=3;
3 input ID_1 $ ID_2 $ missing $ sex $ status $;
4 run;
NOTE: The infile 'Q:\USERS\MEJones\temp\subjects.sample' is:
Filename=Q:\USERS\MEJones\temp\subjects.sample,
RECFM=V,LRECL=32767,File Size (bytes)=143,
Last Modified=27 August 2020 19:04:52 o'clock,
Create Time=27 August 2020 18:58:53 o'clock
NOTE: 4 records were read from the infile
'Q:\USERS\MEJones\temp\subjects.sample'.
The minimum record length was 24.
The maximum record length was 24.
NOTE: The data set WORK.TEST0 has 4 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.00 seconds
5
6 data test1;
7 infile 'Q:\USERS\MEJones\temp\editedgenotype.gen' dlm=' ' lrecl=43
7 ! ;
8 input P_AA P_AB P_BB @@;
9 run;
NOTE: The infile 'Q:\USERS\MEJones\temp\editedgenotype.gen' is:
Filename=Q:\USERS\MEJones\temp\editedgenotype.gen,
RECFM=V,LRECL=43,File Size (bytes)=45,
Last Modified=27 August 2020 19:04:29 o'clock,
Create Time=27 August 2020 19:04:29 o'clock
NOTE: 1 record was read from the infile
'Q:\USERS\MEJones\temp\editedgenotype.gen'.
The minimum record length was 43.
The maximum record length was 43.
NOTE: SAS went to a new line when INPUT statement reached past the end
of a line.
NOTE: The data set WORK.TEST1 has 4 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds
10
11 data all;
12 merge test0 test1;
13 run;
NOTE: There were 4 observations read from the data set WORK.TEST0.
NOTE: There were 4 observations read from the data set WORK.TEST1.
NOTE: The data set WORK.ALL has 4 observations and 8 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.00 seconds
14
15 proc print data=all;
NOTE: Writing HTML Body file: sashtml.htm
16 run;
NOTE: There were 4 observations read from the data set WORK.ALL.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.34 seconds
cpu time 0.06 seconds
17
18
19 data test2;
20 infile 'Q:\USERS\MEJones\temp\genotype.gen' dlm=' ' lrecl=69
20 ! eof=finish ;
21 if _n_=1 then input @26 @;
22 input P_AA P_AB P_BB @@;
23 return;
24 finish: stop;
25 run;
NOTE: The infile 'Q:\USERS\MEJones\temp\genotype.gen' is:
Filename=Q:\USERS\MEJones\temp\genotype.gen,
RECFM=V,LRECL=69,File Size (bytes)=71,
Last Modified=27 August 2020 19:01:18 o'clock,
Create Time=27 August 2020 18:59:38 o'clock
NOTE: 1 record was read from the infile
'Q:\USERS\MEJones\temp\genotype.gen'.
The minimum record length was 69.
The maximum record length was 69.
NOTE: SAS went to a new line when INPUT statement reached past the end
of a line.
NOTE: The data set WORK.TEST2 has 4 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.06 seconds
cpu time 0.03 seconds
26
27 data new;
28 merge test0 test2;
29 run;
NOTE: There were 4 observations read from the data set WORK.TEST0.
NOTE: There were 4 observations read from the data set WORK.TEST2.
NOTE: The data set WORK.NEW has 4 observations and 8 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.00 seconds
30 proc print data=new;
31 run;
NOTE: There were 4 observations read from the data set WORK.NEW.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
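
A possible variation, which I have not run against the real file: instead of hard-coding the starting column (@26) and the record length, read the five leading fields by name and then loop over one triplet per subject. This is only a sketch under assumptions. The data set name test3, the macro variable nsubj, and the variable names chrom, rsid, position, allele_A, allele_B and subject are all hypothetical (the field names are my guess at what the five leading values hold), and it assumes the record contains exactly one triplet per subject in test0.

```sas
/* Count the subjects in TEST0 (created earlier in this post). */
data _null_;
  if 0 then set test0 nobs=n;   /* never executes; NOBS= is set at compile time */
  call symputx('nsubj', n);     /* NSUBJ is a hypothetical macro variable name */
  stop;
run;

data test3;                     /* TEST3 is a hypothetical data set name */
  /* LRECL=32767 leaves room for a 30,000-character record */
  infile 'Q:\USERS\MEJones\temp\genotype.gen' dlm=' ' lrecl=32767 truncover;
  /* Five leading fields (names are guesses); trailing @ holds the record */
  input chrom $ rsid :$20. position allele_A $ allele_B $ @;
  do subject = 1 to &nsubj;
    input P_AA P_AB P_BB @;     /* next triplet from the same record */
    output;                     /* one observation per subject */
  end;
run;
```

With the single-record example, merge test0 test3; should line up the same way as the merges above, with the leading fields carried along as extra columns. If the file held more than one record, a plain side-by-side merge would no longer be appropriate.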