data cholesterol_1;
infile 'path/file_1' delimiter='09'x TRUNCOVER DSD firstobs=2 ;
length variant $70 minor_allele $2 minor_AF 8
expected_case_minor_AC 8 low_confidence_variant $5
n_complete_samples 8 AC 8 ytx 8
beta 8 se 8 tstat 8 pval 8
;
input variant--ytx ( beta se tstat pval ) (:??32.) ;
run;
data cholesterol_2(drop = variant);
set cholesterol_1;
chr = scan(variant, 1,':');
bp = scan(variant, 2,':');
mutant = scan(variant, 3,':');
orig = scan(variant, 4,':');
run;
data cholesterol_3(drop = chr_ bp_);
retain
chr bp mutant orig minor_allele minor_AF
expected_case_minor_AC low_confidence_variant
n_complete_samples AC ytx beta se tstat pval
;
set cholesterol_2(rename = (chr=chr_ bp=bp_));
chr = input(chr_,1.);
bp = input(bp_,8.);
run;
This is my code. I would like to repeat it to generate the final output table (cholesterol_3) for multiple raw data files, all of which are in the same path/directory.
I am pretty new to SAS and would like to hear some general approaches/tips for turning this code into a function.
SAS handles this well. Here's your original code, with slight changes:
%macro repeat (path);
data cholesterol_1;
infile "&path" delimiter='09'x TRUNCOVER DSD firstobs=2 ;
length variant $70 minor_allele $2 minor_AF 8
expected_case_minor_AC 8 low_confidence_variant $5
n_complete_samples 8 AC 8 ytx 8
beta 8 se 8 tstat 8 pval 8
;
input variant--ytx ( beta se tstat pval ) (:??32.) ;
run;
data cholesterol_2(drop = variant);
set cholesterol_1;
chr = scan(variant, 1,':');
bp = scan(variant, 2,':');
mutant = scan(variant, 3,':');
orig = scan(variant, 4,':');
run;
data cholesterol_3(drop = chr_ bp_);
retain
chr bp mutant orig minor_allele minor_AF
expected_case_minor_AC low_confidence_variant
n_complete_samples AC ytx beta se tstat pval
;
set cholesterol_2(rename = (chr=chr_ bp=bp_));
chr = input(chr_,1.);
bp = input(bp_,8.);
run;
%mend repeat;
Now you can use the code repeatedly in this way:
%repeat (path/file_1)
%repeat (path/file_2)
Notice that the changes are few: the statements are wrapped in %macro and %mend, and the path reference now uses double quotes around a reference to the value you supply: "&path". The double quotes matter because macro variable references are not resolved inside single quotes.
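To see the effect of the quoting, here is a minimal sketch (the variable names dq and sq are made up for illustration). It assigns the same text with double and with single quotes and writes both values to the log:

```sas
%let path = path/file_1;

data _null_;
  dq = "&path";   /* &path is resolved before the step compiles */
  sq = '&path';   /* not resolved: stays as literal text */
  put dq= sq=;    /* writes both values to the log */
run;
```

The log should show dq=path/file_1 and sq=&path, which is why the INFILE statement inside the macro needs double quotes.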
Each call overwrites CHOLESTEROL_1 through CHOLESTEROL_3, so it's up to you to use (or copy) CHOLESTEROL_3 after each call, before the next call replaces it, to get the right version of the data set.
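One possible way to handle that, sketched under the assumption that you want a single combined table, is to append CHOLESTEROL_3 to an accumulating data set after each call. The name chol_all is hypothetical and not part of the original code:

```sas
/* Hypothetical name: chol_all is not part of the original code. */

/* Start clean so a rerun does not append the same rows twice.
   NOWARN keeps PROC DATASETS quiet if chol_all does not exist yet. */
proc datasets lib=work nolist nowarn;
  delete chol_all;
quit;

%repeat (path/file_1)
proc append base=chol_all data=cholesterol_3;  /* creates chol_all on first use */
run;

%repeat (path/file_2)
proc append base=chol_all data=cholesterol_3;  /* adds the rows from file_2 */
run;
```

If you would rather keep one table per file, a copy step after each call (for example a DATA step with SET cholesterol_3) would serve the same purpose.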
The reply from @Astounding gives you the answer.
Just a note that the three data steps could (should really) be just one.
Here's an example of a single DATA step that could replace three DATA steps:
data cholesterol;
  infile 'path/file_1' delimiter='09'x TRUNCOVER DSD firstobs=2 ;
  length variant $70 minor_allele $2 minor_AF 8
         expected_case_minor_AC 8 low_confidence_variant $5
         n_complete_samples 8 AC 8 ytx 8
         beta 8 se 8 tstat 8 pval 8
  ;
  input variant--ytx ( beta se tstat pval ) (:??32.) ;
  /* LENGTH above has already fixed the position of variant through pval, so
     this RETAIN cannot move chr bp mutant orig to the front; they follow pval. */
  retain
    chr bp mutant orig minor_allele minor_AF
    expected_case_minor_AC low_confidence_variant
    n_complete_samples AC ytx beta se tstat pval
  ;
  chr_ = scan(variant, 1,':');
  bp_ = scan(variant, 2,':');
  mutant = scan(variant, 3,':');
  orig = scan(variant, 4,':');
  /* 1. and 8. read at most 1 and 8 characters; widen them if values can be longer. */
  chr = input(chr_,1.);
  bp = input(bp_,8.);
  /* variant is dropped too, as in the three-step version */
  drop variant chr_ bp_;
run;
Here is how I did it.
data CHOLESTEROL;
  infile 'path/file_1' delimiter='09'x TRUNCOVER DSD firstobs=2 ;
  /* set length and informat and output order */
  informat CHR BP 8.
           MUTANT ORIG $20.
           MINOR_ALLELE $2.
           MINOR_AF EXPECTED_CASE_MINOR_AC 8.
           LOW_CONFIDENCE_VARIANT $5.
           N_COMPLETE_SAMPLES AC YTX BETA SE TSTAT PVAL 8.
           VARIANT $70. ;
  /* read data in input order */
  input VARIANT MINOR_ALLELE MINOR_AF
        EXPECTED_CASE_MINOR_AC LOW_CONFIDENCE_VARIANT
        N_COMPLETE_SAMPLES AC YTX ( BETA SE TSTAT PVAL ) (:?? 32.) ;
  drop VARIANT ;
  /* 1. and 8. read at most 1 and 8 characters; widen them if values can be longer. */
  CHR = input(scan(VARIANT, 1,':'),?? 1.);
  BP = input(scan(VARIANT, 2,':'),?? 8.);
  MUTANT = scan(VARIANT, 3,':') ;
  ORIG = scan(VARIANT, 4,':') ;
run;
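If the goal is one table built from all the raw files, another option is to skip the macro and let a single DATA step read every file through a wildcard in the INFILE statement. The sketch below follows the layout of the step above and rests on several assumptions: that the pattern path/file_* matches exactly the files you want, that your host supports wildcards in INFILE, and that every file starts with one header row. The names cholesterol_all, fname, and source are made up for illustration. Because FIRSTOBS=2 would skip only the very first line of the combined input, the header of each file is detected with the FILENAME= option instead.

```sas
/* Sketch only. Assumptions: the wildcard pattern path/file_* matches the raw
   files, and the names cholesterol_all, fname, and source are made up here. */
data cholesterol_all;
  length fname $260;
  /* FILENAME= puts the name of the file currently being read into fname */
  infile "path/file_*" delimiter='09'x truncover dsd filename=fname;
  informat chr bp 8.
           mutant orig $20.
           minor_allele $2.
           minor_AF expected_case_minor_AC 8.
           low_confidence_variant $5.
           n_complete_samples AC ytx beta se tstat pval 8.
           variant $70. ;
  length source $260;            /* fname itself is not written to the output */
  retain source;
  input @;                       /* load the next record and hold it */
  if fname ne source then do;    /* first record of each file is its header row */
    source = fname;              /* remember which file we are in */
    delete;                      /* skip the header and start the next record */
  end;
  input variant minor_allele minor_AF
        expected_case_minor_AC low_confidence_variant
        n_complete_samples AC ytx ( beta se tstat pval ) (:?? 32.) ;
  /* 32. rather than 1. and 8., assuming chr or bp values can be longer */
  chr    = input(scan(variant, 1, ':'), ?? 32.);
  bp     = input(scan(variant, 2, ':'), ?? 32.);
  mutant = scan(variant, 3, ':');
  orig   = scan(variant, 4, ':');
  drop variant;
run;
```

The source variable records which file each row came from, which makes it possible to split or filter the combined table later.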