data cholesterol_1;
infile 'path/file_1' delimiter='09'x TRUNCOVER DSD firstobs=2 ;
length variant $70 minor_allele $2 minor_AF 8
expected_case_minor_AC 8 low_confidence_variant $5
n_complete_samples 8 AC 8 ytx 8
beta 8 se 8 tstat 8 pval 8
;
input variant--ytx ( beta se tstat pval ) (:??32.) ;
run;
data cholesterol_2(drop = variant);
set cholesterol_1;
chr = scan(variant, 1,':');
bp = scan(variant, 2,':');
mutant = scan(variant, 3,':');
orig = scan(variant, 4,':');
run;
data cholesterol_3(drop = chr_ bp_);
retain
chr bp mutant orig minor_allele minor_AF
expected_case_minor_AC low_confidence_variant
n_complete_samples AC ytx beta se tstat pval
;
set cholesterol_2(rename = (chr=chr_ bp=bp_));
chr = input(chr_,2.); /* 2. rather than 1. so that two-digit chromosomes (10-22) are not cut to one digit */
bp = input(bp_,8.);
run;
This is my code. I would like to run it repeatedly to generate the final output table (cholesterol_3) from multiple raw data files, all of which are in the same directory.
I am fairly new to SAS and would appreciate some general approaches or tips for turning this code into a reusable function.
SAS handles this well. Here's your original code, with slight changes:
%macro repeat (path);
data cholesterol_1;
infile "&path" delimiter='09'x TRUNCOVER DSD firstobs=2 ;
length variant $70 minor_allele $2 minor_AF 8
expected_case_minor_AC 8 low_confidence_variant $5
n_complete_samples 8 AC 8 ytx 8
beta 8 se 8 tstat 8 pval 8
;
input variant--ytx ( beta se tstat pval ) (:??32.) ;
run;
data cholesterol_2(drop = variant);
set cholesterol_1;
chr = scan(variant, 1,':');
bp = scan(variant, 2,':');
mutant = scan(variant, 3,':');
orig = scan(variant, 4,':');
run;
data cholesterol_3(drop = chr_ bp_);
retain
chr bp mutant orig minor_allele minor_AF
expected_case_minor_AC low_confidence_variant
n_complete_samples AC ytx beta se tstat pval
;
set cholesterol_2(rename = (chr=chr_ bp=bp_));
chr = input(chr_,2.); /* 2. rather than 1. so that two-digit chromosomes (10-22) are not cut to one digit */
bp = input(bp_,8.);
run;
%mend repeat;
Now you can use the code repeatedly in this way:
%repeat (path/file_1)
%repeat (path/file_2)
Notice that the changes are few: the statements are wrapped in %macro and %mend, and the path in the INFILE statement is now in double quotes and refers to the value you supply: "&path". The double quotes are required, because macro references inside single quotes are not resolved.
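To see why the quotes matter, here is a minimal illustration. The variable names single and double are made up for the demo, and the path is the same placeholder used in this thread.

```sas
%let path = path/file_1;   /* placeholder path, as in the thread */

data _null_;
  single = '&path';   /* single quotes: stays the literal text &path */
  double = "&path";   /* double quotes: resolves to path/file_1 */
  put single= double=;
run;
```

The log should show single=&path double=path/file_1, which is why the INFILE statement inside the macro uses "&path" and not '&path'.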
Each call overwrites CHOLESTEROL_1 through CHOLESTEROL_3, so it's up to you to use (or copy) CHOLESTEROL_3 after each call, before the next call replaces it, to get the right version of the data set.
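One possible way to keep every version is sketched below. It is only an illustration: the macro name repeat_all, the data set names cholesterol_3_1, cholesterol_3_2, ... and cholesterol_all, and the assumption that the files are called path/file_1 through path/file_n are all hypothetical, and %repeat must already be compiled.

```sas
/* Hypothetical driver macro: assumes %repeat (above) has been submitted
   and that the raw files are named path/file_1 ... path/file_&n. */
%macro repeat_all(n);
  %local i;
  %do i = 1 %to &n;
    %repeat(path/file_&i)

    /* Copy the result before the next call overwrites CHOLESTEROL_3 */
    data cholesterol_3_&i;
      set cholesterol_3;
    run;
  %end;

  /* Stack the per-file copies into one table and record where each row came from */
  data cholesterol_all;
    set cholesterol_3_1 - cholesterol_3_&n indsname=src;
    source = src;   /* INDSNAME= variables are not written out, so copy the value */
  run;
%mend repeat_all;

%repeat_all(2)
```

Copying inside the loop trades some disk space for simplicity. If only the combined table is needed, PROC APPEND after each call would be another option.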
The reply from @Astounding gives you the answer.
Just a note that the three data steps could (should really) be just one.
Here's an example of a single DATA step that could replace three DATA steps:
data cholesterol;
infile 'path/file_1' delimiter='09'x TRUNCOVER DSD firstobs=2 ;
length variant $70 minor_allele $2 minor_AF 8
expected_case_minor_AC 8 low_confidence_variant $5
n_complete_samples 8 AC 8 ytx 8
beta 8 se 8 tstat 8 pval 8
;
input variant--ytx ( beta se tstat pval ) (:??32.) ;
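/* RETAIN cannot move columns already defined by LENGTH or INPUT, so the
   output order is the LENGTH order followed by chr bp mutant orig. */
retain
chr bp mutant orig minor_allele minor_AF
expected_case_minor_AC low_confidence_variant
n_complete_samples AC ytx beta se tstat pval
;
chr_ = scan(variant, 1,':');
bp_ = scan(variant, 2,':');
mutant = scan(variant, 3,':');
orig = scan(variant, 4,':');
/* 2. rather than 1. so that chromosomes 10-22 are not cut to one digit;
   ?? makes non-numeric values such as X missing without log errors */
chr = input(chr_,?? 2.);
bp = input(bp_,8.);
/* variant is dropped too, as in the original cholesterol_2 step */
drop variant chr_ bp_;
run;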
Here is how I did it.
data CHOLESTEROL;
infile 'path/file_1' delimiter='09'x TRUNCOVER DSD firstobs=2 ;
%* set length and informat and output order;
informat CHR BP 8.
MUTANT ORIG $20.
MINOR_ALLELE $2.
MINOR_AF EXPECTED_CASE_MINOR_AC 8.
LOW_CONFIDENCE_VARIANT $5.
N_COMPLETE_SAMPLES AC YTX BETA SE TSTAT PVAL 8.
VARIANT $70. ;
%* read data in input order;
input VARIANT MINOR_ALLELE MINOR_AF
EXPECTED_CASE_MINOR_AC LOW_CONFIDENCE_VARIANT
N_COMPLETE_SAMPLES AC YTX ( BETA SE TSTAT PVAL ) (:?? 32.) ;
drop VARIANT ;
CHR = input(scan(VARIANT, 1,':'),?? 2.); /* 2. so that chromosomes 10-22 are not cut to one digit */
BP = input(scan(VARIANT, 2,':'),?? 8.);
MUTANT = scan(VARIANT, 3,':') ;
ORIG = scan(VARIANT, 4,':') ;
run;
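Since the raw files sit in the same directory, another approach is to read them all in one DATA step with a wildcard in the INFILE path. The sketch below is an assumption-laden illustration, not a tested solution for your files: the pattern path/file_* is a guess at the naming, and fname, prev_fname, source_file and cholesterol_all are made-up names. Because firstobs=2 would only skip the header of the first file, the header of each file is skipped by watching for a change in the FILENAME= variable.

```sas
/* Sketch only: "path/file_*" is an assumed naming pattern. */
data cholesterol_all;
  /* INFORMAT first, so it sets type, length and output order */
  informat CHR BP 8.
           MUTANT ORIG $20.
           MINOR_ALLELE $2.
           MINOR_AF EXPECTED_CASE_MINOR_AC 8.
           LOW_CONFIDENCE_VARIANT $5.
           N_COMPLETE_SAMPLES AC YTX BETA SE TSTAT PVAL 8.
           VARIANT $70. ;
  length fname prev_fname source_file $260;
  retain prev_fname;

  infile "path/file_*" delimiter='09'x truncover dsd filename=fname;

  input @;                          /* load the next record and hold it */
  if fname ne prev_fname then do;   /* first record of a new file is its header */
    prev_fname = fname;
    delete;                         /* skip the header row */
  end;

  /* read the held record in input order */
  input VARIANT MINOR_ALLELE MINOR_AF
        EXPECTED_CASE_MINOR_AC LOW_CONFIDENCE_VARIANT
        N_COMPLETE_SAMPLES AC YTX ( BETA SE TSTAT PVAL ) (:?? 32.) ;

  source_file = fname;              /* FILENAME= variables are not written out, so copy */
  CHR    = input(scan(VARIANT, 1, ':'), ?? 2.);
  BP     = input(scan(VARIANT, 2, ':'), ?? 8.);
  MUTANT = scan(VARIANT, 3, ':');
  ORIG   = scan(VARIANT, 4, ':');
  drop VARIANT prev_fname;
run;
```

This avoids the macro altogether and keeps the source file of every row in source_file, at the cost of producing one combined table instead of one table per file.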