monona
Obsidian | Level 7
data cholesterol_1;
  infile 'path/file_1'  delimiter='09'x TRUNCOVER DSD firstobs=2 ;
  length variant $70 minor_allele $2 minor_AF 8
         expected_case_minor_AC 8 low_confidence_variant $5 
         n_complete_samples 8 AC 8 ytx 8
         beta 8 se 8 tstat 8 pval 8
  ;
  input variant--ytx ( beta se tstat pval ) (:??32.) ;
run;

data cholesterol_2(drop = variant);
	set cholesterol_1;
	chr    = scan(variant, 1,':');
	bp     = scan(variant, 2,':');
	mutant = scan(variant, 3,':');
	orig   = scan(variant, 4,':');
run;

data cholesterol_3(drop = chr_ bp_);
	retain
	 chr bp mutant orig minor_allele minor_AF 
	 expected_case_minor_AC low_confidence_variant  
	 n_complete_samples AC ytx beta se tstat pval 
	 ;

	set cholesterol_2(rename = (chr=chr_ bp=bp_));

	chr = input(chr_,1.);
	bp  = input(bp_,8.);
run;

This is my code. I would like to repeat it to generate the final output table (cholesterol_3) for multiple raw data files, which are all in the same path/directory.

 

I am pretty new to SAS and would like to hear some general approaches/tips for turning this code into a function.

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

SAS handles this well.  Here's your original code, with slight changes:

 

%macro repeat (path);
data cholesterol_1;
  infile "&path" delimiter='09'x TRUNCOVER DSD firstobs=2 ;
  length variant $70 minor_allele $2 minor_AF 8
         expected_case_minor_AC 8 low_confidence_variant $5 
         n_complete_samples 8 AC 8 ytx 8
         beta 8 se 8 tstat 8 pval 8
  ;
  input variant--ytx ( beta se tstat pval ) (:??32.) ;
run;

data cholesterol_2(drop = variant);
	set cholesterol_1;
	chr    = scan(variant, 1,':');
	bp     = scan(variant, 2,':');
	mutant = scan(variant, 3,':');
	orig   = scan(variant, 4,':');
run;

data cholesterol_3(drop = chr_ bp_);
	retain
	 chr bp mutant orig minor_allele minor_AF 
	 expected_case_minor_AC low_confidence_variant  
	 n_complete_samples AC ytx beta se tstat pval 
	 ;

	set cholesterol_2(rename = (chr=chr_ bp=bp_));

	chr = input(chr_,1.);
	bp  = input(bp_,8.);
run;

%mend repeat;

Now you can use the code repeatedly in this way:

 

%repeat (path/file_1)

%repeat (path/file_2)

 

Notice that the changes are few: the statements are wrapped in %macro and %mend, and the path in the INFILE statement now uses double quotes around a reference to the value you supply: "&path". The double quotes matter because macro variable references are not resolved inside single quotes.
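
The quoting rule can be checked with a small stand-alone test. This is only a sketch: the %let value and the variable names single and double are made up for illustration and are not part of the macro above.

```sas
%let path = path/file_1;   /* hypothetical value, for illustration only */

data _null_;
  single = '&path';   /* single quotes: the text &path is kept literally */
  double = "&path";   /* double quotes: the macro processor substitutes the value */
  put single= double=;   /* writes both values to the log */
run;
```

The log should show single=&path and double=path/file_1, which is why the INFILE statement inside the macro needs double quotes.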

 

Each call overwrites CHOLESTEROL_1 through CHOLESTEROL_3, so it's up to you to sort out how to access CHOLESTEROL_3 at the right time (before the next call) to get the right version of the data set.
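
One possible way to keep each file's result is to wrap the call and copy CHOLESTEROL_3 under a name of your choosing before the next call replaces it. This is a minimal sketch that assumes the %repeat macro above has already been submitted; the macro name repeat_keep, the parameter out, and the data set names chol_file_1 and chol_file_2 are hypothetical.

```sas
/* Hypothetical wrapper: run %repeat, then save its result under &out */
%macro repeat_keep (path, out);
  %repeat (&path)

  /* CHOLESTEROL_3 is replaced by the next call, so copy it now */
  data &out;
    set cholesterol_3;
  run;
%mend repeat_keep;

%repeat_keep (path/file_1, chol_file_1)
%repeat_keep (path/file_2, chol_file_2)
```

After both calls, chol_file_1 and chol_file_2 each hold the table built from their own raw file, while CHOLESTEROL_3 holds only the most recent one.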

5 REPLIES 5

ChrisNZ
Tourmaline | Level 20

The reply from @Astounding gives you the answer.

Just a note that the three data steps could (should really) be just one.

monona
Obsidian | Level 7
Actually, @Astounding's solution works. Could you explain why that should be a single data step? Frankly, I have no idea how to combine them into one data step.
Astounding
PROC Star

Here's an example of a single DATA step that could replace three DATA steps:

 


data cholesterol;
  infile 'path/file_1'  delimiter='09'x TRUNCOVER DSD firstobs=2 ;
  length variant $70 minor_allele $2 minor_AF 8
         expected_case_minor_AC 8 low_confidence_variant $5 
         n_complete_samples 8 AC 8 ytx 8
         beta 8 se 8 tstat 8 pval 8
  ;
  input variant--ytx ( beta se tstat pval ) (:??32.) ;

	retain
	 chr bp mutant orig minor_allele minor_AF 
	 expected_case_minor_AC low_confidence_variant  
	 n_complete_samples AC ytx beta se tstat pval 
	 ;
  chr_   = scan(variant, 1,':');
  bp_    = scan(variant, 2,':');
  mutant = scan(variant, 3,':');
  orig   = scan(variant, 4,':');
  chr = input(chr_,1.);
  bp  = input(bp_,8.);
  drop chr_ bp_;
run;
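
Since the raw files sit in the same directory, a single DATA step might also read all of them at once through a wildcard in the INFILE path. The following is a sketch under several assumptions: the wildcard path/file_* matches only the intended files, every file has one header line, and the names cholesterol_all, fname, and source are invented for illustration. FIRSTOBS=2 is left out because, as far as I know, it would skip only the first line of the first file; instead the header of each file is dropped by watching the FILENAME= variable change. The sketch also reads chr with `2.` rather than `1.`, because `1.` reads only the first character, so a chromosome of 10 would come out as 1.

```sas
/* Sketch: read every matching file in one DATA step (names are assumptions) */
data cholesterol_all;
  length fname source $260;   /* declare before INFILE so the file name is not truncated */
  infile "path/file_*" delimiter='09'x truncover dsd filename=fname;
  length variant $70 minor_allele $2 minor_AF 8
         expected_case_minor_AC 8 low_confidence_variant $5
         n_complete_samples 8 AC 8 ytx 8
         beta 8 se 8 tstat 8 pval 8
  ;

  input @;                              /* load the next line and hold it */
  if fname ne lag(fname) then delete;   /* first line of each file is its header */

  input variant--ytx ( beta se tstat pval ) (:??32.) ;
  source = fname;                       /* FILENAME= variables are not written to the output */

  chr    = input(scan(variant, 1, ':'), ?? 2.);   /* non-numeric values such as X become missing */
  bp     = input(scan(variant, 2, ':'), ?? 8.);
  mutant = scan(variant, 3, ':');
  orig   = scan(variant, 4, ':');
  drop variant;
run;
```

The source column records which file each row came from, so the combined table can still be split or filtered per file afterwards.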

 

VRKiwi
Obsidian | Level 7

Here is how I did it.

data CHOLESTEROL;
  infile 'path/file_1'  delimiter='09'x TRUNCOVER DSD firstobs=2 ;
 
  %* set length and informat and output order;
  informat CHR BP 8.
           MUTANT ORIG $20. 
           MINOR_ALLELE $2. 
           MINOR_AF EXPECTED_CASE_MINOR_AC 8. 
           LOW_CONFIDENCE_VARIANT  $5.
           N_COMPLETE_SAMPLES AC YTX BETA SE TSTAT PVAL 8. 
           VARIANT $70. ;
           
  %* read data in input order;       
  input  VARIANT MINOR_ALLELE MINOR_AF 
         EXPECTED_CASE_MINOR_AC LOW_CONFIDENCE_VARIANT 
         N_COMPLETE_SAMPLES  AC YTX ( BETA SE TSTAT PVAL ) (:?? 32.) ; 
         
  drop VARIANT ; 
  
  CHR    = input(scan(VARIANT, 1,':'),?? 1.); 
  BP     = input(scan(VARIANT, 2,':'),?? 8.);
  MUTANT =       scan(VARIANT, 3,':')       ;
  ORIG   =       scan(VARIANT, 4,':')       ;
run;
