monona
Obsidian | Level 7
data cholesterol_1;
  infile 'path/file_1'  delimiter='09'x TRUNCOVER DSD firstobs=2 ;
  length variant $70 minor_allele $2 minor_AF 8
         expected_case_minor_AC 8 low_confidence_variant $5 
         n_complete_samples 8 AC 8 ytx 8
         beta 8 se 8 tstat 8 pval 8
  ;
  input variant--ytx ( beta se tstat pval ) (:??32.) ;
run;

data cholesterol_2(drop = variant);
	set cholesterol_1;
	chr    = scan(variant, 1,':');
	bp     = scan(variant, 2,':');
	mutant = scan(variant, 3,':');
	orig   = scan(variant, 4,':');
run;

data cholesterol_3(drop = chr_ bp_);
	retain
	 chr bp mutant orig minor_allele minor_AF 
	 expected_case_minor_AC low_confidence_variant  
	 n_complete_samples AC ytx beta se tstat pval 
	 ;

	set cholesterol_2(rename = (chr=chr_ bp=bp_));

	chr = input(chr_,8.);
	bp  = input(bp_,8.);
run;

This is my code. I would like to repeat it to generate the final output table (cholesterol_3) for multiple raw data files, all of which are in the same path/directory.

I am pretty new to SAS, and I would like to hear some general approaches/tips for turning this code into a reusable function.

Accepted Solution
Astounding
PROC Star

SAS handles this well with a macro. Here's your original code, with slight changes:

%macro repeat (path);
data cholesterol_1;
  infile "&path" delimiter='09'x TRUNCOVER DSD firstobs=2 ;
  length variant $70 minor_allele $2 minor_AF 8
         expected_case_minor_AC 8 low_confidence_variant $5 
         n_complete_samples 8 AC 8 ytx 8
         beta 8 se 8 tstat 8 pval 8
  ;
  input variant--ytx ( beta se tstat pval ) (:??32.) ;
run;

data cholesterol_2(drop = variant);
	set cholesterol_1;
	chr    = scan(variant, 1,':');
	bp     = scan(variant, 2,':');
	mutant = scan(variant, 3,':');
	orig   = scan(variant, 4,':');
run;

data cholesterol_3(drop = chr_ bp_);
	retain
	 chr bp mutant orig minor_allele minor_AF 
	 expected_case_minor_AC low_confidence_variant  
	 n_complete_samples AC ytx beta se tstat pval 
	 ;

	set cholesterol_2(rename = (chr=chr_ bp=bp_));

	chr = input(chr_,8.);
	bp  = input(bp_,8.);
run;

%mend repeat;

Now you can use the code repeatedly in this way:

%repeat (path/file_1)

%repeat (path/file_2)

Notice that the changes are few: the statements are wrapped in %macro and %mend, and the reference to the path now uses double quotes around a reference to the value you will supply: "&path"
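
Why the double quotes matter: the macro processor resolves &path inside a double-quoted string but leaves a single-quoted string untouched. A minimal check, using a hypothetical value for PATH:

```sas
/* Hypothetical value, for illustration only */
%let path = path/file_1;

data _null_;
  put "&path";   /* double quotes: the reference resolves, log shows path/file_1 */
  put '&path';   /* single quotes: no resolution, log shows the literal text &path */
run;
```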

 

It's up to you to sort out how to use CHOLESTEROL_3 at the right time: each call replaces it, so save or use the result before the next call.
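
Because each call replaces CHOLESTEROL_3, one way to keep every result is to generate the %repeat calls from the folder listing and append CHOLESTEROL_3 to a combined data set right after each call. A sketch, assuming %repeat is defined as above and that every member of the folder is a raw file to process; DIR, RAWDIR and ALL_CHOLESTEROL are hypothetical names:

```sas
/* Hypothetical setting: the folder that holds the raw files.
   Assumes the %repeat macro from the accepted solution is compiled. */
%let dir = path;

/* Clear any earlier combined result. On a first run this may log a
   WARNING because ALL_CHOLESTEROL does not exist yet. */
proc datasets lib=work nolist;
  delete all_cholesterol;
quit;

data _null_;
  length fname $256;
  rc  = filename('rawdir', "&dir");  /* fileref RAWDIR points at the folder */
  did = dopen('rawdir');             /* directory id, 0 if it cannot be opened */
  if did = 0 then do;
    put "ERROR: cannot open folder &dir";
    stop;
  end;
  do i = 1 to dnum(did);             /* one iteration per folder member */
    fname = dread(did, i);
    /* %nrstr keeps %repeat from running now; the queued call runs after
       this step ends, followed by the PROC APPEND queued right after it */
    call execute(cats('%nrstr(%repeat)(', "&dir/", fname, ')'));
    call execute('proc append base=all_cholesterol data=cholesterol_3; run;');
  end;
  rc = dclose(did);
  rc = filename('rawdir');           /* deassign the fileref */
run;
```

DOPEN, DNUM and DREAD are standard SAS directory functions; CALL EXECUTE queues the generated code and runs it after the DATA step finishes, in the order it was queued, so each PROC APPEND sees the CHOLESTEROL_3 produced by the %repeat call just before it.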

5 Replies
ChrisNZ
Tourmaline | Level 20

The reply from @Astounding gives you the answer.

Just a note that the three DATA steps could (really, should) be combined into one.

monona
Obsidian | Level 7
Actually, @Astounding's solution works. Could you explain why that should be a single DATA step? Frankly, I have no idea how to combine them into one.
Astounding
PROC Star

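Here's an example of a single DATA step that could replace the three DATA steps. It reads the raw file once and builds the final data set directly, instead of writing and re-reading two intermediate data sets:
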
data cholesterol;
  /* RETAIN comes first so that it sets the column order. VARIANT sits
     just before MINOR_ALLELE so the VARIANT--YTX range still works. */
  retain
   chr bp mutant orig variant minor_allele minor_AF
   expected_case_minor_AC low_confidence_variant
   n_complete_samples AC ytx beta se tstat pval
   ;
  infile 'path/file_1'  delimiter='09'x TRUNCOVER DSD firstobs=2 ;
  length variant $70 minor_allele $2 minor_AF 8
         expected_case_minor_AC 8 low_confidence_variant $5 
         n_complete_samples 8 AC 8 ytx 8
         beta 8 se 8 tstat 8 pval 8
  ;
  input variant--ytx ( beta se tstat pval ) (:??32.) ;

  chr_   = scan(variant, 1,':');
  bp_    = scan(variant, 2,':');
  mutant = scan(variant, 3,':');
  orig   = scan(variant, 4,':');
  chr    = input(chr_,8.);
  bp     = input(bp_,8.);
  drop chr_ bp_ variant;
run;
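
The RETAIN statement in these steps only sets the column order: SAS places variables in the data set in the order the compiler first meets them, so RETAIN reorders only the variables it mentions before any other statement does. A toy sketch with hypothetical data set and variable names:

```sas
/* RETAIN with no initial values: it only fixes the order c, a, b */
data order_demo;
  retain c a b;
  a = 1;
  b = 'x';
  c = a + 1;
run;

/* VARNUM lists the variables in their data set order: c, a, b */
proc contents data=order_demo varnum;
run;
```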

 

VRKiwi
Obsidian | Level 7

Here is how I did it.

data CHOLESTEROL;
  infile 'path/file_1'  delimiter='09'x TRUNCOVER DSD firstobs=2 ;
 
  %* set length and informat and output order;
  informat CHR BP 8.
           MUTANT ORIG $20. 
           MINOR_ALLELE $2. 
           MINOR_AF EXPECTED_CASE_MINOR_AC 8. 
           LOW_CONFIDENCE_VARIANT  $5.
           N_COMPLETE_SAMPLES AC YTX BETA SE TSTAT PVAL 8. 
           VARIANT $70. ;
           
  %* read data in input order;       
  input  VARIANT MINOR_ALLELE MINOR_AF 
         EXPECTED_CASE_MINOR_AC LOW_CONFIDENCE_VARIANT 
         N_COMPLETE_SAMPLES  AC YTX ( BETA SE TSTAT PVAL ) (:?? 32.) ; 
         
  drop VARIANT ; 
  
  CHR    = input(scan(VARIANT, 1,':'),?? 8.); 
  BP     = input(scan(VARIANT, 2,':'),?? 8.);
  MUTANT =       scan(VARIANT, 3,':')       ;
  ORIG   =       scan(VARIANT, 4,':')       ;
run;
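
The informat width in the INPUT function matters once chromosome numbers reach two digits: with 1. only the first character is read. A quick check on a made-up variant string (the data set and variable names are hypothetical):

```sas
/* Made-up variant string, for illustration only */
data parse_demo;
  variant = '10:12345:A:G';
  chr_1w  = input(scan(variant, 1, ':'), 1.);     /* width 1 reads only '1', giving 1 */
  chr_8w  = input(scan(variant, 1, ':'), ?? 8.);  /* reads the whole token, giving 10 */
  bp      = input(scan(variant, 2, ':'), ?? 8.);  /* 12345 */
  put chr_1w= chr_8w= bp=;                        /* log: chr_1w=1 chr_8w=10 bp=12345 */
run;
```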

Discussion stats
  • 5 replies
  • 766 views
  • 3 likes
  • 4 in conversation