Solved: SAS Macro programming

JibJam221 · Posted 09-15-2022 11:25 AM

Hi all, currently struggling with my code.

Goal of this code: To create multiple sub-datatsets from a master dataset, based on the variable "LOCATION". I would like to have SAS create 5 sub-datasets based on the location listed under "LOCATION".

I would like all of the column variables from the master dataset to be placed into these sub-datasets, where employee status is = to FULL-TIME. Therefore, all employees located in CANADA who are FULL-TIME will be in their own dataset. All employees located in the US who are FULL-TIME will be in their own dataset. etc.

/*LOCATIONS*/
data Locations;
input locations $ 1-50;
datalines;
CANADA
US
MEXICO
JAPAN
FRANCE
;
run;

proc sql noprint;
select distinct Locations
into :Location1-
from work.Locations;
quit;

%macro l_locations();
%do i=1 %to 5; 

data loc_in; 
set Overall_Location; 
where "work/location"n="&&locations&i."; 
run;

proc sql; create table loc as select * 
from loc_in; 
quit;

data locations&i.; set loc; run;

%end;
%mend;

%l_locations;

any help is appreciated. I dont get any error/warnings, but I also get no output.

Thanks

Tom · Posted 09-15-2022 01:29 PM

@JibJam221 wrote:
Hi Paige. Im trying to avoid the need to create multiple data statements as the list of locations is expanding (will be up to 40 locations). So I was hoping to create an efficient code that can split the data into sub-datasets without needing to write 40 data statements.

Would you have a sample or reference of the approach you mentioned?

If there is a valid need to make the separated datasets (for example they will be distributed to different users) then use the data to help you generate the code. I find it much easier to debug SAS code than macro code. So just use a data step to write the code. The same as you would use a data step to write any other text file. Then you can use %INCLUDE to run the generated code.

So the "efficient" way to split a dataset is to use one data step multiple output datasets.

data canada mexico usa;
   set have;
   where status='COMPLETED';
   select (location);
      when ('CANADA') output canada;
      when ('MEXICO') output mexico;
      when ('USA') output usa;
   end;
run;

So first get a dataset with the list of the locations you need split out. If the location value is not a valid dataset name then you need a method to convert the location value to a dataset name.

data Locations;
  length location $50 dsname $32 ;
  input location $ 1-50;
  dsname=location ;
datalines;
CANADA
US
MEXICO
JAPAN
FRANCE
;

If you do not have the locations in advance then get it from the original data. To improve performance of the SELECT statement order them by decreasing frequency;

proc freq data=have order=freq;
  tables location / noprint out=locations ;
run;

Now use that dataset to generate the code;

filename code temp;
data _null_;
   file code ;
   put 'data ' @ ;
  do p=1 to nobs ;
     set locations point=p nobs=nobs;
     put dsname @;
  end;
  put ';'
   / '  set have;'
   / "  where status='COMPLETED';"
   / '  select (location);' 
  ;
  do p=1 to nobs;
     set location point=p;
     put 'when (' location :$quote. ') output ' dsname ';' ;
   end;
   put '  otherwise;' 
       / '  end;'
       / 'run;'
    ;
run;

Now look at the code and make sure your code generattion logic is right so that it is generating valid SAS code. For example you could just replace the FILE CODE; statement with FILE LOG: statement and then check the lines of code in the SAS log.

Once you are sure the code generation is making proper SAS code then use %INCLUDE to run it.

%include code / source2;

View solution in original post

Astounding · Posted 09-15-2022 11:42 AM

Don't know if what you posted is what you actually ran, but this would be a typo:

into :Location1-

Your later reference to these macro variables adds an "s": "&&locations&i" which would not be a match.

JibJam221 · Posted 09-15-2022 11:51 AM

thanks for the quick response. I just fixed that reference however still no luck, unfortunately. Not sure if the approach im taking with this macro is correct.

PaigeMiller · Posted 09-15-2022 11:52 AM

@Astounding wrote:

Don't know if what you posted is what you actually ran, but this would be a typo:
into :Location1-

I'm pretty sure that is correct syntax.

To @JibJam221 the idea of writing macros to split data sets into smaller components by location is generally an inefficient choice. SAS has many features that allow you to work on just one location, if that's what you want to do (this is the WHERE statement); or to do analysis or reporting so you write on piece of code that does the analysis for just Canada, and continues on to do the same analysis for US, and so on (this the BY statement).

So without further justification, I think splitting the data set as you described is not necessary.

--
Paige Miller

JibJam221 · Posted 09-15-2022 11:54 AM

Hi Paige. Im trying to avoid the need to create multiple data statements as the list of locations is expanding (will be up to 40 locations). So I was hoping to create an efficient code that can split the data into sub-datasets without needing to write 40 data statements.

Would you have a sample or reference of the approach you mentioned?

PaigeMiller · Posted 09-15-2022 01:13 PM

Example 1:

proc means data=have noprint;
    class location;
    var x1-x11;
    output out=_stats_ mean=/autoname;
run;

The CLASS statement here computes the statistics for ALL locations.

Example 2:

proc logistic data=have;
    by location;
    model y = x1 x2;
run;

The BY statement allows you to have logistic regressions computed for every location.

As you can see in Example 1 and Example 2, no macros are needed.

Example 3:

proc freq data=have;
    where location='Canada';
    tables c1;
run;

The WHERE statement allows you to do an analysis with just the Canada data without splitting the data set up as you originally requested.

--
Paige Miller

PaigeMiller · Posted 09-15-2022 01:19 PM

@JibJam221 wrote:
Hi Paige. Im trying to avoid the need to create multiple data statements as the list of locations is expanding (will be up to 40 locations). So I was hoping to create an efficient code that can split the data into sub-datasets without needing to write 40 data statements.

But this still doesn't explain what you are going to do with this data after you split it up. What is the next step? What is the analysis or report or output that you are going to create from these split up data sets?

--
Paige Miller

Tom · Posted 09-15-2022 01:29 PM

@JibJam221 wrote:
Hi Paige. Im trying to avoid the need to create multiple data statements as the list of locations is expanding (will be up to 40 locations). So I was hoping to create an efficient code that can split the data into sub-datasets without needing to write 40 data statements.

Would you have a sample or reference of the approach you mentioned?

If there is a valid need to make the separated datasets (for example they will be distributed to different users) then use the data to help you generate the code. I find it much easier to debug SAS code than macro code. So just use a data step to write the code. The same as you would use a data step to write any other text file. Then you can use %INCLUDE to run the generated code.

So the "efficient" way to split a dataset is to use one data step multiple output datasets.

data canada mexico usa;
   set have;
   where status='COMPLETED';
   select (location);
      when ('CANADA') output canada;
      when ('MEXICO') output mexico;
      when ('USA') output usa;
   end;
run;

So first get a dataset with the list of the locations you need split out. If the location value is not a valid dataset name then you need a method to convert the location value to a dataset name.

data Locations;
  length location $50 dsname $32 ;
  input location $ 1-50;
  dsname=location ;
datalines;
CANADA
US
MEXICO
JAPAN
FRANCE
;

If you do not have the locations in advance then get it from the original data. To improve performance of the SELECT statement order them by decreasing frequency;

proc freq data=have order=freq;
  tables location / noprint out=locations ;
run;

Now use that dataset to generate the code;

filename code temp;
data _null_;
   file code ;
   put 'data ' @ ;
  do p=1 to nobs ;
     set locations point=p nobs=nobs;
     put dsname @;
  end;
  put ';'
   / '  set have;'
   / "  where status='COMPLETED';"
   / '  select (location);' 
  ;
  do p=1 to nobs;
     set location point=p;
     put 'when (' location :$quote. ') output ' dsname ';' ;
   end;
   put '  otherwise;' 
       / '  end;'
       / 'run;'
    ;
run;

Now look at the code and make sure your code generattion logic is right so that it is generating valid SAS code. For example you could just replace the FILE CODE; statement with FILE LOG: statement and then check the lines of code in the SAS log.

Once you are sure the code generation is making proper SAS code then use %INCLUDE to run it.

%include code / source2;

JibJam221 · Posted 09-15-2022 01:48 PM

this is exactly what I need. The datasets are being sent to their respective locations, and cannot contain information regarding other locations (due to data confidentiality).

Thanks so much

Tom · Posted 09-15-2022 02:27 PM

@JibJam221 wrote:
this is exactly what I need. The datasets are being sent to their respective locations, and cannot contain information regarding other locations (due to data confidentiality).

Thanks so much

In that case it might be easier to send them CSV files (as long as none of the strings in the data contain end of line characters).

You can do that directly use a single data step without a need to find the location names. And filenames will not have the limitations on naming that you would have with a dataset name.

It is especially trivial if the files do not have to have header lines. Then you can write the file from the data without even having to sort by location first.

data _null_;
  length filename $256;
  set have;
  filename=cats('/home/me/outdir/',location,'.csv');
  file dummy dsd filevar=filename mod;
  put (_all_) (+0);
run;

Reeza · Posted 09-15-2022 01:55 PM

Is the locations data set a custom list or do you want to split by all values in the data set?

Either are easily accommodated.

I would echo what others have said, there's rarely a reason to do this in SAS though. I don't see how you're filtering for FULL TIME in the code below.

	proc sort data=overall_location;
	by 'work/location'n;
	run;

	data _null_;
	set overall_location;
	by 'work/location'n;
	retain count;

	if first.'work/location'n then do;
		count+1;

		*create string;
		str1='data locations'||put(count,z3.)||"; set overall_location; where  'work/location'n="||'work/location'n||";run;";
		call execute(str1);

	end;
	run;

SAS Macro programming

Re: SAS Macro programming

Re: SAS Macro programming

Re: SAS Macro programming

Re: SAS Macro programming

Re: SAS Macro programming

Re: SAS Macro programming

Re: SAS Macro programming

Re: SAS Macro programming

Re: SAS Macro programming

Re: SAS Macro programming

Re: SAS Macro programming

Classroom Training Available!