BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
JibJam221
Obsidian | Level 7

Hi all, currently struggling with my code.

 

Goal of this code: To create multiple sub-datatsets from a master dataset, based on the variable "LOCATION". I would like to have SAS create 5 sub-datasets based on the location listed under "LOCATION". 

 

I would like all of the column variables from the master dataset to be placed into these sub-datasets, where employee status is = to FULL-TIME. Therefore, all employees located in CANADA who are FULL-TIME will be in their own dataset. All employees located in the US who are FULL-TIME will be in their own dataset. etc.

 

/*LOCATIONS*/
data Locations;
input locations $ 1-50;
datalines;
CANADA
US
MEXICO
JAPAN
FRANCE
;
run;

proc sql noprint;
select distinct Locations
into :Location1-
from work.Locations;
quit;

%macro l_locations();
%do i=1 %to 5; 

data loc_in; 
set Overall_Location; 
where "work/location"n="&&locations&i."; 
run;

proc sql; create table loc as select * 
from loc_in; 
quit;

data locations&i.; set loc; run;

%end;
%mend;

%l_locations;

any help is appreciated. I dont get any error/warnings, but I also get no output.

 

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

@JibJam221 wrote:
Hi Paige. Im trying to avoid the need to create multiple data statements as the list of locations is expanding (will be up to 40 locations). So I was hoping to create an efficient code that can split the data into sub-datasets without needing to write 40 data statements.

Would you have a sample or reference of the approach you mentioned?

If there is a valid need to make the separated datasets (for example they will be distributed to different users) then use the data to help you generate the code.  I find it much easier to debug SAS code than macro code. So just use a data step to write the code.  The same as you would use a data step to write any other text file.  Then you can use %INCLUDE to run the generated code.

 

So the "efficient" way to split a dataset is to use one data step multiple output datasets.

data canada mexico usa;
   set have;
   where status='COMPLETED';
   select (location);
      when ('CANADA') output canada;
      when ('MEXICO') output mexico;
      when ('USA') output usa;
   end;
run;

So first get a dataset with the list of the locations you need split out.  If the location value is not a valid dataset name then you need a method to convert the location value to a dataset name. 

data Locations;
  length location $50 dsname $32 ;
  input location $ 1-50;
  dsname=location ;
datalines;
CANADA
US
MEXICO
JAPAN
FRANCE
;

If you do not have the locations in advance then get it from the original data.  To improve performance of the SELECT statement order them by decreasing frequency; 

proc freq data=have order=freq;
  tables location / noprint out=locations ;
run;

Now use that dataset to generate the code;

filename code temp;
data _null_;
   file code ;
   put 'data ' @ ;
  do p=1 to nobs ;
     set locations point=p nobs=nobs;
     put dsname @;
  end;
  put ';'
   / '  set have;'
   / "  where status='COMPLETED';"
   / '  select (location);' 
  ;
  do p=1 to nobs;
     set location point=p;
     put 'when (' location :$quote. ') output ' dsname ';' ;
   end;
   put '  otherwise;' 
       / '  end;'
       / 'run;'
    ;
run;

Now look at the code and make sure your code generattion logic is right so that it is generating valid SAS code.  For example you could just replace the FILE CODE; statement with FILE LOG: statement and then check the lines of code in the SAS log.

 

Once you are sure the code generation is making proper SAS code then use %INCLUDE to run it.

%include code / source2;

View solution in original post

10 REPLIES 10
Astounding
PROC Star

Don't know if what you posted is what you actually ran, but this would be a typo:

into :Location1-

Your later reference to these macro variables adds an "s":  "&&locations&i" which would not be a match.

JibJam221
Obsidian | Level 7
thanks for the quick response. I just fixed that reference however still no luck, unfortunately. Not sure if the approach im taking with this macro is correct.
PaigeMiller
Diamond | Level 26

@Astounding wrote:

Don't know if what you posted is what you actually ran, but this would be a typo:

into :Location1-

I'm pretty sure that is correct syntax.

 

To @JibJam221 the idea of writing macros to split data sets into smaller components by location is generally an inefficient choice. SAS has many features that allow you to work on just one location, if that's what you want to do (this is the WHERE statement); or to do analysis or reporting so you write on piece of code that does the analysis for just Canada, and continues on to do the same analysis for US, and so on (this the BY statement).

 

So without further justification, I think splitting the data set as you described is not necessary.

--
Paige Miller
JibJam221
Obsidian | Level 7
Hi Paige. Im trying to avoid the need to create multiple data statements as the list of locations is expanding (will be up to 40 locations). So I was hoping to create an efficient code that can split the data into sub-datasets without needing to write 40 data statements.

Would you have a sample or reference of the approach you mentioned?
PaigeMiller
Diamond | Level 26

Example 1:

 

proc means data=have noprint;
    class location;
    var x1-x11;
    output out=_stats_ mean=/autoname;
run;

 

The CLASS statement here computes the statistics for ALL locations.

 

Example 2:

 

proc logistic data=have;
    by location;
    model y = x1 x2;
run;

 

The BY statement allows you to have logistic regressions computed for every location.


As you can see in Example 1 and Example 2, no macros are needed.

 

Example 3:

 

proc freq data=have;
    where location='Canada';
    tables c1;
run;

 

The WHERE statement allows you to do an analysis with just the Canada data without splitting the data set up as you originally requested.

--
Paige Miller
PaigeMiller
Diamond | Level 26

@JibJam221 wrote:
Hi Paige. Im trying to avoid the need to create multiple data statements as the list of locations is expanding (will be up to 40 locations). So I was hoping to create an efficient code that can split the data into sub-datasets without needing to write 40 data statements.

But this still doesn't explain what you are going to do with this data after you split it up. What is the next step? What is the analysis or report or output that you are going to create from these split up data sets?

--
Paige Miller
Tom
Super User Tom
Super User

@JibJam221 wrote:
Hi Paige. Im trying to avoid the need to create multiple data statements as the list of locations is expanding (will be up to 40 locations). So I was hoping to create an efficient code that can split the data into sub-datasets without needing to write 40 data statements.

Would you have a sample or reference of the approach you mentioned?

If there is a valid need to make the separated datasets (for example they will be distributed to different users) then use the data to help you generate the code.  I find it much easier to debug SAS code than macro code. So just use a data step to write the code.  The same as you would use a data step to write any other text file.  Then you can use %INCLUDE to run the generated code.

 

So the "efficient" way to split a dataset is to use one data step multiple output datasets.

data canada mexico usa;
   set have;
   where status='COMPLETED';
   select (location);
      when ('CANADA') output canada;
      when ('MEXICO') output mexico;
      when ('USA') output usa;
   end;
run;

So first get a dataset with the list of the locations you need split out.  If the location value is not a valid dataset name then you need a method to convert the location value to a dataset name. 

data Locations;
  length location $50 dsname $32 ;
  input location $ 1-50;
  dsname=location ;
datalines;
CANADA
US
MEXICO
JAPAN
FRANCE
;

If you do not have the locations in advance then get it from the original data.  To improve performance of the SELECT statement order them by decreasing frequency; 

proc freq data=have order=freq;
  tables location / noprint out=locations ;
run;

Now use that dataset to generate the code;

filename code temp;
data _null_;
   file code ;
   put 'data ' @ ;
  do p=1 to nobs ;
     set locations point=p nobs=nobs;
     put dsname @;
  end;
  put ';'
   / '  set have;'
   / "  where status='COMPLETED';"
   / '  select (location);' 
  ;
  do p=1 to nobs;
     set location point=p;
     put 'when (' location :$quote. ') output ' dsname ';' ;
   end;
   put '  otherwise;' 
       / '  end;'
       / 'run;'
    ;
run;

Now look at the code and make sure your code generattion logic is right so that it is generating valid SAS code.  For example you could just replace the FILE CODE; statement with FILE LOG: statement and then check the lines of code in the SAS log.

 

Once you are sure the code generation is making proper SAS code then use %INCLUDE to run it.

%include code / source2;
JibJam221
Obsidian | Level 7
this is exactly what I need. The datasets are being sent to their respective locations, and cannot contain information regarding other locations (due to data confidentiality).

Thanks so much
Tom
Super User Tom
Super User

@JibJam221 wrote:
this is exactly what I need. The datasets are being sent to their respective locations, and cannot contain information regarding other locations (due to data confidentiality).

Thanks so much

In that case it might be easier to send them CSV files (as long as none of the strings in the data contain end of line characters).

You can do that directly use a single data step without a need to find the location names.  And filenames will not have the limitations on naming that you would have with a dataset name.

 

It is especially trivial if the files do not have to have header lines.  Then you can write the file from the data without even having to sort by location first.

data _null_;
  length filename $256;
  set have;
  filename=cats('/home/me/outdir/',location,'.csv');
  file dummy dsd filevar=filename mod;
  put (_all_) (+0);
run;

 

Reeza
Super User

Is the locations data set a custom list or do you want to split by all values in the data set?

 

Either are easily accommodated.

I would echo what others have said, there's rarely a reason to do this in SAS though. I don't see how you're filtering for FULL TIME in the code below. 

 

	proc sort data=overall_location;
	by 'work/location'n;
	run;

	data _null_;
	set overall_location;
	by 'work/location'n;
	retain count;

	if first.'work/location'n then do;
		count+1;

		*create string;
		str1='data locations'||put(count,z3.)||"; set overall_location; where  'work/location'n="||'work/location'n||";run;";
		call execute(str1);

	end;
	run; 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 623 views
  • 0 likes
  • 5 in conversation