BookmarkSubscribeRSS Feed
PietariKoskela
SAS Employee

*AI was used as much as possible to create this tip!

A few years ago, I was at a restaurant and overheard a conversation about splitting a massive Excel file into smaller ones based on unique values in column X. They estimated it would take 10 weeks and likely lead to a mental breakdown from copying and pasting thousands of rows. Clearly, they weren't SAS users!

 

After experimenting with various AI tools and facing several unfulfilled requests, I found a straightforward solution. It may not be the most efficient, but it serves as a great generic example. Here's the code, the best documentation I've (AI) ever written, and a bit about the process.

 

/**
 * Macro to split a dataset into multiple datasets based on unique values of a specified column.
 *
 * Parameters:
 * - DSN: The name of the input dataset (default is sashelp.class).
 * - Column: The column based on which the dataset will be split.
 * - Prefix: The prefix for the names of the output datasets.
 *
 * Usage:
 * %split_dataset(DSN=sashelp.class, Column=SEX, Prefix=SEXSplit_);
 */

%macro split_dataset(DSN=sashelp.class, Column=SEX, Prefix=Split_);
    /* Get unique values from the specified column */
    proc sql noprint;
        select distinct &Column into :values separated by ' ' 
        from &DSN;
    quit;

    /* Use proc contents to get the column type */
    proc contents data=&DSN out=colinfo(keep=name type) noprint;
    run;

    data _null_;
        set colinfo;
        if upcase(name) = upcase("&Column") then do;
            if type = 2 then call symput('coltype', 'char');
            else if type = 1 then call symput('coltype', 'num');
        end;
    run;

    /* Testing - Print the column type */
    /* %put Column type: &coltype; */

    /* Loop through each unique value and create a new dataset */
    %let i = 1;
    %do %while (%scan(&values, &i) ne );
        %let value = %scan(&values, &i);
        %put Processing value: &value;
        %if &coltype = char %then %do;
            data &Prefix.&value;
                set &DSN;
                where &Column = "&value";
            run;
        %end;
        %else %do;
            data &Prefix.&value;
                set &DSN;
                where &Column = &value;
            run;
        %end;
        %let i = %eval(&i + 1);
    %end;
%mend split_dataset;

/* Example usage */
%split_dataset(DSN=sashelp.class, Column=Age, Prefix=AgeSplit_);

Here is the first prompt where it all started:

Screenshot 2024-12-19 204229.png

Next phase is feedback the log to AI until it works, or ask for new features or ask it to explain some parts of the code to you. Some prompt examples I tried on the way.

Screenshot 2024-12-19 205118.pngScreenshot 2024-12-19 205138.png

One interesting point was to request complexity analysis for the algorithm in the code. This is not very often done and if development is done with small sample data, this could be very useful! Earlier I have been positively surprised about some OS level limitations info and some security analyses AI does.

Screenshot 2024-12-19 205649.png 

After the macro was done I decided to make it a custom step in SAS Studio and it was super easy with this helpful guide: https://www.youtube.com/watch?v=yKdCwNLEhUI 

CustomStep.png

1 REPLY 1
PietariKoskela
SAS Employee

SAS can write to multiple outputs in a single data pass. I tried this, and with the help of AI, I got it working. However, there might be OS-level restrictions on open file handles. Depending on your configuration, a process is usually limited to a certain number of open file handles. If the unique values in your splitting column exceed this limit, you could run into problems. But if you have a limited number of unique values and a large dataset to split, this method should work well for you.

 

For a production-ready solution, the next steps could include adding parameter error handling, output name length handling and maybe offering different output options for the user (Excel?).

 

If you want to test custom steps after watching that helpful YouTube video, here are the codes for that:

/*Prompt UI*/
{
	"showPageContentOnly": true,
	"pages": [
		{
			"id": "page1",
			"type": "page",
			"label": "Page 1",
			"children": [
				{
					"id": "DSN",
					"type": "inputtable",
					"label": "Input DataSet Name",
					"required": true,
					"placeholder": "",
					"visible": ""
				},
				{
					"id": "Column",
					"type": "columnselector",
					"label": "Split my data with one column",
					"include": null,
					"order": false,
					"columntype": "a",
					"max": null,
					"min": null,
					"visible": "",
					"table": "DSN"
				},
				{
					"id": "text1",
					"type": "text",
					"text": "This is informational text.",
					"visible": ""
				},
				{
					"id": "Prefix",
					"type": "textarea",
					"label": "Prefix for the names of the output datasets",
					"placeholder": "",
					"required": true,
					"visible": ""
				}
			]
		}
	],
	"values": {
		"DSN": {
			"library": "sashelp",
			"table": "class"
		},
		"Column": [],
		"Prefix": "Split_"
	}
}
/*Program*/
/**
 * Macro to split a dataset into multiple datasets based on unique values of a specified column.
 *
 * Parameters:
 * - DSN: The name of the input dataset (default is sashelp.class).
 * - Column: The column based on which the dataset will be split.
 * - Prefix: The prefix for the names of the output datasets.
 *
 * Usage:
 * %split_dataset(DSN=sashelp.class, Column=SEX, Prefix=SEXSplit_);
 */

%macro split_dataset(DSN=sashelp.class, Column=SEX, Prefix=Split_);
    /* Get unique values from the specified column */
    proc sql noprint;
        select distinct &Column into :values separated by ' ' 
        from &DSN;
    quit;

    /* Use proc contents to get the column type */
    proc contents data=&DSN out=colinfo(keep=name type) noprint;
    run;

    data _null_;
        set colinfo;
        if upcase(name) = upcase("&Column") then do;
            if type = 2 then call symput('coltype', 'char');
            else if type = 1 then call symput('coltype', 'num');
        end;
    run;

    /* Loop through each unique value and create a new dataset */
    %let i = 1;
    %do %while (%scan(&values, &i) ne );
        %let value = %scan(&values, &i);
        %put Processing value: &value;
        %if &coltype = char %then %do;
            data &Prefix.&value;
                set &DSN;
                where &Column = "&value";
            run;
        %end;
        %else %do;
            data &Prefix.&value;
                set &DSN;
                where &Column = &value;
            run;
        %end;
        %let i = %eval(&i + 1);
    %end;
%mend split_dataset;

/* Call macro */
%split_dataset(DSN=&DSN, Column=&Column, Prefix=&Prefix);

 

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

Discussion stats
  • 1 reply
  • 134 views
  • 4 likes
  • 1 in conversation