Solved: Reference a Sequence of Columns in PROC SQL

mariko5797 · Posted 08-02-2021 01:48 PM

If I want to use PROC SQL to join to datasets, what is a quick way to reference a sequence of columns?

That is, if dataset 1 has columns: ID, NAME, X1, X2, X3, X4, X5 and dataset 2 has columns: ID, NAME, Y1, Y2, Y3, Y4, Y5, then how can I join data1.* with columns Y1 to Y5 on ID and NAME using PROC SQL? Is there a way similar to data steps Y1--Y5? Or would I need to write out every column?

PaigeMiller · Posted 08-02-2021 01:52 PM

The Y1-Y5 syntax, and the Y1--Y5 syntax, do not work in PROC SQL. You have to write out all of the columns names, or (if possible) use something like data1.* and data2.*

--
Paige Miller

View solution in original post

PaigeMiller · Posted 08-02-2021 01:52 PM

The Y1-Y5 syntax, and the Y1--Y5 syntax, do not work in PROC SQL. You have to write out all of the columns names, or (if possible) use something like data1.* and data2.*

--
Paige Miller

PhilC · Posted 08-02-2021 03:36 PM

data1.* and data2.* does not work unless you rename the variables ID and Name in either dataset.

Using DICTIONARY.COLUMNS is an alternative, but it fails to be as satisfying as the syntax X1--X5.

data data1;
  ID=_N_;
  set SAShelp.Fish (Keep=Species Length: obs=3);
  rename 
    Species=Name
    Length1=X1
    Length2=X2
    Length3=X3
;
data data2;
  ID=_N_;
  set SAShelp.Fish (Keep=Species Length: firstobs=4 obs=6);
  rename 
    Species=Name
    Length1=Y1
    Length2=Y2
    Length3=Y3
;
run;

proc sql noprint;
  select cats(memname,".",name)
    into :data1_X1_X3 separated by ","
    from dictionary.columns
    where LIBNAME="WORK"
    and memname = "DATA1"
      and name like "X_";
  select cats(memname,".",name)
    into :data2_Y1_Y3 separated by ","
    from dictionary.columns
    where LIBNAME="WORK"
    and memname = "DATA2"
      and name like "Y_";
quit;
  %put &=data1_X1_X3 ;
  %put &=data2_Y1_Y3;

proc sql;
  select data1.ID, 
      data1.name, 
      &data1_X1_X3, 
      &data2_Y1_Y3
    from data1, data2;
quit;

tellmeaboutityo · Posted 08-02-2021 04:25 PM

When I have an excessively large list of variables, I copy the list (column-wise) into excel, create another column that is simply , and then concatenate the variable names with the comma. Then I copy the list with commas back into SAS. It's not clever, but it's simple, fast, and works.

jimbarbour · Posted 08-02-2021 06:39 PM

@mariko5797 ,

Lots of good ideas here.

Another way is the little quickie macro I have written below. See test code below that -- that you should be able to use on your system.

To use, you just need to call the macro, tell it the variable name, and specify how many numbered occurrences there are. Like so:

%Generate_Vars(Length, 3)

In the above example, my variable name is "Length", and there are three occurrences. The following will be generated:

Length1
,Length2
,Length3

Note that the first occurrence is not preceded with a comma. The macro will put a comma before all iterations except the first. It should work with either preceding or trailing commas. I prefer preceding commas, so the example is formatted that way, but either should work.

You can turn off the commas (if for some reason you needed to) by passing COMMA=NO as a third parameter.

Jim

Macro code:

%MACRO	Generate_Vars(Var, Count, Comma=YES);
	%LOCAL	Separator;
	%IF	%QUPCASE(&Comma)	=	YES	%THEN
		%LET	Separator	=	%STR(,);
	%DO	i					=	1	%TO	&Count;
		%IF	&i				=	1	%THEN
			%DO;
				&Var&i
			%END;
		%ELSE
			%DO;
				&Separator&Var&i
			%END;
	%END;
%MEND	Generate_Vars;

Test code (uses SAShelp Libname, so it should work in any system):

PROC	SQL;
	SELECT
		Height
		,%Generate_Vars(Length, 3)
		,Species
		,Weight
		,Width
		FROM	SASHELP.Fish;
QUIT;

mkeintz · Posted 08-02-2021 09:38 PM

@mariko5797 wrote:

If I want to use PROC SQL to join to datasets, what is a quick way to reference a sequence of columns?

That is, if dataset 1 has columns: ID, NAME, X1, X2, X3, X4, X5 and dataset 2 has columns: ID, NAME, Y1, Y2, Y3, Y4, Y5, then how can I join data1.* with columns Y1 to Y5 on ID and NAME using PROC SQL? Is there a way similar to data steps Y1--Y5? Or would I need to write out every column?

Am I missing something here? What is being overlooked in:

proc sql noprint;
  create table want as 
  select * from data1,data2 where data1.name=data2.name and data1.id=data2.id;
quit;

or if you want to keep missing right-side and missing left-side contributions:

proc sql noprint;
  create table want as 
  select * from data1 full join data2 on data1.name=data2.name and data1.id=data2.id;
quit;

Now if what you are really saying is that there are dozens of unwanted variables in data1 and/or data2, then yes SQL would require you to construction variable lists instead of the "select *". But PROC SQL is using sas datasets, and the sas dataset engine honors sas dataset name parameters like KEEP= and DROP=. which do provide some shorthand list syntax, as in:

proc sql noprint;
  create table tt as 
  select * from data1 (keep=id name x1-x3) 
               ,data2 (keep=id name y:)
   where data1.name=data2.name and data1.id=data2.id;
quit;

or

proc sql noprint;
  create table tt as 
  select * from data1 (keep=id name--x3) 
               ,data2 (keep=id name y:)
   where data1.name=data2.name and data1.id=data2.id;
quit;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

jimbarbour · Posted 08-03-2021 12:27 AM

Ah. Very good. Yes, just code SAS shorthands on the Drop and Keep.

I see you threw in three techniques, all good:

1. Single hyphen: x1 - x3 (all variables starting with, in this case, the letter "x" in sequence, starting with the number 1 through the last of the sequence as specified)

2. Double hyphen: name -- x3 (all variables, based on position, between and including name and x3)

3. Wildcard: y: (all variables starting with a "y")

My little macro could still be useful for explicit pass through queries, although, truth be told, it was just fun to write. 🙂

Jim

PhilC · Posted 08-03-2021 08:23 AM

Should we, or should we not request this in a SASware ballot?

proc sql;
  select data1.ID, 
      data1.name, 
      of data1.X1 - data1.X3,
      of data2.Y1 -- data2.Y3
    from data1, data2;
quit;

jimbarbour · Posted 08-03-2021 09:34 AM

@PhilC wrote:

Should we, or should we not request this in a SASware ballot?
proc sql;
  select data1.ID, 
      data1.name, 
      of data1.X1 - data1.X3,
      of data2.Y1 -- data2.Y3
    from data1, data2;
quit;

or maybe

proc sql;
  select data1.ID, 
      data1.name, 
      (of data1.X1 - data1.X3),
      (of data2.Y1 -- data2.Y3)
    from data1, data2;
quit;

Interesting either way. Not earth shattering, but that would definitely make coding easier. I say submit it.

Jim

Reference a Sequence of Columns in PROC SQL

Re: Reference a Sequence of Columns in PROC SQL

Re: Reference a Sequence of Columns in PROC SQL

Re: Reference a Sequence of Columns in PROC SQL

Re: Reference a Sequence of Columns in PROC SQL

Re: Reference a Sequence of Columns in PROC SQL

Re: Reference a Sequence of Columns in PROC SQL

Re: Reference a Sequence of Columns in PROC SQL

Re: Reference a Sequence of Columns in PROC SQL

Re: Reference a Sequence of Columns in PROC SQL

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away