Solved: Re: How to Group variables with same suffix and different prefix

JMagenta · Posted 11-19-2023 11:57 AM

Hello,

I have a problem. I have 60 variables:

Apples1--Apples20

Carrots1--Carrots20

Oranges1--Oranges20

What I want to do is group variables with the same suffix together in the SAS dataset:

Apples1 Carrots1 Oranges1 ... Apples20 Carrots20 Oranges20

How can I do this?

Next, depending on the value of Apples1--Apples20 ( using formula Min(Apples1--Apples20) to find the minimum value, I would like to create new variables with their values.

For instance:

Apples2 is less than all in group Apples1--Apples20, so :

Earlyfruit=Apples2;

Afruit=Carrot2;

Bfruit=Orange2;

Is this possible? Can I use "proc sql" to accomplish this somehow?

Any advice would be helpful.

Thank you,

J Magenta

Reeza · Posted 11-20-2023 10:54 AM

data have2;
retain &varlist;
set have;
Array Apples_ {*} apples1-apples20;
Array Carrot_{*} Carrots1-carrots20;
Array oranges_{*} oranges1-oranges20;

Earlyfruit=min(of apples_[*]);

index_earlyfruit = whichn(earlyfruit, of apples_(*));
afruit = carrot_(index_earlyfruit);
bfruit = oranges_(index_earlyfruit);
run;

Use WHICHN to find the index of the minimum value. Note that if you have ties, the first one is returned.

@JMagenta wrote:

Awesome!

Thank you so much!

But for the last part, where I create new variables, it gets tricky.

Could this work?

data have2;
retain &varlist;
set have;
Array Apples_ {*} apples1-apples20;
Array Carrot_{*} Carrots1-carrots20;
Array oranges_{*} oranges1-oranges20;
do i=1 to dim(apples_);
Earlyfruit=min(of apples_[*]);
Afruit=Carrot_[min(of apples_[*])]; /* I want the carrot that is chosen to match the suffix of Apple without changing the value of Carrot*/
Bfruit=Oranges_[min(of apples_[*])]; /* same here with Oranges*/
end;
run;

View solution in original post

Patrick · Posted 11-19-2023 10:40 PM

Here you go.

I assume some of the SAS syntax used will be new to you. I suggest you first consult the SAS docu and then ask questions for the bits that remain unclear.

/***
I have 60 variables:
Apples1--Apples20 
Carrots1--Carrots20
Oranges1--Oranges20
**/
data work.have;
  array Apples  {20} 8;
  array Carrots {20} 8;
  array Oranges {20} 8;
  do i=1 to dim(apples);
    apples[i]=mod(i*10,dim(apples))+1;
    carrots[i]=apples[i]*5;
    oranges[i]=apples[i]*8;
  end;
  drop i;
run;

/* 1. What I want to do is group variables with the same suffix together in the SAS dataset */
/* Comment: The order of variables in a table is of no real relevance. Such ordering requests are normally reporting */ 
/*          requirements to be addressed as part of report generation                                                */

/* sort selected variables by digits in the variable name and populate a macro variable with the result */
proc sql noprint;
  select name into :varlist separated by ' '
  from dictionary.columns
  where 
    libname='WORK' 
    and memname='HAVE'
    and (
             upcase(name) like 'APPLES%'
          or upcase(name) like 'CARROTS%'
          or upcase(name) like 'ORANGES%'
        )
  order by compress(name,,'kd'), name
  ;
quit;

/* use macro variable to print data */
proc print data=work.have;
  var &varlist;
run;

/* 2. Next, depending on the value of Apples1--Apples20 ( using formula Min(Apples1--Apples20) */
/*    to find the minimum value, I would like to create new variables with their values.       */

/* below 3 coding options for doing this.                                                                                     */
/* - option one - of Apples: - will include all variables where the name starts with Apple - would also include AppleComputer */
data want;
  set have;
  array _oranges{*} Oranges1 - Oranges20;
  min_apple=min(of Apples:);
  min_carrots=min(of Carrots1 - Carrots20);
  min_oranges=min(of _oranges[*]);
run;

proc print data=want;
  var min_:;
run;

Here the chatGPT explanation of above code which isn't too bad

Spoiler

Creating the Initial Dataset (work.have😞

sas

data work.have; array Apples {20} 8; array Carrots {20} 8; array Oranges {20} 8; do i=1 to dim(apples); apples[i]=mod(i*10,dim(apples))+1; carrots[i]=apples[i]*5; oranges[i]=apples[i]*8; end; drop i; run;
- This part of the code creates a dataset named work.have with 60 variables: Apples1 through Apples20, Carrots1 through Carrots20, and Oranges1 through Oranges20.
- Three arrays (Apples, Carrots, and Oranges) of size 20 are defined.
- A loop is used to populate these arrays with values based on a formula (apples[i]=mod(i*10,dim(apples))+1) and similar formulas for carrots and oranges.
- The drop i; statement removes the loop index variable i.
Sorting Variables and Creating a Macro Variable (varlist😞

sas

proc sql noprint; select name into :varlist separated by ' ' from dictionary.columns where libname='WORK' and memname='HAVE' and ( upcase(name) like 'APPLES%' or upcase(name) like 'CARROTS%' or upcase(name) like 'ORANGES%' ) order by compress(name,,'kd'), name ; quit;
- This part of the code uses PROC SQL to query the dictionary.columns table to get the variable names in the work.have dataset that meet certain conditions.
- The conditions are specified in the where clause, filtering for variables starting with 'APPLES', 'CARROTS', or 'ORANGES'.
- The result is sorted by the numerical part of the variable names (order by compress(name,,'kd'), name).
- The selected variable names are stored in a macro variable named varlist, separated by spaces.
Printing Selected Variables Using Macro Variable:

sas

proc print data=work.have; var &varlist; run;
- This part of the code prints the work.have dataset, selecting the variables specified in the macro variable varlist.
- The &varlist is resolved at runtime, and it includes the list of variables retrieved and sorted in the previous step.
Calculating Minimum Values and Creating a New Dataset (want😞

sas

data want; set have; array _oranges{*} Oranges1 - Oranges20; min_apple=min(of Apples:); min_carrots=min(of Carrots1 - Carrots20); min_oranges=min(of _oranges[*]); run;
- This part of the code creates a new dataset named want by using the set statement to copy the observations from the have dataset.
- Three new variables (min_apple, min_carrots, and min_oranges) are calculated using the min function and the of operator.
- For min_apple, it calculates the minimum value of all variables starting with 'Apples'.
- For min_carrots, it calculates the minimum value of variables from Carrots1 to Carrots20.
- For min_oranges, it calculates the minimum value of the array _oranges that includes Oranges1 through Oranges20.
Printing the New Dataset (want😞

sas

proc print data=want; var min_:; run;
- This part of the code prints the want dataset, selecting the variables that start with 'min_'.

In summary, the code generates a dataset with 60 variables, sorts and selects specific variables based on their names, calculates the minimum values for selected variable groups, and prints the results.

Creating the Initial Dataset (work.have sasCopy code data work.have; array Apples {20} 8; array Carrots {20} 8; array Oranges {20} 8; do i=1 to dim(apples); apples[i]=mod(i*10,dim(apples))+1; carrots[i]=apples[i]*5; oranges[i]=apples[i]*8; end; drop i; run; This part of the code creates a dataset named work.have with 60 variables: Apples1 through Apples20, Carrots1 through Carrots20, and Oranges1 through Oranges20. Three arrays (Apples, Carrots, and Oranges) of size 20 are defined. A loop is used to populate these arrays with values based on a formula (apples[i]=mod(i*10,dim(apples))+1) and similar formulas for carrots and oranges. The drop i; statement removes the loop index variable i. Sorting Variables and Creating a Macro Variable (varlist sasCopy code proc sql noprint; select name into :varlist separated by ' ' from dictionary.columns where libname='WORK' and memname='HAVE' and ( upcase(name) like 'APPLES%' or upcase(name) like 'CARROTS%' or upcase(name) like 'ORANGES%' ) order by compress(name,,'kd'), name ; quit; This part of the code uses PROC SQL to query the dictionary.columns table to get the variable names in the work.have dataset that meet certain conditions. The conditions are specified in the where clause, filtering for variables starting with 'APPLES', 'CARROTS', or 'ORANGES'. The result is sorted by the numerical part of the variable names (order by compress(name,,'kd'), name). The selected variable names are stored in a macro variable named varlist, separated by spaces. Printing Selected Variables Using Macro Variable: sasCopy code proc print data=work.have; var &varlist; run; This part of the code prints the work.have dataset, selecting the variables specified in the macro variable varlist. The &varlist is resolved at runtime, and it includes the list of variables retrieved and sorted in the previous step. Calculating Minimum Values and Creating a New Dataset (want sasCopy code data want; set have; array _oranges{*} Oranges1 - Oranges20; min_apple=min(of Apples:); min_carrots=min(of Carrots1 - Carrots20); min_oranges=min(of _oranges[*]); run; This part of the code creates a new dataset named want by using the set statement to copy the observations from the have dataset. Three new variables (min_apple, min_carrots, and min_oranges) are calculated using the min function and the of operator. For min_apple, it calculates the minimum value of all variables starting with 'Apples'. For min_carrots, it calculates the minimum value of variables from Carrots1 to Carrots20. For min_oranges, it calculates the minimum value of the array _oranges that includes Oranges1 through Oranges20. Printing the New Dataset (want sasCopy code proc print data=want; var min_:; run; This part of the code prints the want dataset, selecting the variables that start with 'min_'. In summary, the code generates a dataset with 60 variables, sorts and selects specific variables based on their names, calculates the minimum values for selected variable groups, and prints the results.

JMagenta · Posted 11-20-2023 08:40 AM

Awesome!

Thank you so much!

But for the last part, where I create new variables, it gets tricky.

Could this work?

data have2;
retain &varlist;
set have;
Array Apples_ {*} apples1-apples20;
Array Carrot_{*} Carrots1-carrots20;
Array oranges_{*} oranges1-oranges20;
do i=1 to dim(apples_);
Earlyfruit=min(of apples_[*]);
Afruit=Carrot_[min(of apples_[*])]; /* I want the carrot that is chosen to match the suffix of Apple without changing the value of Carrot*/
Bfruit=Oranges_[min(of apples_[*])]; /* same here with Oranges*/
end;
run;

Reeza · Posted 11-20-2023 10:54 AM

data have2;
retain &varlist;
set have;
Array Apples_ {*} apples1-apples20;
Array Carrot_{*} Carrots1-carrots20;
Array oranges_{*} oranges1-oranges20;

Earlyfruit=min(of apples_[*]);

index_earlyfruit = whichn(earlyfruit, of apples_(*));
afruit = carrot_(index_earlyfruit);
bfruit = oranges_(index_earlyfruit);
run;

Use WHICHN to find the index of the minimum value. Note that if you have ties, the first one is returned.

@JMagenta wrote:

Awesome!

Thank you so much!

But for the last part, where I create new variables, it gets tricky.

Could this work?

data have2;
retain &varlist;
set have;
Array Apples_ {*} apples1-apples20;
Array Carrot_{*} Carrots1-carrots20;
Array oranges_{*} oranges1-oranges20;
do i=1 to dim(apples_);
Earlyfruit=min(of apples_[*]);
Afruit=Carrot_[min(of apples_[*])]; /* I want the carrot that is chosen to match the suffix of Apple without changing the value of Carrot*/
Bfruit=Oranges_[min(of apples_[*])]; /* same here with Oranges*/
end;
run;

JMagenta · Posted 11-20-2023 11:12 AM

To Everyone on this project!!

THANK YOU!!!

I am learning day by day, and you are each helping me do that!

JMagenta

LinusH · Posted 11-20-2023 09:37 AM

This table structure may serve a purpose for reporting or specific analysis use cases, but it's not very practical for data storage.

It would be better to store your data in a more flexible (normalised) manner, and then create the desired structure when needed.

Also, I'm missing some kind id variable in your example...

I'm suggesting something like

id date/period fruit fruit_no fruit_value

1 jan2023 orange 1 45

Then you can use formats or other means to group you values, and other kind of aggregation. and then PROC TRANSPOSE to create your target layout.

Data never sleeps

Quentin · Posted 11-20-2023 10:39 AM

I agree with the suggestion to transpose this data into a more vertical / normalized format to make it easier to work with.

@JMagenta , could you post some example data? Ideally, a DATA step that uses the CARDS statement with just a few rows and a few variables, maybe ID Apple1-Apple3 Carrot1-Carrot3 Orange1-Orange3? Also please show the output data you would want for this example data.

The Boston Area SAS Users Group is hosting free webinars!
Next webinar will be in March 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.

JMagenta · Posted 11-20-2023 10:54 AM



data work.have;
  array Apples  {20} 8;
  array Carrots {20} 8;
  array Oranges {20} 8;
  do i=1 to dim(apples);
    apples[i]=i+1; /* I did "i+5" and so forth so that the numbers were unique values*/
    carrots[i]=i+5;
    oranges[i]=i+8;
  end;
  drop i;
run;


proc sql noprint;
  select name into :varlist separated by ' '
  from dictionary.columns
  where 
    libname='WORK' 
    and memname='HAVE'
    and (
             upcase(name) like 'APPLES%'
          or upcase(name) like 'CARROTS%'
          or upcase(name) like 'ORANGES%'
        )
  order by compress(name,,'kd'), name
  ;
quit;
data have2;
  retain &varlist;
	set have;
Array Apples_ {*} apples1-apples20;
Array Carrot_{*} Carrots1-carrots20;
Array oranges_{*} oranges1-oranges20;
 do i=1 to dim(apples_);
 	Earlyfruit=min(of apples_[*]);
	Afruit=Carrot_[min(of apples_[*])];
	Bfruit=Oranges_{min(of apples_[*])};

	end;

run;

The result value for Easyfruit is correct and equals Apples1 which is 2.

The result for Afruit should equal Carrots1 which is 6 instead it equals 7 (likely the value of Apples6).

The result for Bfruit should equal Oranges 1 which is 9 instead it equals 10 (likely the value of Apples9).

Kurt_Bremser · Posted 11-20-2023 10:26 AM

Whenever I see such structures, I see that there is data (fruit/vegetable types and sequence numbers) hidden in structure (variable names). Data belongs in variables, not in their names, so you should transpose your dataset and extract the relevant data from the _NAME_ variable you get from PROC TRANSPOSE.

Then you can do it in SQL, which has the concept of sets (of observations), but not of arrays.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

JMagenta · Posted 11-20-2023 10:36 AM

Hello,

I get it, but to be clear,

This was the only way I could find to group the data.

Each variable group:

Apples1 Oranges 1 Carrots1 belongs to basket1.

Apples2 Oranges2 Carrots2 belongs to basket2 and so on.

So if Apples2 is the variable with the minimum value for that line for Basket1;

Then basically what is Carrots2 and Oranges2:

FruitA=Oranges2;

FruitB=Carrots2;

This is what I want.

Sorry if it's not too clear.

Kurt_Bremser · Posted 11-20-2023 10:55 AM

Rid yourself of your Excel-thinking. In Excel, data is always stored in report-form, which is unsuited for computation (unless you have a spreadsheet).

In real computing, data is stored in relational, sequential tables, with one variable for one fact.

So let's transpose you dumb data to intelligent data:

data dumb;
input id $ apples1 apples2 apples3 carrots1 carrots2 carrots3 oranges1 oranges2 oranges3;
datalines;
1 219 285 185 463 192 316 219 352 196
;

proc transpose data=dumb out=intelligent1 (rename=(col1=value));
by id;
var apples: carrots: oranges:;
run;

data intelligent2;
set intelligent1;
i = anydigit(_name_);
fruit = substr(_name_,1,i-1);
seq = input(substr(_name_,i),best.);
drop i _name_;
run;

Then you can use SQL:

proc sql;
create table want as
select
  a.id,
  a.seq,
  a.fruit,
  a.value
from intelligent2 a
where a.seq in (
  select seq 
  from intelligent2 b 
  where b.id = a.id and b.fruit = "apples"
  group by b.id
  having b.value = max(b.value)
)
;
quit;

And create a wide report without hassle:

proc report data=want;
column id seq value,fruit;
define id / group;
define seq / group;
define value / "" analysis;
define fruit / "" across;
run;

Result:

id	seq	apples	carrots	oranges
1	2	285	192	352

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Kurt_Bremser · Posted 11-20-2023 10:58 AM

PS the long structure also allows easy analysis over different groupings, as you can use the values in fruit, or the sequence, as the main index. With the wide structure, this is hard to impossible.

See Maxim 19.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Register Today!

SAS Training: Just a Click Away