Hello,
I have a problem. I have 60 variables:
Apples1--Apples20
Carrots1--Carrots20
Oranges1--Oranges20
What I want to do is group variables with the same suffix together in the SAS dataset:
Apples1 Carrots1 Oranges1 ... Apples20 Carrots20 Oranges20
How can I do this?
Next, depending on the value of Apples1--Apples20 ( using formula Min(Apples1--Apples20) to find the minimum value, I would like to create new variables with their values.
For instance:
Apples2 is less than all in group Apples1--Apples20, so :
Earlyfruit=Apples2;
Afruit=Carrot2;
Bfruit=Orange2;
Is this possible? Can I use "proc sql" to accomplish this somehow?
Any advice would be helpful.
Thank you,
J Magenta
data have2;
retain &varlist;
set have;
Array Apples_ {*} apples1-apples20;
Array Carrot_{*} Carrots1-carrots20;
Array oranges_{*} oranges1-oranges20;
Earlyfruit=min(of apples_[*]);
index_earlyfruit = whichn(earlyfruit, of apples_(*));
afruit = carrot_(index_earlyfruit);
bfruit = oranges_(index_earlyfruit);
run;
Use WHICHN to find the index of the minimum value. Note that if you have ties, the first one is returned.
@JMagenta wrote:
Awesome!
Thank you so much!
But for the last part, where I create new variables, it gets tricky.
Could this work?
data have2;
retain &varlist;
set have;
Array Apples_ {*} apples1-apples20;
Array Carrot_{*} Carrots1-carrots20;
Array oranges_{*} oranges1-oranges20;
do i=1 to dim(apples_);
Earlyfruit=min(of apples_[*]);
Afruit=Carrot_[min(of apples_[*])]; /* I want the carrot that is chosen to match the suffix of Apple without changing the value of Carrot*/
Bfruit=Oranges_[min(of apples_[*])]; /* same here with Oranges*/
end;
run;
Here you go.
I assume some of the SAS syntax used will be new to you. I suggest you first consult the SAS docu and then ask questions for the bits that remain unclear.
/***
I have 60 variables:
Apples1--Apples20
Carrots1--Carrots20
Oranges1--Oranges20
**/
data work.have;
array Apples {20} 8;
array Carrots {20} 8;
array Oranges {20} 8;
do i=1 to dim(apples);
apples[i]=mod(i*10,dim(apples))+1;
carrots[i]=apples[i]*5;
oranges[i]=apples[i]*8;
end;
drop i;
run;
/* 1. What I want to do is group variables with the same suffix together in the SAS dataset */
/* Comment: The order of variables in a table is of no real relevance. Such ordering requests are normally reporting */
/* requirements to be addressed as part of report generation */
/* sort selected variables by digits in the variable name and populate a macro variable with the result */
proc sql noprint;
select name into :varlist separated by ' '
from dictionary.columns
where
libname='WORK'
and memname='HAVE'
and (
upcase(name) like 'APPLES%'
or upcase(name) like 'CARROTS%'
or upcase(name) like 'ORANGES%'
)
order by compress(name,,'kd'), name
;
quit;
/* use macro variable to print data */
proc print data=work.have;
var &varlist;
run;
/* 2. Next, depending on the value of Apples1--Apples20 ( using formula Min(Apples1--Apples20) */
/* to find the minimum value, I would like to create new variables with their values. */
/* below 3 coding options for doing this. */
/* - option one - of Apples: - will include all variables where the name starts with Apple - would also include AppleComputer */
data want;
set have;
array _oranges{*} Oranges1 - Oranges20;
min_apple=min(of Apples:);
min_carrots=min(of Carrots1 - Carrots20);
min_oranges=min(of _oranges[*]);
run;
proc print data=want;
var min_:;
run;
Here the chatGPT explanation of above code which isn't too bad
Creating the Initial Dataset (work.have
😞
data work.have;
array Apples {20} 8;
array Carrots {20} 8;
array Oranges {20} 8;
do i=1 to dim(apples);
apples[i]=mod(i*10,dim(apples))+1;
carrots[i]=apples[i]*5;
oranges[i]=apples[i]*8;
end;
drop i;
run;
work.have
with 60 variables: Apples1
through Apples20
, Carrots1
through Carrots20
, and Oranges1
through Oranges20
.Apples
, Carrots
, and Oranges
) of size 20 are defined.apples[i]=mod(i*10,dim(apples))+1
) and similar formulas for carrots
and oranges
.drop i;
statement removes the loop index variable i
.Sorting Variables and Creating a Macro Variable (varlist
😞
proc sql noprint;
select name into :varlist separated by ' '
from dictionary.columns
where
libname='WORK'
and memname='HAVE'
and (
upcase(name) like 'APPLES%'
or upcase(name) like 'CARROTS%'
or upcase(name) like 'ORANGES%'
)
order by compress(name,,'kd'), name
;
quit;
dictionary.columns
table to get the variable names in the work.have
dataset that meet certain conditions.where
clause, filtering for variables starting with 'APPLES', 'CARROTS', or 'ORANGES'.order by compress(name,,'kd'), name
).varlist
, separated by spaces.Printing Selected Variables Using Macro Variable:
proc print data=work.have;
var &varlist;
run;
work.have
dataset, selecting the variables specified in the macro variable varlist
.&varlist
is resolved at runtime, and it includes the list of variables retrieved and sorted in the previous step.Calculating Minimum Values and Creating a New Dataset (want
😞
data want;
set have;
array _oranges{*} Oranges1 - Oranges20;
min_apple=min(of Apples:);
min_carrots=min(of Carrots1 - Carrots20);
min_oranges=min(of _oranges[*]);
run;
want
by using the set
statement to copy the observations from the have
dataset.min_apple
, min_carrots
, and min_oranges
) are calculated using the min
function and the of
operator.min_apple
, it calculates the minimum value of all variables starting with 'Apples'.min_carrots
, it calculates the minimum value of variables from Carrots1
to Carrots20
.min_oranges
, it calculates the minimum value of the array _oranges
that includes Oranges1
through Oranges20
.Printing the New Dataset (want
😞
proc print data=want;
var min_:;
run;
want
dataset, selecting the variables that start with 'min_'.In summary, the code generates a dataset with 60 variables, sorts and selects specific variables based on their names, calculates the minimum values for selected variable groups, and prints the results.
Awesome!
Thank you so much!
But for the last part, where I create new variables, it gets tricky.
Could this work?
data have2;
retain &varlist;
set have;
Array Apples_ {*} apples1-apples20;
Array Carrot_{*} Carrots1-carrots20;
Array oranges_{*} oranges1-oranges20;
do i=1 to dim(apples_);
Earlyfruit=min(of apples_[*]);
Afruit=Carrot_[min(of apples_[*])]; /* I want the carrot that is chosen to match the suffix of Apple without changing the value of Carrot*/
Bfruit=Oranges_[min(of apples_[*])]; /* same here with Oranges*/
end;
run;
data have2;
retain &varlist;
set have;
Array Apples_ {*} apples1-apples20;
Array Carrot_{*} Carrots1-carrots20;
Array oranges_{*} oranges1-oranges20;
Earlyfruit=min(of apples_[*]);
index_earlyfruit = whichn(earlyfruit, of apples_(*));
afruit = carrot_(index_earlyfruit);
bfruit = oranges_(index_earlyfruit);
run;
Use WHICHN to find the index of the minimum value. Note that if you have ties, the first one is returned.
@JMagenta wrote:
Awesome!
Thank you so much!
But for the last part, where I create new variables, it gets tricky.
Could this work?
data have2;
retain &varlist;
set have;
Array Apples_ {*} apples1-apples20;
Array Carrot_{*} Carrots1-carrots20;
Array oranges_{*} oranges1-oranges20;
do i=1 to dim(apples_);
Earlyfruit=min(of apples_[*]);
Afruit=Carrot_[min(of apples_[*])]; /* I want the carrot that is chosen to match the suffix of Apple without changing the value of Carrot*/
Bfruit=Oranges_[min(of apples_[*])]; /* same here with Oranges*/
end;
run;
To Everyone on this project!!
THANK YOU!!!
I am learning day by day, and you are each helping me do that!
JMagenta
This table structure may serve a purpose for reporting or specific analysis use cases, but it's not very practical for data storage.
It would be better to store your data in a more flexible (normalised) manner, and then create the desired structure when needed.
Also, I'm missing some kind id variable in your example...
I'm suggesting something like
id date/period fruit fruit_no fruit_value
1 jan2023 orange 1 45
Then you can use formats or other means to group you values, and other kind of aggregation. and then PROC TRANSPOSE to create your target layout.
I agree with the suggestion to transpose this data into a more vertical / normalized format to make it easier to work with.
@JMagenta , could you post some example data? Ideally, a DATA step that uses the CARDS statement with just a few rows and a few variables, maybe ID Apple1-Apple3 Carrot1-Carrot3 Orange1-Orange3? Also please show the output data you would want for this example data.
data work.have;
array Apples {20} 8;
array Carrots {20} 8;
array Oranges {20} 8;
do i=1 to dim(apples);
apples[i]=i+1; /* I did "i+5" and so forth so that the numbers were unique values*/
carrots[i]=i+5;
oranges[i]=i+8;
end;
drop i;
run;
proc sql noprint;
select name into :varlist separated by ' '
from dictionary.columns
where
libname='WORK'
and memname='HAVE'
and (
upcase(name) like 'APPLES%'
or upcase(name) like 'CARROTS%'
or upcase(name) like 'ORANGES%'
)
order by compress(name,,'kd'), name
;
quit;
data have2;
retain &varlist;
set have;
Array Apples_ {*} apples1-apples20;
Array Carrot_{*} Carrots1-carrots20;
Array oranges_{*} oranges1-oranges20;
do i=1 to dim(apples_);
Earlyfruit=min(of apples_[*]);
Afruit=Carrot_[min(of apples_[*])];
Bfruit=Oranges_{min(of apples_[*])};
end;
run;
The result value for Easyfruit is correct and equals Apples1 which is 2.
The result for Afruit should equal Carrots1 which is 6 instead it equals 7 (likely the value of Apples6).
The result for Bfruit should equal Oranges 1 which is 9 instead it equals 10 (likely the value of Apples9).
Whenever I see such structures, I see that there is data (fruit/vegetable types and sequence numbers) hidden in structure (variable names). Data belongs in variables, not in their names, so you should transpose your dataset and extract the relevant data from the _NAME_ variable you get from PROC TRANSPOSE.
Then you can do it in SQL, which has the concept of sets (of observations), but not of arrays.
Hello,
I get it, but to be clear,
This was the only way I could find to group the data.
Each variable group:
Apples1 Oranges 1 Carrots1 belongs to basket1.
Apples2 Oranges2 Carrots2 belongs to basket2 and so on.
So if Apples2 is the variable with the minimum value for that line for Basket1;
Then basically what is Carrots2 and Oranges2:
FruitA=Oranges2;
FruitB=Carrots2;
This is what I want.
Sorry if it's not too clear.
Rid yourself of your Excel-thinking. In Excel, data is always stored in report-form, which is unsuited for computation (unless you have a spreadsheet).
In real computing, data is stored in relational, sequential tables, with one variable for one fact.
So let's transpose you dumb data to intelligent data:
data dumb;
input id $ apples1 apples2 apples3 carrots1 carrots2 carrots3 oranges1 oranges2 oranges3;
datalines;
1 219 285 185 463 192 316 219 352 196
;
proc transpose data=dumb out=intelligent1 (rename=(col1=value));
by id;
var apples: carrots: oranges:;
run;
data intelligent2;
set intelligent1;
i = anydigit(_name_);
fruit = substr(_name_,1,i-1);
seq = input(substr(_name_,i),best.);
drop i _name_;
run;
Then you can use SQL:
proc sql;
create table want as
select
a.id,
a.seq,
a.fruit,
a.value
from intelligent2 a
where a.seq in (
select seq
from intelligent2 b
where b.id = a.id and b.fruit = "apples"
group by b.id
having b.value = max(b.value)
)
;
quit;
And create a wide report without hassle:
proc report data=want;
column id seq value,fruit;
define id / group;
define seq / group;
define value / "" analysis;
define fruit / "" across;
run;
Result:
id seq apples carrots oranges 1 2 285 192 352
PS the long structure also allows easy analysis over different groupings, as you can use the values in fruit, or the sequence, as the main index. With the wide structure, this is hard to impossible.
See Maxim 19.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.