Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Home
- /
- Programming
- /
- Programming
- /
- loop for changing variables in changing datasets

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 06-14-2023 05:37 AM
(496 views)

Hi,

I have a problem and Im thankful for help.

Given is the data set "final". Today, for example, it contains 3 variables, but it could also have 5 or 10 variables the next time it is created. The names of the variables are always freely selectable and have no logic. But they all have one thing in common: The first variable is always "Eink" with the same characteristics and the goal is calculating a median (based on the groups in "Eink").

For the current dataset I have written a syntax for calculating the median. I attached the current syntax below. Based on my current dataset, **Bg** and **He** were used there. My problem is: If the record "final" appears tomorrow with other variables, the syntax should adapt individually to it. If tomorrow "Bg" and "He" no longer exist, but for example XX, XY and XZ, then the syntax should automatically be applied to these three variables with their specific names. So I probably need a loop, but am unable to write it. Could anyone help me with this big concern? Would be great!

**dataset "final" given**

Eink | Bg | He |

01 | 55.487739 | 2.722371 |

02 | 127.485645 | 18.670678 |

03 | 182.95309 | 53.529988 |

04 | 354.69612 | 122.703093 |

05 | 450.777081 | 137.596942 |

06 | 461.617933 | 157.457164 |

07 | 429.38197 | 192.283645 |

08 | 370.644235 | 173.688658 |

09 | 281.256479 | 180.848677 |

10 | 170.426973 | 137.045263 |

11 | 118.22022 | 128.45571 |

12 | 93.656461 | 104.596442 |

13 | 68.561622 | 112.710123 |

14 | 42.443895 | 77.742469 |

15 | 50.933162 | 136.718895 |

16 | 35.285987 | 104.219131 |

17 | 23.123863 | 78.424041 |

18 | 16.724659 | 89.48717 |

19 | 9.271308 | 46.396113 |

20 | 4.870804 | 21.574738 |

21 | 4.83902 | 19.753305 |

22 | 3.487616 | 9.23105 |

23 | 0.860551 | 2.136266 |

24 | 1.202043 | 1.861731 |

**code**

```
/*1*/
Data Eink_2;
set final;
if Eink = "01" then do; Xmin = 0 ; Xmax = 250 ; end;
if Eink = "02" then do; Xmin = 250 ; Xmax = 500 ; end;
if Eink = "03" then do; Xmin = 500 ; Xmax = 750 ; end;
if Eink = "04" then do; Xmin = 750 ; Xmax = 1000 ; end;
if Eink = "05" then do; Xmin = 1000 ; Xmax = 1250 ; end;
if Eink = "06" then do; Xmin = 1250 ; Xmax = 1500 ; end;
if Eink = "07" then do; Xmin = 1500 ; Xmax = 1750 ; end;
if Eink = "08" then do; Xmin = 1750 ; Xmax = 2000 ; end;
if Eink = "09" then do; Xmin = 2000 ; Xmax = 2250 ; end;
if Eink = "10" then do; Xmin = 2250 ; Xmax = 2500 ; end;
if Eink = "11" then do; Xmin = 2500 ; Xmax = 2750 ; end;
if Eink = "12" then do; Xmin = 2750 ; Xmax = 3000 ; end;
if Eink = "13" then do; Xmin = 3000 ; Xmax = 3250 ; end;
if Eink = "14" then do; Xmin = 3250 ; Xmax = 3500 ; end;
if Eink = "15" then do; Xmin = 3500 ; Xmax = 4000 ; end;
if Eink = "16" then do; Xmin = 4000 ; Xmax = 4500 ; end;
if Eink = "17" then do; Xmin = 4500 ; Xmax = 5000 ; end;
if Eink = "18" then do; Xmin = 5000 ; Xmax = 6000 ; end;
if Eink = "19" then do; Xmin = 6000 ; Xmax = 7000 ; end;
if Eink = "20" then do; Xmin = 7000 ; Xmax = 8000 ; end;
if Eink = "21" then do; Xmin = 8000 ; Xmax = 10000 ; end;
if Eink = "22" then do; Xmin = 10000 ; Xmax = 15000 ; end;
if Eink = "23" then do; Xmin = 15000 ; Xmax = 25000 ; end;
if Eink = "24" then do; Xmin = 25000 ; Xmax = . ; end;
run;
/*2*/
Data Eink_3;
set Eink_2;
cum_Bg+Bg;
cum_He+He;
run;
/*3*/
proc sql noprint;
select sum(Bg)/2 into: n2_B from Eink_3;
select sum(He)/2 into: n2_H from Eink_3;
run;
/*4*/
data Eink_4;
set Eink_3;
Fx_prä_B = &n2_B. - cum_Bg;
Fx_prä_H = &n2_H. - cum_He;
run;
/*5*/
proc rank data=Eink_4 out=fx_1_Rang_B; var Fx_prä_B; ranks rang ; where Fx_prä_B >0;
proc rank data=Eink_4 out=fx_1_Rang_H; var Fx_prä_H; ranks rang ; where Fx_prä_H >0;
/*6*/
proc sql noprint; select put(cum_Bg, decimal.7) into:Fx_1_B from fx_1_Rang_B where rang =1; quit;
proc sql noprint; select put(cum_He, decimal.7) into:Fx_1_H from fx_1_Rang_H where rang =1; quit;
/*7*/
proc rank data=Eink_4 descending out=fx_B; var Fx_prä_B; ranks Rang; where Fx_prä_B <0;
proc rank data=Eink_4 descending out=fx_H; var Fx_prä_H; ranks Rang; where Fx_prä_H <0;
/*8*/
proc sql noprint;
select put(Bg, decimal.7) into:fx_B from fx_B where rang=1;
select put(He, decimal.7) into:fx_H from fx_H where rang=1;
run;
/*9*/
proc sql noprint;
select Xmin into: Xmin_B from fx_B where Rang = 1;
select Xmin into: Xmin_H from fx_H where Rang = 1;
quit;
/*10*/
proc sql noprint;
select Xmax-Xmin into: b_B from fx_B where Rang = 1;
select Xmax-Xmin into: b_H from fx_H where Rang = 1;
quit;
/*11 - into the formula */
Data Median_Final;
retain Eink Bg He;
Format
Bg 15.12
He 15.12;
Eink = "Median";
Bg = &Xmin_B. + &b_B.*((&n2_B.-&Fx_1_B.)/&fx_B.);
He = &Xmin_H. + &b_H.*((&n2_H.-&Fx_1_H.)/&fx_H.);
run;
```

17 REPLIES 17

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Hmm, wouldnt be the best idea, because I have a total of 11 steps, which each create their own temporary datasets. As an example, I'll supplement my code snippet from above with the subsequent step of my syntax. So they always refer to each other.

I would like to apply these 11 steps as a loop. The code snippet was just one of these 11 on which the macro could be exemplified. I'll try to include the rest then.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Yes, please show us, because the suggestion from @Kurt_Bremser ought to work on the 11 different steps.

--

Paige Miller

Paige Miller

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

It looks like you are computing some sort of a median, using sums, max, min for each "half" of the data, and then plugging these into a formula. I would think there are better/faster ways to do this that don't involve macros or macro variables, but I don't fully understand the process or the end result. Could you please explain what you are trying to do?

--

Paige Miller

Paige Miller

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

The goal is to calculate a weigthed median, based on a formular from the institution Im working for. With the 11 steps I calculate each component of the formular:

- Xmin_B and Xmin_H

- b_B and b_H

- n2_B and n2_H

- Fx_1_B and Fx_1_H

- fx_B and fx_H

At the end, I put these components into the formula and calculate the median (step 11). That works pretty well. But with new variables I always have to manually adjust the syntax. And my hope is, that there is a solution like a loop...

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

With such a complex process which can't use the available SAS procedures, you have to feed a list of variables to a macro and use %DO loops to create the repeating code wherever needed.

You can create multiple macro variables in a single SELECT INTO, so use that to speed up your process.

Similarly, avoid to run multiple steps for each variable; in most cases, you can process multiple variables in a single step.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

@Konkordanz wrote:

Hmm, wouldnt be the best idea, because I have a total of 11 steps, which each create their own temporary datasets. As an example, I'll supplement my code snippet from above with the subsequent step of my syntax. So they always refer to each other.

I would like to apply these 11 steps as a loop. The code snippet was just one of these 11 on which the macro could be exemplified. I'll try to include the rest then.

If you have a data set with a grouping variable and want to create summary statistics for numeric variables then proc means/summary is likely where you belong. Splitting data into separate data sets just to get a median of a value is just "wrong" in a SAS approach.

Here is a small example with a data set you should have available to examine.

Proc summary data=sashelp.class nway; class sex; var _numeric_; output out=want (drop=_type_ _freq_) median = ; run;

The option NWAY is to just show the result for the highest _type_ value of the result. _Type_ indicates which combination(s) of Class variables are involved in an output observation. The default output would also include an overall summary for all observations and if more variables are involved on the class statement then there multiple combinations possible.

Class indicates your grouping variable.

Var is list of numeric variables to use. In this case _numeric_ uses all of the numeric variables in the data set (there are only 3).

The output statement names the output dataset and requests which statistic(s) to create. The drop removes two automatic variables that would appear by default. The Median= requests the median statistic. With a single statistic that means the output statistic has the same name as the input variable. You can either name the resulting statistic your self or use /autoname option to create names with the statistic appended.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Thx for answering. I wouldnt say "wrong", because I dont calculate an default median. I have to calculate a median based on a formular of the institution Im working for. I have fleshed out my initial post above.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Looks to me like your first 4 steps are just these two steps. Plus I generated SIGN variables that can probably be used to reduce the PROC RANK steps to one instead of two.

```
data final ;
input Eink :$2. Bg He ;
cards;
01 55.487739 2.722371
02 127.485645 18.670678
03 182.95309 53.529988
04 354.69612 122.703093
05 450.777081 137.596942
06 461.617933 157.457164
07 429.38197 192.283645
08 370.644235 173.688658
09 281.256479 180.848677
10 170.426973 137.045263
11 118.22022 128.45571
12 93.656461 104.596442
13 68.561622 112.710123
14 42.443895 77.742469
15 50.933162 136.718895
16 35.285987 104.219131
17 23.123863 78.424041
18 16.724659 89.48717
19 9.271308 46.396113
20 4.870804 21.574738
21 4.83902 19.753305
22 3.487616 9.23105
23 0.860551 2.136266
24 1.202043 1.861731
;
data min_max ;
length Eink $2 Xmin Xmax 8;
do xmax = 250 to 3500 by 250,4000,4500,5000,6000,7000,8000,10000,15000,25000,. ;
i+1;
eink=put(i,z2.);
xmin=sum(0,lag(xmax));
output;
end;
drop i;
run;
proc summary data=FINAL ;
var Bg He ;
output out=sums(drop=_freq_ _type_) sum=sum_Bg sum_He ;
run;
data STEP4;
merge final min_max ;
by eink;
if _n_=1 then set sums;
cum_Bg + Bg;
Fx_Bg = sum_bg/2 - cum_Bg;
Sign_Bg = sign(fx_bg);
cum_He + He;
Fx_He = sum_he/2 - cum_He;
Sign_He = sign(fx_he);
run;
```

So if Bg and He represent the variables you need to process then you might use macro code to generate those blocks that use them.

```
...
sum=sum_Bg sum_He ;
...
cum_Bg + Bg;
Fx_Bg = sum_bg/2 - cum_Bg;
Sign_Bg = sign(fx_bg);
cum_He + He;
Fx_He = sum_he/2 - cum_He;
Sign_He = sign(fx_he);
...
```

But you could also transpose the data first. Then no need to generate code (other than the list of variables in the VAR statement of the TRANSPOSE step).

```
proc transpose data=FINAL out=TALL(rename=(col1=VALUE)) name=VARNAME ;
by Eink;
var Bg He ;
run;
proc sort data=TALL;
by varname eink;
run;
proc summary data=TALL nway;
by varname ;
var value;
output out=sums(drop=_freq_ _type_) sum=sum_value ;
run;
data step4_tall;
merge tall sums ;
by varname ;
if _n_=1 then do;
set min_max;
declare hash h(dataset:'min_max');
h.definekey('eink');
h.definedata('Xmin','Xmax');
h.definedone();
end;
if first.varnum then cum_value=.;
h.find();
cum_value + value ;
fx_value = sum_value/2 - cum_value ;
Sign_value = sign(fx_value);
run;
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

What is STEP5 to STEP11 doing?

Why is it using RANK instead of just taking the MIN and MAX?

```
proc summary data=step4_tall nway ;
by varname sum_value ;
class sign_value ;
var cum_value fx_value ;
output out=stats min= max= / autoname ;
run;
proc print;
run;
```

cum_ cum_ sum_ Sign_ value_ fx_value_ value_ fx_value_ Obs VARNAME value value _TYPE_ _FREQ_ Min Min Max Max 1 Bg 3358.21 -1 1 18 2062.40 -1679.10 3358.21 -383.30 2 Bg 3358.21 1 1 6 55.49 46.09 1633.02 1623.62 3 He 2109.85 -1 1 15 1176.55 -1054.93 2109.85 -121.62 4 He 2109.85 1 1 9 2.72 15.43 1039.50 1052.20

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

For your example data what values did the macro variables used in that last step end up having?

```
Bg = &Xmin_B. + &b_B.*((&n2_B.-&Fx_1_B.)/&fx_B.);
He = &Xmin_H. + &b_H.*((&n2_H.-&Fx_1_H.)/&fx_H.);
```

If you don't know re-run it with the SYMBOLGEN option turned on and the SAS log will show you.

Are they basically just the values from these rows where the value on the FX variable flips from positive to negative?

sum_ cum_ Sign_ Obs Eink VARNAME VALUE value Xmin Xmax value fx_value value 6 06 Bg 461.618 3358.21 1250 1500 1633.02 46.087 1 7 07 Bg 429.382 3358.21 1500 1750 2062.40 -383.295 -1 33 09 He 180.849 2109.85 2000 2250 1039.50 15.426 1 34 10 He 137.045 2109.85 2250 2500 1176.55 -121.620 -1

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

At this point I think I am going call "Foul" on use of the word "median" at all to describe this process.

When I use that example data and run the code the result I get is

Obs Eink Bg He 1 Median 1526.8329804346 2278.1399438082

Every definition I have run into for median relates to "being in the middle" of values with some slop as to exactly how that middle is defined. Or perhaps "Formally, a median of a population is any value such that at least half of the population is less than or equal to the proposed median and at least half is greater than or equal to the proposed median."

Since the largest value in your Final example data for Bg is 461.6179330 I see no way that a value of 1526 or greater should be considered any sort of "median" for Bg. Similar with your He variable as the max in the set is 192.283645.

This may be a statistic of some sort but a median it ain't.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

So that last formula appears to interpolating between XMIN and XMAX based on the something that looks like perhaps the range at the point where the cum sum crosses the 50% point of the total sum.

```
Bg = &Xmin_B. + &b_B.*((&n2_B.-&Fx_1_B.)/&fx_B.);
He = &Xmin_H. + &b_H.*((&n2_H.-&Fx_1_H.)/&fx_H.);
```

Since you appear to have gotten it to run **what are the values of those 5 macro variables for each analysis** variable that ended up being used to make your result?

Since they appear from the convoluted steps 5 to 10 be just pulling values from the dataset we should be able to look at the numbers and figure out which values the formula is using (much easier than figuring out what that code is trying to pull from looking at it).

Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.

**If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website. **

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.