Would someone help explain why the one method of using an array works as I would expect but the other method doesn't work as I would expect? The only change I make to the code is to the array member declaration (colon placement). The vol_: variables are all numeric and the other three vars are character. Notice how "method two" affects the vol_I_0000 variable but doesn't appear to effect the other volume variables.
Thanks for any help or insight!
XXXXXXXX
INPUT DATA
vol_I_
Obs store_id quarter year vol_g_zz vol_g_fp vol_g_bd vol_g_pl 0000
936 001500 Q109 2009 3 446 75 0 524
937 001500 Q209 2009 1 422 48 0 471
938 001500 Q309 2009 0 360 67 0 427
939 001500 Q409 2009 0 385 55 0 440
940 001500 Q110 2010 0 348 84 0 432
ARRAY METHOD 1
DATA share_one;
SET &cleaned_data(KEEP=store_id year quarter vol_g: vol_i: );
ARRAY share vol_g: vol_i: ;
DO OVER share;
IF vol_i_0000 NE 0 THEN DO;
share = (share / vol_i_0000);
END;
ELSE IF vol_i_0000 = 0 THEN DO;
share = 0;
END;
END;
RUN;
PRINT OF SHARE_ONE
vol_I_
Obs store_id quarter year vol_g_zz vol_g_fp vol_g_bd vol_g_pl 0000
1 001500 Q109 2009 0.00573 0.85115 0.14313 0 1
2 001500 Q209 2009 0.00212 0.89597 0.10191 0 1
3 001500 Q309 2009 0.00000 0.84309 0.15691 0 1
4 001500 Q409 2009 0.00000 0.87500 0.12500 0 1
5 001500 Q110 2010 0.00000 0.80556 0.19444 0 1
ARRAY METHOD 2
DATA share_two;
SET &cleaned_data(KEEP=store_id year quarter vol_g: vol_i: );
ARRAY share vol_: ;
DO OVER share;
IF vol_i_0000 NE 0 THEN DO;
share = (share / vol_i_0000);
END;
ELSE IF vol_i_0000 = 0 THEN DO;
share = 0;
END;
END;
RUN;
PRINT OF SHARE_TWO
vol_I_
Obs store_id quarter year vol_g_zz vol_g_fp vol_g_bd vol_g_pl 0000
1 001500 Q109 2009 3 446 75 0 1
2 001500 Q209 2009 1 422 48 0 1
3 001500 Q309 2009 0 360 67 0 1
4 001500 Q409 2009 0 385 55 0 1
5 001500 Q110 2010 0 348 84 0 1
The population of the list of variable data from vol_: is based on the variable number. In your example it appears as:
# Variable Type Len
2 quarter Char 8
1 store_id Char 8
8 vol_I_0000 Num 8
6 vol_g_bd Num 8
5 vol_g_fp Num 8
7 vol_g_pl Num 8
4 vol_g_zz Num 8
3 year Char 8
So calling for vol_: returns vol_g_zz vol_g_fp vol_g_bd vol_g_pl vol_i_0000
I assume your actual data would present with # for vol_i_0000 being before vol_g:
Art,
You can simulate the OP's issue if I am correct using your example data and:
data share_three;
set cleaned_data(keep=store_id year quarter vol_i: vol_g: );
array share vol_i: vol_g:;
do over share;
share=ifn(vol_i_0000 ne 0,share/vol_i_0000,0);
end;
run;
or if you modify your version of cleaned_data to:
data cleaned_data;
input store_id $ quarter $ year $
vol_i_0000 vol_g_zz vol_g_fp vol_g_bd vol_g_pl;
cards;
001500 Q109 2009 524 3 446 75 0
001500 Q209 2009 471 1 422 48 0
001500 Q309 2009 427 0 360 67 0
001500 Q409 2009 440 0 385 55 0
001500 Q110 2010 432 0 348 84 0
;
run;
# Variable Type Len
2 quarter Char 8
1 store_id Char 8
7 vol_g_bd Num 8
6 vol_g_fp Num 8
8 vol_g_pl Num 8
5 vol_g_zz Num 8
4 vol_i_0000 Num 8
3 year Char 8
now you will see share_two has the OP's issue.
I must be missing something. I ran the following code and got identical results for both files:
data cleaned_data;
input store_id $ quarter $ year $
vol_g_zz vol_g_fp vol_g_bd vol_g_pl vol_I_0000;
cards;
001500 Q109 2009 3 446 75 0 524
001500 Q209 2009 1 422 48 0 471
001500 Q309 2009 0 360 67 0 427
001500 Q409 2009 0 385 55 0 440
001500 Q110 2010 0 348 84 0 432
;
DATA share_one;
SET cleaned_data(KEEP=store_id year quarter vol_g: vol_i: );
ARRAY share vol_g: vol_i: ;
DO OVER share;
IF vol_i_0000 NE 0 THEN DO;
share = (share / vol_i_0000);
END;
ELSE IF vol_i_0000 = 0 THEN DO;
share = 0;
END;
END;
RUN;
DATA share_two;
SET cleaned_data(KEEP=store_id year quarter vol_g: vol_i: );
ARRAY share vol_: ;
DO OVER share;
IF vol_i_0000 NE 0 THEN DO;
share = (share / vol_i_0000);
END;
ELSE IF vol_i_0000 = 0 THEN DO;
share = 0;
END;
END;
RUN;
The population of the list of variable data from vol_: is based on the variable number. In your example it appears as:
# Variable Type Len
2 quarter Char 8
1 store_id Char 8
8 vol_I_0000 Num 8
6 vol_g_bd Num 8
5 vol_g_fp Num 8
7 vol_g_pl Num 8
4 vol_g_zz Num 8
3 year Char 8
So calling for vol_: returns vol_g_zz vol_g_fp vol_g_bd vol_g_pl vol_i_0000
I assume your actual data would present with # for vol_i_0000 being before vol_g:
Art,
You can simulate the OP's issue if I am correct using your example data and:
data share_three;
set cleaned_data(keep=store_id year quarter vol_i: vol_g: );
array share vol_i: vol_g:;
do over share;
share=ifn(vol_i_0000 ne 0,share/vol_i_0000,0);
end;
run;
or if you modify your version of cleaned_data to:
data cleaned_data;
input store_id $ quarter $ year $
vol_i_0000 vol_g_zz vol_g_fp vol_g_bd vol_g_pl;
cards;
001500 Q109 2009 524 3 446 75 0
001500 Q209 2009 471 1 422 48 0
001500 Q309 2009 427 0 360 67 0
001500 Q409 2009 440 0 385 55 0
001500 Q110 2010 432 0 348 84 0
;
run;
# Variable Type Len
2 quarter Char 8
1 store_id Char 8
7 vol_g_bd Num 8
6 vol_g_fp Num 8
8 vol_g_pl Num 8
5 vol_g_zz Num 8
4 vol_i_0000 Num 8
3 year Char 8
now you will see share_two has the OP's issue.
Indeed - that does appear to be the issue! My dataset has "vol_i_0000" before "year" and my other volume measures.
# Variable Type Len
2 quarter Char 8
1 store_id Char 6
3 vol_I_0000 Num 8
7 vol_g_bd Num 8
6 vol_g_fp Num 8
8 vol_g_pl Num 8
5 vol_g_zz Num 8
4 year Char 4
After Art's response - which let me know that I wasn't entirely crazy in thinking that it should work, I did some testing and I narrowed it down to the fact that if I used RETAIN to reorder my variables in the PDV I could get it to work.
Fried Egg, thanks so much - would you elaborate on this? I thought variable number only comes into play when using list operators like "--" ? So my thought that the colon-wildcard does some kind of character matching is entirely flawed. Is this only related to arrays?
Reeza, thanks for your contribution also.
http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000695105.htm
The order matter with all types of variable list operators.
Here is an example list of variables:
# | Variable | type |
---|---|---|
1 | store_id | num |
2 | quarter | char |
3 | vol_i_0000 | num |
4 | vol_g_bd | num |
5 | vol_g_fp | num |
6 | vol_g_pl | num |
7 | year | char |
8 | vol_g_zz | num |
9 | foo | char |
10 | bar | char |
You can try it out to see for yourself:
data cleaned_data;
input store_id $ quarter $ year $
vol_i_0000 vol_g_zz vol_g_fp vol_g_bd vol_g_pl;
cards;
001500 Q109 2009 524 3 446 75 0
001500 Q209 2009 471 1 422 48 0
001500 Q309 2009 427 0 360 67 0
001500 Q409 2009 440 0 385 55 0
001500 Q110 2010 432 0 348 84 0
;
run;
data _null_;
set cleaned_data(obs=1);
put (vol_g:) (=);
put (vol_:) (=);
put (_numeric_) (=);
put (year--vol_g_zz) (=);
put (year-numeric-vol_g_zz) (=);
run;
Ok, thanks, again.
With that said, I am still unclear as to why using "vol_:" in my second datastep didn't appear to include the "vol_g:" variables in the array? Does the fact that there was a character variable among the numeric variables (by variable order in dataset) stop the 'search' for additional numeric variables that have the "vol_" prefix?
In your second example when using vol_: you assign share[1] to vol_i_0000.
Walk through your loop:
DATA share_two;
SET &cleaned_data(KEEP=store_id year quarter vol_g: vol_i: );
ARRAY share vol_: ;
DO OVER share;
IF vol_i_0000 NE 0 THEN DO;
share = (share / vol_i_0000);
END;
ELSE IF vol_i_0000 = 0 THEN DO;
share = 0;
END;
END;
RUN;
ARRAY share vol_: ;
1. Determine what vol_: =;
3 vol_I_0000 Num 8
7 vol_g_bd Num 8
6 vol_g_fp Num 8
8 vol_g_pl Num 8
5 vol_g_zz Num 8
vol_: = vol_i_0000 vol_g_zz vol_g_fp vol_g_bd vol_g_pl
2. Determine fully qualified statement
ARRAY share
DO OVER SHARE; *initiate loop;
First row of data:
store_id quarter year vol_i_0000 vol_g_zz vol_g_fp vol_g_bd vol_g_pl
001500 Q109 2009 524 3 446 75 0
array share = 524 3 446 75 0
IF VOL_I_0000 NE 0; *VOL_I_0000=524 TRUE;
share = (share / vol_i_0000); *SHARE[1]=SHARE[1]/vol_i_0000 share[1]=vol_i_0000 share[1]=524/524 =1;
In the first iteration of your loop you recalculate vol_i_0000 to 1 by dividing it by itself, so all your other variables you divide by 1 instead of 524.
Ahhhh! Thanks so much for the helpful explanation. So order does matter in the array members but the order doesn't explicitly change what "vol_:" does. It was the operation order that I was not thinking through.
Thanks so much! And this points to Reeza's helpful comment (i can't find it now?) about it probably not being a wise idea to include one of my arithmetic terms in the array itself.
Are the order of the variables in the file specified as above the same as the ones in your dataset when you run it?
Ie, where's the vol i variable in the order of things.
Actually looking at what your code is doing you should probably not be including the denominator variable in the array.
Then it would not matter if it was before, after or in the middle of the list of numerator variables in the data vector.
data share_three;
set &cleaned_data(keep=store_id year quarter vol_g: vol_i: );
array share vol_g: ;
do over share;
if vol_i_0000 = 0 then share=0;
else share = share / vol_i_0000 ;
end;
* Un comment next line if you want to force VOL_I_0000 to 1 as your previous code did ;
* if vol_i_0000 ne 0 then vol_i_0000=1;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.