DATA Step, Macro, Functions and more

Population of Array with Variable List Wildcard : -- Unexpected Results

Accepted Solution Solved
Reply
New Contributor
Posts: 4
Accepted Solution

Population of Array with Variable List Wildcard : -- Unexpected Results

Would someone help explain why the one method of using an array works as I would expect but the other method doesn't work as I would expect?   The only change I make to the code is to the array member declaration (colon placement).  The vol_: variables are all numeric and the other three vars are character.  Notice how "method two" affects the vol_I_0000 variable but doesn't appear to effect the other volume variables.

Thanks for any help or insight!

XXXXXXXX

INPUT DATA

                                                                                           vol_I_ 
     Obs    store_id    quarter    year    vol_g_zz    vol_g_fp    vol_g_bd    vol_g_pl     0000  
                                                                                                  
     936     001500      Q109      2009         3         446          75          0         524  
     937     001500      Q209      2009         1         422          48          0         471  
     938     001500      Q309      2009         0         360          67          0         427  
     939     001500      Q409      2009         0         385          55          0         440  
     940     001500      Q110      2010         0         348          84          0         432  
 

ARRAY METHOD 1


DATA share_one;                                                                                                  
        SET &cleaned_data(KEEP=store_id year quarter vol_g: vol_i: );  
        ARRAY share vol_g: vol_i:  ;                                                                  
        DO OVER share;                                                                                       
                IF vol_i_0000 NE 0 THEN DO;                                                                  
                share = (share / vol_i_0000);                                                                
                END;                                                                                         
                                                                                                             
                ELSE IF vol_i_0000 = 0 THEN DO;                                                              
                share = 0;                                                                                   
                END;                                                                                         
        END;                                                                                                 
        RUN;      

PRINT OF SHARE_ONE

                                                                                         vol_I_
Obs    store_id    quarter    year    vol_g_zz    vol_g_fp    vol_g_bd      vol_g_pl     0000 
                                                                                               
   1     001500      Q109      2009     0.00573     0.85115     0.14313             0       1  
   2     001500      Q209      2009     0.00212     0.89597     0.10191             0       1  
   3     001500      Q309      2009     0.00000     0.84309     0.15691             0       1  
   4     001500      Q409      2009     0.00000     0.87500     0.12500             0       1  
   5     001500      Q110      2010     0.00000     0.80556     0.19444             0       1  

ARRAY METHOD 2

DATA share_two;                                                                                                  
        SET &cleaned_data(KEEP=store_id year quarter vol_g: vol_i: );  
        ARRAY share vol_: ;                                                                  
        DO OVER share;                                                                                       
                IF vol_i_0000 NE 0 THEN DO;                                                                  
                share = (share / vol_i_0000);                                                                
                END;                                                                                         
                                                                                                             
                ELSE IF vol_i_0000 = 0 THEN DO;                                                              
                share = 0;                                                                                   
                END;                                                                                         
        END;                                                                                                 
        RUN;    

PRINT OF SHARE_TWO

                                                                                       vol_I_    
Obs    store_id    quarter    year    vol_g_zz    vol_g_fp    vol_g_bd    vol_g_pl     0000     
                                                                                                 
   1     001500      Q109      2009         3         446          75          0          1      
   2     001500      Q209      2009         1         422          48          0          1      
   3     001500      Q309      2009         0         360          67          0          1      
   4     001500      Q409      2009         0         385          55          0          1      
   5     001500      Q110      2010         0         348          84          0          1      




Accepted Solutions
Solution
‎12-06-2011 03:40 PM
Trusted Advisor
Posts: 1,301

Population of Array with Variable List Wildcard : -- Unexpected Results

The population of the list of variable data from vol_: is based on the variable number.  In your example it appears as:

                                                   #    Variable      Type    Len

                                                   2    quarter       Char      8

                                                   1    store_id      Char      8

                                                   8    vol_I_0000    Num       8

                                                   6    vol_g_bd      Num       8

                                                   5    vol_g_fp      Num       8

                                                   7    vol_g_pl      Num       8

                                                   4    vol_g_zz      Num       8

                                                   3    year          Char      8

So calling for vol_: returns vol_g_zz vol_g_fp vol_g_bd vol_g_pl vol_i_0000

I assume your actual data would present with # for vol_i_0000 being before vol_g:

Art,

You can simulate the OP's issue if I am correct using your example data and:

data share_three;

  set cleaned_data(keep=store_id year quarter vol_i: vol_g: );

  array share vol_i: vol_g:;

  do over share;

   share=ifn(vol_i_0000 ne 0,share/vol_i_0000,0);

  end;

run;

or if you modify your version of cleaned_data to:

data cleaned_data;

  input store_id $ quarter $ year $

  vol_i_0000 vol_g_zz vol_g_fp vol_g_bd vol_g_pl;

  cards;

001500 Q109 2009 524 3 446 75 0

001500 Q209 2009 471 1 422 48 0

001500 Q309 2009 427 0 360 67 0

001500 Q409 2009 440 0 385 55 0

001500 Q110 2010 432 0 348 84 0

;

run;

                                                   #    Variable      Type    Len

                                                   2    quarter       Char      8

                                                   1    store_id      Char      8

                                                   7    vol_g_bd      Num       8

                                                   6    vol_g_fp      Num       8

                                                   8    vol_g_pl      Num       8

                                                   5    vol_g_zz      Num       8

                                                   4    vol_i_0000    Num       8

                                                   3    year          Char      8

now you will see share_two has the OP's issue.

View solution in original post


All Replies
PROC Star
Posts: 7,481

Population of Array with Variable List Wildcard : -- Unexpected Results

I must be missing something.  I ran the following code and got identical results for both files:

data cleaned_data;

  input store_id $ quarter $ year $

  vol_g_zz vol_g_fp vol_g_bd vol_g_pl vol_I_0000;

  cards;

001500 Q109 2009 3 446 75 0 524

001500 Q209 2009 1 422 48 0 471

001500 Q309 2009 0 360 67 0 427

001500 Q409 2009 0 385 55 0 440

001500 Q110 2010 0 348 84 0 432

;

DATA share_one;                                                                                                 

  SET cleaned_data(KEEP=store_id year quarter vol_g: vol_i: );

  ARRAY share vol_g: vol_i:  ;

  DO OVER share;

    IF vol_i_0000 NE 0 THEN DO;

      share = (share / vol_i_0000);

    END;

    ELSE IF vol_i_0000 = 0 THEN DO;

      share = 0;

    END;

  END;

RUN;

DATA share_two;

  SET cleaned_data(KEEP=store_id year quarter vol_g: vol_i: );

  ARRAY share vol_: ;

  DO OVER share;

    IF vol_i_0000 NE 0 THEN DO;

      share = (share / vol_i_0000);

    END;

    ELSE IF vol_i_0000 = 0 THEN DO;

      share = 0;

    END;

  END;

RUN;    

Solution
‎12-06-2011 03:40 PM
Trusted Advisor
Posts: 1,301

Population of Array with Variable List Wildcard : -- Unexpected Results

The population of the list of variable data from vol_: is based on the variable number.  In your example it appears as:

                                                   #    Variable      Type    Len

                                                   2    quarter       Char      8

                                                   1    store_id      Char      8

                                                   8    vol_I_0000    Num       8

                                                   6    vol_g_bd      Num       8

                                                   5    vol_g_fp      Num       8

                                                   7    vol_g_pl      Num       8

                                                   4    vol_g_zz      Num       8

                                                   3    year          Char      8

So calling for vol_: returns vol_g_zz vol_g_fp vol_g_bd vol_g_pl vol_i_0000

I assume your actual data would present with # for vol_i_0000 being before vol_g:

Art,

You can simulate the OP's issue if I am correct using your example data and:

data share_three;

  set cleaned_data(keep=store_id year quarter vol_i: vol_g: );

  array share vol_i: vol_g:;

  do over share;

   share=ifn(vol_i_0000 ne 0,share/vol_i_0000,0);

  end;

run;

or if you modify your version of cleaned_data to:

data cleaned_data;

  input store_id $ quarter $ year $

  vol_i_0000 vol_g_zz vol_g_fp vol_g_bd vol_g_pl;

  cards;

001500 Q109 2009 524 3 446 75 0

001500 Q209 2009 471 1 422 48 0

001500 Q309 2009 427 0 360 67 0

001500 Q409 2009 440 0 385 55 0

001500 Q110 2010 432 0 348 84 0

;

run;

                                                   #    Variable      Type    Len

                                                   2    quarter       Char      8

                                                   1    store_id      Char      8

                                                   7    vol_g_bd      Num       8

                                                   6    vol_g_fp      Num       8

                                                   8    vol_g_pl      Num       8

                                                   5    vol_g_zz      Num       8

                                                   4    vol_i_0000    Num       8

                                                   3    year          Char      8

now you will see share_two has the OP's issue.

New Contributor
Posts: 4

Re: Population of Array with Variable List Wildcard : -- Unexpected Results

Indeed - that does appear to be the issue!   My dataset has "vol_i_0000" before "year" and my other volume measures.

#    Variable      Type    Len  

                                 

2    quarter       Char      8  

1    store_id      Char      6  

3    vol_I_0000    Num       8  

7    vol_g_bd      Num       8  

6    vol_g_fp      Num       8  

8    vol_g_pl      Num       8  

5    vol_g_zz      Num       8  

4    year          Char      4 

After Art's response - which let me know that I wasn't entirely crazy in thinking that it should work, I did some testing and I narrowed it down to the fact that if I used RETAIN to reorder my variables in the PDV I could get it to work. 

Fried Egg, thanks so much - would you elaborate on this?  I thought variable number only comes into play when using list operators like "--" ?  So my thought that the colon-wildcard does some kind of character matching is entirely flawed.  Is this only related to arrays?  

Reeza, thanks for your contribution also. 

Trusted Advisor
Posts: 1,301

Re: Population of Array with Variable List Wildcard : -- Unexpected Results

http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000695105.htm

The order matter with all types of variable list operators.

Here is an example list of variables:

#Variabletype
1store_idnum
2quarterchar
3vol_i_0000num
4vol_g_bdnum
5vol_g_fpnum
6vol_g_plnum
7yearchar
8vol_g_zznum
9foochar
10barchar

quater--year                         quarter  vol_i_0000 vol_g_bd  vol_g_fp vol_g_pl year

quarter-numeric-year             vol_i_0000 vol_g_bd  vol_g_fp vol_g_pl
quarter-character-year           quarter year
vol_:                                    vol_i_0000 vol_g_bd  vol_g_fp vol_g_pl vol_g_zz
_numeric_                           store_id vol_i_0000 vol_g_bd  vol_g_fp vol_g_pl vol_g_zz
etc.....
Trusted Advisor
Posts: 1,301

Re: Population of Array with Variable List Wildcard : -- Unexpected Results

You can try it out to see for yourself:

data cleaned_data;

  input store_id $ quarter $ year $

  vol_i_0000 vol_g_zz vol_g_fp vol_g_bd vol_g_pl;

  cards;

001500 Q109 2009 524 3 446 75 0

001500 Q209 2009 471 1 422 48 0

001500 Q309 2009 427 0 360 67 0

001500 Q409 2009 440 0 385 55 0

001500 Q110 2010 432 0 348 84 0

;

run;

data _null_;

set cleaned_data(obs=1);

  put (vol_gSmiley Happy (=);

   put (vol_Smiley Happy (=);

   put (_numeric_) (=);

   put (year--vol_g_zz) (=);

   put (year-numeric-vol_g_zz) (=);

run;

New Contributor
Posts: 4

Population of Array with Variable List Wildcard : -- Unexpected Results

Ok, thanks, again.

With that said, I am still unclear as to why using "vol_:" in my second datastep didn't appear to include the "vol_g:" variables in the array?  Does the fact that there was a character variable among the numeric variables (by variable order in dataset) stop the 'search' for additional numeric variables that have the "vol_" prefix?

Trusted Advisor
Posts: 1,301

Population of Array with Variable List Wildcard : -- Unexpected Results

In your second example when using vol_: you assign share[1] to vol_i_0000.

Walk through your loop:

DATA share_two;                                                                                                  

        SET &cleaned_data(KEEP=store_id year quarter vol_g: vol_i: );  

        ARRAY share vol_: ;                                                                  

        DO OVER share;                                                                                       

                IF vol_i_0000 NE 0 THEN DO;                                                                  

                share = (share / vol_i_0000);                                                                

                END;                                                                                         

                                                                                                             

                ELSE IF vol_i_0000 = 0 THEN DO;                                                              

                share = 0;                                                                                   

                END;                                                                                         

        END;                                                                                                 

        RUN;

ARRAY share vol_: ;

1. Determine what vol_: =;

3    vol_I_0000    Num       8 

7    vol_g_bd      Num       8 

6    vol_g_fp      Num       8 

8    vol_g_pl      Num       8 

5    vol_g_zz      Num       8 

vol_: = vol_i_0000 vol_g_zz vol_g_fp vol_g_bd vol_g_pl

2. Determine fully qualified statement

ARRAY share

  • vol_i_0000 vol_g_zz vol_g_fp vol_g_bd vol_g_pl;
  • DO OVER SHARE;          *initiate loop;

    First row of data:

    store_id quarter year vol_i_0000 vol_g_zz vol_g_fp vol_g_bd vol_g_pl

    001500 Q109 2009 524 3 446 75 0

    array share = 524 3 446 75 0

    IF VOL_I_0000 NE 0;       *VOL_I_0000=524 TRUE;

      share = (share / vol_i_0000); *SHARE[1]=SHARE[1]/vol_i_0000 share[1]=vol_i_0000 share[1]=524/524 =1;

    In the first iteration of your loop you recalculate vol_i_0000 to 1 by dividing it by itself, so all your other variables you divide by 1 instead of 524.

    New Contributor
    Posts: 4

    Population of Array with Variable List Wildcard : -- Unexpected Results

    Ahhhh!   Thanks so much for the helpful explanation.  So order does matter in the array members but the order doesn't explicitly change what "vol_:" does.  It was the operation order that I was not thinking through.

    Thanks so much!   And this points to Reeza's helpful comment (i can't find it now?) about it probably not being a wise idea to include one of my arithmetic terms in the array itself.

    Super User
    Posts: 19,822

    Re: Population of Array with Variable List Wildcard : -- Unexpected Results

    Are the order of the variables in the file specified as above the same as the ones in your dataset when you run it?

    Ie, where's the vol i variable in the order of things.

    Super User
    Super User
    Posts: 7,060

    Re: Population of Array with Variable List Wildcard : -- Unexpected Results

    Actually looking at what your code is doing you should probably not be including the denominator variable in the array. 

    Then it would not matter if it was before, after or in the middle of the list of numerator variables in the data vector.

    data share_three;                                                                                                  

      set &cleaned_data(keep=store_id year quarter vol_g: vol_i: );  

      array share vol_g: ;                                                                  

      do over share;                                                                                       

        if vol_i_0000 = 0 then share=0;

        else share = share / vol_i_0000 ;

      end;

      * Un comment next line if you want to force VOL_I_0000 to 1 as your previous code did ;

      * if vol_i_0000 ne 0 then vol_i_0000=1;

    run;

    🔒 This topic is solved and locked.

    Need further help from the community? Please ask a new question.

    Discussion stats
    • 10 replies
    • 320 views
    • 3 likes
    • 5 in conversation