SAS Programming

stataq · Posted 10-24-2023 09:31 AM

hello,

I tried to clean my data using the following code. As you can see, my data has only 6 columns, but for some reason it changes to 7 columns after I run it. Could anyone tell me why?

Is there a way to check the details of my array _test? If so, how?

The extra column I got is named i, with value 7. 🤣

Thanks.

ballardw · Posted 10-24-2023 10:22 AM

@stataq wrote:

Isn't i only used in looping? Why do I need to have it in my data?

No, "i" is not only used in looping. Almost any variable name, other than reserved words, can be used as the name of a loop index variable. The convention of using i, j, k as common "loop" variables comes from ancient coding practices. FORTRAN, which has limited variable types, by default treats names beginning with the letters I through N as integers. Otherwise you had to declare a variable as an integer with extra code before using it in a role where integers were required, such as a loop index. It is also common in mathematics to use i, j, k as integer index values, and early coders usually came from the math world.

Hint: Proc Contents will tell you the names of all the variables in your data set. If you ran it, it would show that the variable I had been added.

By default ANY variable that you reference anywhere in your code will end up in the output data set unless you explicitly DROP it (or provide a KEEP list).

A very common source of "added variables" is misspelling. If your data set has a variable named Group and you misspell the name in a statement, such as:

If grop=123 then <do something> ;

you have "added" a variable named "grop" to the data set. Typically this error will result in a "Variable <name> is uninitialized" note in the log, as no values are ever assigned to it.
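
A minimal sketch of the misspelling case, assuming SASHELP.CLASS is available. The output data set name WORK.CHECK and the misspelled name Agee are made up for illustration:

```sas
/* Sketch: Agee is a deliberate misspelling of Age.               */
/* SAS does not raise an error; it creates a new numeric variable */
/* named Agee with missing values and writes a note to the log.   */
data work.check;
  set sashelp.class;
  if agee = 12 then put "age twelve: " name=;  /* never true: Agee is always missing */
run;

/* PROC CONTENTS lists every variable in WORK.CHECK, */
/* including the unintended Agee.                    */
proc contents data=work.check;
run;
```

The log for the DATA step should contain an uninitialized-variable note for Agee, and the PROC CONTENTS listing should show Agee alongside the original variables.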

PaigeMiller · Posted 10-24-2023 09:34 AM

@stataq wrote:

hello,

I tried to clean my data using the following code. As you can see, my data has only 6 columns, but for some reason it changes to 7 columns after I run it. Could anyone tell me why?

Is there a way to check the details of my array _test? If so, how?

The very simple solution to the question about why there are 7 columns is for you to LOOK AT data set TEST with your own eyes, and you should see what has happened.

Then you ask whether there is a way to check your array _test for details, and how, but I don't really understand what this means, what "details" you are referring to, or what you need to "check".

Suggestion: do not use code like this, which overwrites your original data set

data test;
    set test;

Instead, use code like this, which does not overwrite your original data set

data test1;
    set test;

--
Paige Miller

stataq · Posted 10-24-2023 10:00 AM

Thanks so much for the explanation. I want to make sure my array was set up correctly. Basically, I want to remove all spaces from my outputs, so I tried to clean my data in a loop. I checked dim(_test), which is 6, but for some reason a 7th column is added to my data, with name i and value 7.

Is there a way to output my data as example data? I wonder whether the problem is in my data, but I don't know how to output it as example data.

PaigeMiller · Posted 10-24-2023 10:05 AM

@stataq wrote:

Thanks so much for the explanation. I want to make sure my array was set up correctly. Basically, I want to remove all spaces from my outputs, so I tried to clean my data in a loop.

Your code seems to do this correctly. But you should check for yourself; you shouldn't have to ask us whether it is doing it properly.

I checked dim(_test), which is 6, but for some reason a 7th column is added to my data, with name i and value 7.

Whenever you create a variable in a DATA step, the variable is added to the SAS data set that gets created. You can of course drop this variable if you want.
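
As a rough sketch of dropping the index variable: the original code is not shown in the thread, so the array definition below (all character variables in TEST) and the output name TEST1 are assumptions, not the poster's actual code.

```sas
/* Sketch: clean all character variables, then drop the loop index. */
/* Assumes _test is meant to cover every character variable in TEST. */
data test1;
  set test;
  array _test {*} _character_;
  do i = 1 to dim(_test);
    _test{i} = compress(_test{i});  /* remove blanks from each value */
  end;
  drop i;  /* keep the loop index out of the output data set */
run;
```

The DROP= data set option on the output data set, as in data test1(drop=i);, should have the same effect as the DROP statement.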

Is there a way to output my data as example data? I wonder whether the problem is in my data, but I don't know how to output it as example data.

I'm not sure I understand what you want to do here.

--
Paige Miller

Quentin · Posted 10-24-2023 10:38 AM

One way to check your data, and the logic of your DATA step, is to add PUT statements to write messages to the log.

Here's a step like yours, with PUT statements added:

data shoes ;
  set sashelp.shoes (obs=3);
  array chars {*} _character_ ;
  put "before loop " _n_= (_character_)(=) /;
  do i=1 to dim(chars) ;
    put "before compress " i= chars{i}= ;
    chars{i}=compress(chars{i}) ;
    put "after compress "  i= chars{i}= /;
  end ;
  put "after loop " _n_= i= (_character_)(=) ///;
  drop i ;
run ;

The log is:

1    data shoes ;
2      set sashelp.shoes (obs=3);
3      array chars {*} _character_ ;
4      put "before loop " _n_= (_character_)(=) /;
5      do i=1 to dim(chars) ;
6        put "before compress " i= chars{i}= ;
7        chars{i}=compress(chars{i}) ;
8        put "after compress "  i= chars{i}= /;
9      end ;
10     put "after loop " _n_= i= (_character_)(=) ///;
11     drop i ;
12   run ;

before loop _N_=1 Region=Africa Product=Boot Subsidiary=Addis Ababa

before compress i=1 Region=Africa
after compress i=1 Region=Africa

before compress i=2 Product=Boot
after compress i=2 Product=Boot

before compress i=3 Subsidiary=Addis Ababa
after compress i=3 Subsidiary=AddisAbaba

after loop _N_=1 i=4 Region=Africa Product=Boot Subsidiary=AddisAbaba



before loop _N_=2 Region=Africa Product=Men's Casual Subsidiary=Addis Ababa

before compress i=1 Region=Africa
after compress i=1 Region=Africa

before compress i=2 Product=Men's Casual
after compress i=2 Product=Men'sCasual

before compress i=3 Subsidiary=Addis Ababa
after compress i=3 Subsidiary=AddisAbaba

after loop _N_=2 i=4 Region=Africa Product=Men'sCasual Subsidiary=AddisAbaba



before loop _N_=3 Region=Africa Product=Men's Dress Subsidiary=Addis Ababa

before compress i=1 Region=Africa
after compress i=1 Region=Africa

before compress i=2 Product=Men's Dress
after compress i=2 Product=Men'sDress

before compress i=3 Subsidiary=Addis Ababa
after compress i=3 Subsidiary=AddisAbaba

after loop _N_=3 i=4 Region=Africa Product=Men'sDress Subsidiary=AddisAbaba
NOTE: There were 3 observations read from the data set SASHELP.SHOES.
NOTE: The data set WORK.SHOES has 3 observations and 7 variables.

There are three character variables in sashelp.shoes, so dim(chars) = 3. Note that the DO loop iterates 3 times for each record. The final value of the iterator variable i is 4: during the third iteration i=3, then at the bottom of the loop i is incremented to 4, which is greater than dim(chars), so the loop does not iterate again.
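
The same behavior can be seen in isolation with a step that reads no data. This is only an illustration; the upper bound 3 stands in for dim(chars):

```sas
/* Sketch: the index ends one step past the upper bound. */
data _null_;
  do i = 1 to 3;
    /* loop body intentionally empty */
  end;
  put "after loop " i=;  /* writes: after loop i=4 */
run;
```

By the same logic, an array with 6 elements leaves i=7 after the loop, which matches the value reported in the original question.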

stataq · Posted 10-24-2023 10:49 AM

this is very helpful. Thanks so much.👍

Tom · Posted 10-24-2023 09:42 AM

Apparently your existing dataset did not already have a variable named I. So your DO loop added it.

stataq · Posted 10-24-2023 09:48 AM

Isn't i only used in looping? Why do I need to have it in my data?

stataq · Posted 10-24-2023 10:43 AM

I can manually drop the 'i' variable. Is there a way to prevent this from happening? I applied the same code to another similar dataset and it did not add 'i'. Is 'i' randomly added to the data set? It should not be, but I could not figure out why this happens.

PaigeMiller · Posted 10-24-2023 10:49 AM

It's not random. Any time you create a variable, such as the variable named I in the DO loop, it is added to the data set. If it was not added in a different example, that is because the variable I already existed in that other data set in the other example.
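
A small sketch of that situation, assuming SASHELP.CLASS is available. The data set names WORK.HAS_I and WORK.CLEANED are made up for illustration:

```sas
/* Sketch: build an input data set that already contains a variable named i. */
data work.has_i;
  set sashelp.class;
  i = 0;
run;

/* The same kind of cleaning loop adds no new column here,     */
/* because i is already part of the data being read.           */
data work.cleaned;
  set work.has_i;
  array chars {*} _character_;
  do i = 1 to dim(chars);
    chars{i} = compress(chars{i});
  end;
run;
```

Both data sets should report the same number of variables in the log. Note the side effect: the values of i that were read in are overwritten by the loop, which is another reason to choose an index name that does not collide with real data.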

--
Paige Miller

Tom · Posted 10-24-2023 11:10 AM

@stataq wrote:

I can manually drop the 'i' variable. Is there a way to prevent this from happening? I applied the same code to another similar dataset and it did not add 'i'. Is 'i' randomly added to the data set? It should not be, but I could not figure out why this happens.

There is a way to loop over an array without having to specify the index variable: the DO OVER statement. Let's do an example using SASHELP.CLASS and take out all of the M characters (since none of those values have embedded spaces to be removed).

data want;
  set sashelp.class;
  array charvar _character_;
  do over charvar;
    charvar=compress(charvar,'M');
  end;
run;

NOTE: The data set WORK.WANT has 19 observations and 5 variables

Note that it still creates a variable, in this case one named _I_, but it also marks that variable to be dropped. This can cause the opposite of the problem you had, writing out fewer variables than were read in, if the index variable used for the implicit array reference is one that already existed. You can fix that with a KEEP statement.

That was how ARRAYs originally worked. SAS has decided to no longer document that syntax. But as of now it still works and is a convenient way to process an array like yours where there is no meaning to the index value.

PaigeMiller · Posted 10-24-2023 11:24 AM

@Tom wrote:
That was how ARRAYs originally worked. SAS has decided to no longer document that syntax. But as of now it still works and is a convenient way to process an array like yours where there is no meaning to the index value.

But DO OVER may not work in future releases of SAS, maybe even the next release, we don't know. So any program that isn't a "one time only" program that uses DO OVER could run into problems in the future. And you can't even say that "as of now it still works" as we don't know all the possible ways that DO OVER may be used and so right now maybe it works for 99% and fails on the 1%. So in my opinion, using DO OVER is not a good habit to get into, especially since the only benefit of using DO OVER is that it eliminates the need to write drop i; in the data step.

--
Paige Miller

how to check array setup
