Desktop productivity for business analysts and programmers

Column with varying data lengths

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 94
Accepted Solution

Column with varying data lengths

I have a .xlsx file that I have to convert to .csv to use in SAS EG.  It is my understanding that SASEG 4.3 does not allow import of .xlsx files.

There is one column in the file with data of varying lengths.  Here is a sample of the column (with is alpha-numeric).  As you can see, there are 7-digit, 8-digit and 9-digit numbers.  The cells with 4-digits are missing the leading 6 zeros.

GOC_Number

2051
2098
1554630
1994104
2510107
10351204
10351235
100026820
100026823

I tried Text-to-Column in the .xlsx file then save it as .csv.  That did not work - it did not save the column as text.

I also tried this code change the variable length to z10.  That added zeros to the cells that are 7,8 or 9 digits lone.

Any thoughts/advice?


Accepted Solutions
Solution
‎01-22-2013 09:06 AM
Super Contributor
Posts: 543

Re: Column with varying data lengths

Smiley Happy

data test;

input goc_number $;

cards;

2051

2098

1554630

1994104

2510107

10351204

10351235

100026820

100026823

;

data temp;

     set test;

     length_goc=length(goc_number);

     if length_goc=4 then do;

        temp_goc = compress('000000' || goc_number);

            format temp_goc $10.;

     end;

     if temp_goc ne "" then final_goc = temp_goc;

        else final_goc = goc_number;

                    format final_goc $10.;

     run;

Is this what you'd like?

View solution in original post


All Replies
Super Contributor
Posts: 543

Re: Column with varying data lengths


Hi.

I am not familair with EG, but if possible, you could calculate the length of your variable:

var_length = length(your_var);

if var_length = 4 then do;

     format your_var z10.;

end;

So this snippet should only add the leading zeroes to the entries that are 4 digits.

Good luck!

Anca.

Frequent Contributor
Posts: 94

Re: Column with varying data lengths

I tried this code but got an error.  Any advice please.

data test;

     set jen1.emp_current

     primary_goc=length(primary_goc);

     if primary_goc=4 then do;

     format primary_goc z10.;

     run;

15         data test;

16              set jen1.emp_current

17              primary_goc=length('primary_goc');

                       _

                       22

                       76

ERROR 22-322: Syntax error, expecting one of the following: a name, a quoted string, (, -, :, ;, END, INDSNAME, KEY, KEYS, NOBS,

              OPEN, POINT, _DATA_, _LAST_, _NULL_. 

ERROR 76-322: Syntax error, statement will be ignored.

Super User
Posts: 3,233

Re: Column with varying data lengths

Missing semicolon in line 16:

data test;

set jen1.emp_current;

primary_goc=length('primary_goc');

Super Contributor
Posts: 543

Re: Column with varying data lengths

Hi Jen,

In this line of code:

primary_goc = length('primary_goc');

you are replacing the value of primary_goc with its length.

I think you should define a new variable

l_primary_goc = length(primary_goc); if l_primary_goc = 4 then....


Just a thought.

Regards,

Anca.

Frequent Contributor
Posts: 94

Re: Column with varying data lengths

I fixed the code and added the semi-colon.

data test;

     set jen1.emp_current;

     primary_goc=length(primary_goc);

     if primary_goc=4 then do;

     format primary_goc z10.;

     run;

I got the following error:

24              if primary_goc=4 then do;

25              format primary_goc z10.;

                               ____

                               484

NOTE 484-185: Format $Z was not found or could not be loaded.

26              run;

26              run;

               _

               117

ERROR 117-185: There was 1 unclosed DO block.

I changed z10. to $10. and that seemed to fix the issue.  However, I still got the same error on line 26.

           

Super Contributor
Posts: 543

Re: Column with varying data lengths

After your format statement you need an end;

data test;

     set jen1.emp_current;

     primary_goc=length(primary_goc);

     if primary_goc=4 then do;

     format primary_goc z10.;

     end;

     run;

 

Thanks.

Anca.

Frequent Contributor
Posts: 94

Re: Column with varying data lengths

Hi Anca,

I ran this code:

data test;

     set jen1.emp_current;

     a_primary_goc=length(primary_goc);

     if a_primary_goc=4 then do;

     format primary_goc $10.;

     end;

     run;

Thoughts?

The results:

a_primary_goc column is = to 4 for all rows

primary_goc still in 4-digits.

Solution
‎01-22-2013 09:06 AM
Super Contributor
Posts: 543

Re: Column with varying data lengths

Smiley Happy

data test;

input goc_number $;

cards;

2051

2098

1554630

1994104

2510107

10351204

10351235

100026820

100026823

;

data temp;

     set test;

     length_goc=length(goc_number);

     if length_goc=4 then do;

        temp_goc = compress('000000' || goc_number);

            format temp_goc $10.;

     end;

     if temp_goc ne "" then final_goc = temp_goc;

        else final_goc = goc_number;

                    format final_goc $10.;

     run;

Is this what you'd like?

Super User
Super User
Posts: 6,842

Re: Column with varying data lengths

Looking at the various error messages you have posted your variable PRIMARY_GOC is a character variable and not a numeric variable.

Also you seem to be trying to treat the FORMAT statement as if it was an executable statement.  The FORMAT statements are compiled so putting them inside of an IF/THEN conditional will have no effect.

So do you want the character strings in PRIMARY_GOC that look like they might be numbers to be left padded with zeros?

primary_goc = put( input(primary_goc,10.),z10.);

Frequent Contributor
Posts: 94

Re: Column with varying data lengths

Tom - You are correct.  It's a character variable since it's alpha-numeric.

As I stated in the original post, the issue is that in this column there are cells of different lengths.  Some are 7, 8, 9 and 10 digits long.  The ones that are 4-digits long should be 10 digits.

If I used as you suggested, it would zeros in front of the 7,8 and 9 digit cells, which I do not want.

That data is what it is...

Super User
Super User
Posts: 6,842

Re: Column with varying data lengths

If the issue is that you want all values that are four characters long to be prefixed with six zeros then that answer was posted above.

if length(primary_goc) = 4 then primary_goc='000000'||primary_goc;

Super Contributor
Posts: 356

Re: Column with varying data lengths

Why you trying to determine the length, wouldn't yuo just read the csv in delimited by ',' and define the variable in SAS as numeric 8.? and then apply the z10. format or put as char using z10.

Also if you just save teh file as CSV it should work, or try saving as 2003-2007 xls.

Frequent Contributor
Posts: 94

Re: Column with varying data lengths

I would prefer to work in 2003-2007 .xls, but the file has more than 65K rows. 

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 13 replies
  • 689 views
  • 0 likes
  • 5 in conversation