BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
archita
Fluorite | Level 6
proc univariate data=blib.shoes_tracker;
var product_id;
run;

archita_0-1650714618193.png

I am trying to find out how many values are too small or too large for variable product_id. The values should be exactly 12 digits. I want to understand is the answer 3  for smallest values and 4 for largest values? Please help. I am bit confused.

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

@archita wrote:
data out;
set blib.shoes_tracker;
format product_id $12.;
run;
proc univariate data=out;
var product_id;
run;

Hi, thanks for replying , I modified the code but still not getting the results. Please let me know here I am wrong?


Since product_id must be a numeric variable (otherwise the PROC UNIVARIATE step in your initial post would not have produced a result), SAS interprets your FORMAT statement as

format product_id 12.;

which doesn't impact the results from PROC UNIVARIATE. (A warning "Variable product_id has already been defined as numeric." should have appeared in the log of the DATA step.)

 

The type of an existing variable cannot change anyway. Even if you modified your input dataset blib.shoes_tracker so that product_id is now a character variable, assigning the $12. format would be counterproductive: The longer values would be displayed truncated to 12 characters (e.g., in PROC PRINT or PROC FREQ output), making it impossible to recognize their actual length (13).

 

Luckily, you did get the desired results already from the table you've shown in your initial post: Four product_id values (displayed as 2.20200E+10) have only 11 integer digits, i.e., are "too short," and eight values (displayed as 2.2020E+12) have 13 digits, i.e., are "too long."

View solution in original post

10 REPLIES 10
FreelanceReinh
Jade | Level 19

Hello @archita,

 

The results in the table indicate that four values have only 11 integer digits (decimals not counted!) and eight values have 13.

 

If you store product IDs as character variables, which is generally recommended, you will no longer be confused by scientific notation ("...E+10" etc.).

archita
Fluorite | Level 6
data out;
set blib.shoes_tracker;
format product_id $12.;
run;
proc univariate data=out;
var product_id;
run;

Hi, thanks for replying , I modified the code but still not getting the results. Please let me know here I am wrong?

PaigeMiller
Diamond | Level 26

Assigning a $12. format won't work if the variable is numeric.

 

So why don't you try PROC FREQ?

--
Paige Miller
archita
Fluorite | Level 6

Thank you for the reply . I need to validate product_id variable using univariate procedure alone. The values which should be exactly 12 digits , to check how many values of product_id are too small or too large? So I am using this but getting sam e results

 

PaigeMiller
Diamond | Level 26

@archita wrote:

Thank you for the reply . I need to validate product_id variable using univariate procedure alone.


This doesn't make sense to require PROC UNIVARIATE.

 

The values which should be exactly 12 digits , to check how many values of product_id are too small or too large?

As pointed out by @FreelanceReinh some of your data is 13 digits, and some is 11 digits. As pointed out by @FreelanceReinh and myself, forcing these values to be numeric makes your life difficult, they should be character variables, and then PROC FREQ gives the answer easily.

 

--
Paige Miller
archita
Fluorite | Level 6

Thanks for the information. But the variable is numeric only.

PaigeMiller
Diamond | Level 26

@archita wrote:

Thanks for the information. But the variable is numeric only.


Just because a variable is provided in an inappropriate way (it should be text), that does not mean you have to work with it as numeric, you can create a character variable and work with it that way. But anyway, as I said above, PROC FREQ will do what you want on numeric variables.

 

Also, with character variables, you can use the LENGTH function to determine if any observations have 11 or 13 characters.

 

Or if you insist on doing thing with a numeric variable, you can test to see if the number is less than the smallest 12 digit integer; or greater than the largest 12 digit integer.

--
Paige Miller
FreelanceReinh
Jade | Level 19

@archita wrote:
data out;
set blib.shoes_tracker;
format product_id $12.;
run;
proc univariate data=out;
var product_id;
run;

Hi, thanks for replying , I modified the code but still not getting the results. Please let me know here I am wrong?


Since product_id must be a numeric variable (otherwise the PROC UNIVARIATE step in your initial post would not have produced a result), SAS interprets your FORMAT statement as

format product_id 12.;

which doesn't impact the results from PROC UNIVARIATE. (A warning "Variable product_id has already been defined as numeric." should have appeared in the log of the DATA step.)

 

The type of an existing variable cannot change anyway. Even if you modified your input dataset blib.shoes_tracker so that product_id is now a character variable, assigning the $12. format would be counterproductive: The longer values would be displayed truncated to 12 characters (e.g., in PROC PRINT or PROC FREQ output), making it impossible to recognize their actual length (13).

 

Luckily, you did get the desired results already from the table you've shown in your initial post: Four product_id values (displayed as 2.20200E+10) have only 11 integer digits, i.e., are "too short," and eight values (displayed as 2.2020E+12) have 13 digits, i.e., are "too long."

archita
Fluorite | Level 6

Thank you so much , got your point.

PaigeMiller
Diamond | Level 26

By treating PRODUCT_ID as numeric, and then using PROC UNIVARIATE, it cannot display all of the digits of the PRODUCT_ID. Because you have PRODUCT_ID as numeric, then the values are displayed in scientific notation, which would normally be fine for numeric variables.

 

But PRODUCT_ID should not be numeric. It should be a character string. And then PROC FREQ can be used to answer your question, and no scientific notation will appear in the output. (Actually PROC FREQ will do what you want on numeric values, but honestly, you shouldn't be making PRODUCT_ID numeric, that's just a bad thing to do)

--
Paige Miller

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 10 replies
  • 1034 views
  • 0 likes
  • 3 in conversation