Cruise
Ammonite | Level 13

I'd like to create a unique id combining two variables, year and id.

All I want is 20104961387856, 2015496138, and 201149613878 in numeric format as a new variable. The demo below doesn't reproduce the problem: it creates a character variable "uid". However, with my actual large dataset (N=1.5M), uid is created as a numeric variable and shows up in scientific notation when the original id variable has more than 8 digits, as shown in the image. A subsequent PROC FREQ doesn't work with uid in scientific notation. I'm surprised; I thought scientific notation was just for display. Any suggestions, please? Again, I need the new variable "uid" as 20104961387856, 2015496138, 201149613878 in numeric format.

data have;
  input year id;
  datalines;
2010 4961387856
2015 496138
2011 49613878
;

DATA have1;
  SET have;
  uid = cats(year, id);   /* CATS() returns a character string */
run;

[Screenshot: UID created from actual data]

Accepted solution: PaigeMiller's reply suggesting uid=cats(year,id)+0; with format uid 20.0; (shown in full in the thread below).
7 replies
PaigeMiller
Diamond | Level 26

If I understand you properly, the problem where the UID shows E12 is just formatting; what happens if you assign format 20.0 to the UID?

--
Paige Miller
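To illustrate the point about formatting: a format changes only how a stored number is displayed, not the number itself. A minimal sketch, assuming uid is already numeric (the HAVE1 below is built inline from the three target values, purely for illustration): PROC DATASETS can attach the 20. format to the existing variable without rewriting the data.

```sas
/* Hypothetical numeric UID values, built inline for illustration */
data have1;
  input uid;
  datalines;
20104961387856
2015496138
201149613878
;

/* Attach a permanent display format; the stored values do not change */
proc datasets lib=work nolist;
  modify have1;
  format uid 20.;
quit;

/* UID now prints with all of its digits instead of E-notation */
proc print data=have1;
run;
```

A FORMAT statement inside a single PROC step, such as PROC PRINT or PROC FREQ, has the same effect for that step only.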
Cruise
Ammonite | Level 13

@PaigeMiller

I just tried uid=input(cats(year,id_bf),20.0); and it had no effect on "uid". I can still see E12 and E13.
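For what it's worth, that INPUT() call does create a numeric uid; what it lacks is a display format. Inside INPUT(), 20. is an informat that controls how the string is read, while a FORMAT statement controls how the stored number is displayed. A sketch combining both, using the demo's id in place of the post's id_bf:

```sas
data have;
  input year id;
  datalines;
2010 4961387856
2015 496138
2011 49613878
;

data have1;
  set have;
  /* 20. as an informat: read the concatenated digits as a number */
  uid = input(cats(year, id), 20.);
  /* 20. as a format: display up to 20 digits, no E-notation */
  format uid 20.;
run;
```

Because the conversion is explicit, this version should not produce the automatic character-to-numeric conversion NOTE in the log.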

PaigeMiller
Diamond | Level 26

That's not really what I meant.

 

That affects the assignment of values to UID and not how the result is displayed.

 

How about

 

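DATA have1;
  SET have;
  uid = cats(year, id) + 0;  /* "+0" forces a numeric UID (the log shows a conversion NOTE) */
  format uid 20.0;           /* display all digits instead of E-notation */
run;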
--
Paige Miller
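If you'd rather skip the round trip through a character string, the same value can be built arithmetically: shift year left by the number of digits in id, then add id. A minimal sketch, assuming id is a non-negative integer; NDIG is a helper variable invented for this example, and the digit count uses a loop rather than LOG10() to sidestep floating-point rounding at exact powers of ten:

```sas
data have;
  input year id;
  datalines;
2010 4961387856
2015 496138
2011 49613878
;

data have1;
  set have;
  /* NDIG (hypothetical helper): number of decimal digits in ID */
  ndig = 1;
  do while (10**ndig <= id);
    ndig = ndig + 1;
  end;
  /* e.g. 2010*10**10 + 4961387856 = 20104961387856 */
  uid = year * 10**ndig + id;
  format uid 20.;
  drop ndig;
run;
```

The results match the CATS() approach as long as id has no leading zeros that need to be preserved.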
Cruise
Ammonite | Level 13
Thanks for the solution. Do you know why scientific notation affected my later programming? Isn't it supposed to be for display purposes only?
Quentin
Super User

Yes, formats are for display purposes. When you run PROC FREQ, it uses a format to display the result. So yes, formats affect the output of PROC FREQ.

 

Compare the output from these two FREQ steps:

data have;
  do id=20104961387856, 2015496138, 201149613878;
    output;
  end;
run;

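/* No format assigned: default BEST12., so the 14-digit value prints in E-notation */
proc freq data=have;
 tables id;
run;

/* 20. format: every digit is printed */
proc freq data=have;
 tables id;
 format id 20.;
run;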
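Formats also decide how PROC FREQ groups values: its levels are built from the formatted values. With no format assigned, a numeric variable is grouped by its default BEST12. text, so two different 14-digit IDs that print as the same E-notation string are counted together. A minimal sketch with two made-up IDs that differ only in the last digit (FREQ_DEFAULT and FREQ_20 are arbitrary output dataset names):

```sas
/* Two made-up IDs that differ only in the last digit */
data two;
  do id = 20104961387856, 20104961387857;
    output;
  end;
run;

/* Default BEST12. text is identical for both IDs,       */
/* so they are expected to fall into a single level      */
proc freq data=two;
  tables id / out=freq_default;
run;

/* 20. format keeps the two IDs distinct: two levels */
proc freq data=two;
  tables id / out=freq_20;
  format id 20.;
run;
```

This is likely why PROC FREQ seemed not to work on uid: the values were intact, but the default format merged distinct IDs into shared levels.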

 

When you have a long numeric value like this, make sure you don't exceed the limit of SAS's numeric precision on your OS. On Windows and UNIX, the maximum integer SAS can store precisely is 9,007,199,254,740,992. See http://documentation.sas.com/?docsetId=lrcon&docsetTarget=p0ji1unv6thm0dn1gp4t01a1u0g6.htm&docsetVer...
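A quick way to check combined IDs against that limit is the CONSTANT('EXACTINT') function, which returns that same value on these platforms. A minimal sketch that writes a warning to the log for any observation whose uid exceeds it (with the sample data, none should):

```sas
data have;
  input year id;
  datalines;
2010 4961387856
2015 496138
2011 49613878
;

data have1;
  set have;
  uid = input(cats(year, id), 20.);
  format uid 20.;
  /* Above this value, not every integer can be stored exactly */
  if uid > constant('exactint') then
    put 'WARNING: UID may not be stored exactly. ' _n_= year= id=;
run;
```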

 

Often it makes sense to store such values in character variables, to avoid precision issues.
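A minimal sketch of the character alternative. Declaring the LENGTH before the assignment makes uid character from the start ($20 is an assumption, enough for a 4-digit year plus a 10-digit id). Since year always has four digits, each combined string maps back to exactly one year and id pair.

```sas
data have;
  input year id;
  datalines;
2010 4961387856
2015 496138
2011 49613878
;

data have1;
  set have;
  length uid $20;          /* character UID, declared before first use */
  uid = cats(year, id);    /* numeric YEAR and ID are converted to text by CATS() */
run;

/* Character levels: no E-notation and no precision limit involved */
proc freq data=have1;
  tables uid;
run;
```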

PaigeMiller
Diamond | Level 26

Why do you need these UID variables to be numeric, anyway? It seems like you are just adding work for yourself. It's not as if you are going to compute the average of UID; that doesn't make sense.

--
Paige Miller
mkeintz
PROC Star

A minor quibble: 9,007,199,254,740,992 (call it X) is the largest integer up to which SAS can precisely store every consecutive integer on Windows and UNIX.

 

In the "rarely needed but worth knowing" category, SAS can also exactly store these integers, all larger than X:

  1. All even integers X+2 through 2X  (other integers are rounded up or down)
  2. then all the 0mod4 (integers exactly divisible by 4) up through 4X
  3. then all the 0mod8 integers up through 8X
  4. etc.

 

 

Demonstration:

 

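data _null_;
  /* X: the largest integer up to which every integer is exact */
  x = constant('exactint');
  put x=comma21.0;
  /* Add 0..8 to X: odd offsets are rounded to a neighboring even value */
  do delta = 0 to 8;
    x2 = x + delta;
    put delta= x2=comma21.0;
  end;
run;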



Discussion stats
  • 7 replies
  • 2272 views
  • 2 likes
  • 4 in conversation