DATA Step, Macro, Functions and more

Character variable in scientific notation?

Reply
Regular Contributor
Posts: 199

Character variable in scientific notation?

I am merging 2 SAS datasets by subject ID. The subject ID (FB_ID in attached screen shot) is stored as a character variable w/ length 20 in both datasets. There are roughly 3,900 observations in the merged dataset. Approximately 300 of the subject IDs appear in the dataset in scientific notation, despite the fact that the variable is a character varaible. Each observation associated w/ a subject ID appearing in scientific notation has missing values for the variables in one of the datasets. 

 

It is possible that the subject IDs appearing in scientific notation are exclusive to one of the datasets. This would explain why the values of the other variables would be missing. However, I can't verify this because I can't convert the scientifc notation, even when using a character format (FORMAT FB_ID $char20.).

 

Also, I have no idea why they appear in scientific notation.

 

When looking at the individual datasets (.csv), prior to merging, neither one of them have any subject IDs store in scientific notation. This notation is only found in the merged dataset.

 

 

Does this make sense to anyone? Thanks for your help.

 


Scientific notation.jpg
Super User
Posts: 11,343

Re: Character variable in scientific notation?

I would back track the data in processing sequence to see where these first appear.

data test;

  set dsname;

   where index(FB_id, 'E') > 0;

run;

should find these. Try different data sets in the processing sequence to find the culprit dataset. The find out how that set was made.

 

That will likely give you the best place to address the issue.

 

 

Regular Contributor
Posts: 199

Re: Character variable in scientific notation?

@ballardw Thanks for your suggestion. It is helpful.

 

Using your code, I found that the values in scientific notation appear in one of the sorted datasets (i.e. "challenge_sort") , but not in the unsorted dataset (i.e. "challenge"). Again, this makes no sense to me. Why would it matter whether the data were sorted by this variable for not?

 

This is the syntax that I am using to sort the datasets prior to the merge. This is just for the one dataset containing the scientific notation.

 

PROC SORT DATA=challenge OUT=challenge_sort;
BY FB_ID;
RUN;
Super User
Posts: 11,343

Re: Character variable in scientific notation?


jcorroon wrote:

 

@ballardw Thanks for your suggestion. It is helpful.

 

Using your code, I found that the values in scientific notation appear in one of the sorted datasets (i.e. "challenge_sort") , but not in the unsorted dataset (i.e. "challenge"). Again, this makes no sense to me. Why would it matter whether the data were sorted by this variable for not?

 

This is the syntax that I am using to sort the datasets prior to the merge. This is just for the one dataset containing the scientific notation.

 

PROC SORT DATA=challenge OUT=challenge_sort;
BY FB_ID;
RUN;

I might suspect that you may have had something overwrite the simple sort you show from your description. I would look for a bit of

code similar to:

Data challenge_sort;

   set challenge_sort;

<other code>;

 

and see if there were any rename and assignments playing with the ID variable (create character from numeric).

 

Something else might be to compare the system dates on the input and output sets. It may be that Challenge was remade at a later date than challenge_sort and that later version now had the Id as a numeric.

 

A good flow chart of the intended order of operations tagged to the program files might be in order.

Super User
Posts: 7,762

Re: Character variable in scientific notation?

Take a thorough look at the log of the whole process, and scan for NOTEs about automatic conversion from character to numeric and vice versa. You might have introduced one or more by accident.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Trusted Advisor
Posts: 1,117

Re: Character variable in scientific notation?

@jcorroon: Is it possible that Excel was involved at any point in the process? I'm asking because '1.00005E+14' (shown in your screenshot) is not typical for SAS ('1.0000457E14'), but it is the default format in which Excel (2013) displays a number like 100004567890123.

 

Your screenshot also shows a permanent informat $20. This suggests that raw data have been imported in an automated way (PROC IMPORT?), which is always a great opportunity to get inconsistent variable types.

 

Why does FB_ID have a length of 20 at all? The values shown in your screenshot have only 15 digits. If there were longer values, it would even be risky to store them as numeric values (in SAS, let alone in Excel) because of limitations in numerical accuracy.

Ask a Question
Discussion stats
  • 5 replies
  • 664 views
  • 1 like
  • 4 in conversation