BookmarkSubscribeRSS Feed
DanielCoben93
Calcite | Level 5
I am currently working with a dataset that looks somewhat like this, albeit with around 210 variables and 25031 rows in total.
DATA PUBLIC.BCUSTOMERS;
INPUT CUSTNO $ B1STAT JOB_ID;
CARDS;
C19291203 002103 0010
C19291204 000091 0001
C19291205 000000 0101
;
RUN;
As I have lost the original CSV file for this table I decided to export this dataset as a CSV file for backup purposes, but whenever I do this, columns such as "B1STAT" and "JOB_ID" that have 0s in their first digits tend to be exported with a few digits missing (e.g.B1STAT and JOB_ID for CUSTNO C19291204 is exported as 91 and 1). My original idea was to first convert such variables to character variables and add a random letter to the front of the row using CATX() (e.g.B1STAT and JOB_ID for CUSTNO C19291204 would become A000091 and B0001) and export the data like that, after which I would use SUBSTR() to delete the letter from the column. Problem is, there's at least 60 + variables like this and repeating the process for all such variables manually seems rather cumbersome, and I am still unfamiliar with macros at the moment.
 
Is there a way to ensure that the table is exported correctly in a simpler manner instead of doing what I described above?
4 REPLIES 4
ChrisNZ
Tourmaline | Level 20

Variables B1STAT JOB_ID are numbers in your example. 

So they don't have meaningful leading numbers. So exporting them as 91 and 1 is sensible.

You need to either create a character variable using the Z., or write out the numeric value using the Z. format.

andreas_lds
Jade | Level 19

Bad example: the variable b1stat and job_id don't have leading zeros in the dataset. If you want leading zeros you have to attach the appropriate z-format or change the variable to be alphanumeric. Also note that csv-files must not be opened with Excel, as that tool "optimises" the data it reads without further noticing what was changed.

Kurt_Bremser
Super User

See this:

DATA BCUSTOMERS;
INPUT CUSTNO $ B1STAT JOB_ID;
CARDS;
C19291203 002103 0010
C19291204 000091 0001
C19291205 000000 0101
;

proc print noobs;
run;

Result:

    CUSTNO     B1STAT    JOB_ID

   C1929120     2103        10 
   C1929120       91         1 
   C1929120        0       101 

no leading zeroes there.

Now see this:

DATA BCUSTOMERS;
INPUT CUSTNO $ B1STAT :$6. JOB_ID :$4.;
CARDS;
C19291203 002103 0010
C19291204 000091 0001
C19291205 000000 0101
;

proc print noobs;
run;

Result:

  CUSTNO     B1STAT    JOB_ID

 C1929120    002103     0010 
 C1929120    000091     0001 
 C1929120    000000     0101 

Codes should always be stored as character.

 

And yes, never inspect CSV files with Excel. Use a text editor (e.g. Notepad++) for this.

And be very careful with importing strings consisting of numbers solely into Excel.

Astounding
PROC Star

As it stands now, are the variables character or numeric?  If they are numeric, they don't contain leading zeros.  In fact, they don't contain any digits at all.  SAS stores the value (not a set of characters) for numeric variables.  If the variables are character but happen to contain only digits, you should be able to use a somewhat less cumbersome process.  At least I think this should work.

 

Add an observation at the top of the data set.  Assign any variable that requires leading zeros a value of 'ABC' in that extra observation.  Then when you export, SAS should figure out that these are character variables and preserve leading zeros.  After the export, remove the extra row.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 784 views
  • 1 like
  • 5 in conversation