BookmarkSubscribeRSS Feed
kateb409
Obsidian | Level 7

Hi there,

I am working in SAS studio for SAS On Demand. I am working on a project for class and need help adding in dashes in place of missing values for the variable 'Inits'. Some of the observations have missing middle initials and they are blanks, one example is observation #6 below, and I need those to be dashes. I have attached the original dataset, along with my code to manipulate it, and a screenshot of the results.

Code:

DATA Utah;
retain SSN Inits City StateCd ZipCd;
SET Records (rename = (Inits = Initsold
CitySt = StateCd));
ID1 = put(ID, Z9.);
SSN = catx('-', SUBSTR(PUT(ID1, Z9.), 1, 3), SUBSTR(PUT(ID1, Z9.), 4, 2), SUBSTR(PUT(ID1, Z9.), 6, 4));
ZipCd = put(Zipcode, Z5.);
City = scan(StateCd, 1, ',');
Inits = compress(Initsold,'.');

***note: how would I add in a dash for missing middle initial?;

drop ID ID1 Zipcode Initsold GenderCode EthnicityCode RaceCode BirthYear BirthMonth BirthDay;
RUN;

 

Please let me know if anyone knows what to do! Thank you!

 

Sincerely,

 

Kaitlin E Buck

 

7 REPLIES 7
PeterClemmensen
Tourmaline | Level 20

Only for missing middle initial, correct?

kateb409
Obsidian | Level 7

Hi @PeterClemmensen 

Yes, it is only for the middle initials.

-Kaitlin

ballardw
Super User

I would start with

 

Inits = coalescec ( compress(Initsold,'.'), '-');

 

The coalsecec (and the numeric coalesce) functions examine the values inside the () from left to right and return the leftmost non-missing value. So if compress(initsold,'.') returns missing the coalescec looks at the next value and gets the dash (or any other character value you have there.

 

If your ID variable is numeric you could assign the SSN. format to it so values display  in the 123-45-6789.

Or use

SSN = put(id,ssn.); instead of messing around with all that substr code.

data  example;
   x= 123456789;
   ssn = put(x,ssn.);
   put ssn=;
run;
kateb409
Obsidian | Level 7

Hi @Reeza 

I checked this out and this post is actually about a different part of a project my class is working on. My issue is that for the dataset I have attached is there are missing values (as blanks) for middle initials in some of the observations and they need to be dashes not blanks. The post you referred me to is about keeping dashes as missing middle initials in a different dataset.

-Kaitlin

Tom
Super User Tom
Super User

So first let's look at the code you posted and see what it is doing.

DATA WORK.Contact_UT;
  retain SSN Inits City StateCd ZipCd;
  SET HypImpt.UT_Records (rename=(Inits=Initsold CitySt=StateCd));

So those lines will create a new dataset named CONTACT_UT in the WORK library by reading the data from the exiting UT_RECORDS dataset in the HYPIMPT library.  Two of the variables will have their names changed. Plus five variables will be forced to the beginning of the column order (and if they weren't already on the input dataset the automatic reset to missing at the start of each iteration of the data step will be cancelled.)

ID1 = put(ID, Z9.);

This will convert the numeric variable ID to a new character variable ID1.  Since you haven't yet defined ID1 SAS will guess that you meant to make it character of length 9 because of the width of the format used in the PUT() function call.  If ID is not numeric it will first be converted by SAS to a number, and any values that don't look like numbers will cause a missing value to be used in the PUT() function call.

SSN = catx('-', SUBSTR(PUT(ID1, Z9.), 1, 3), SUBSTR(PUT(ID1, Z9.), 4, 2), SUBSTR(PUT(ID1, Z9.), 6, 4));

This statement does not make any sense. This part, PUT(ID1, Z9.), of that statement does not make any sense.  It will take the value of the new CHARACTER variable ID1 and try to convert to a number (note that SAS uses the BEST12 format to do this automatic conversion. Since the number is at most 9 digits long the first three characters will always be spaces.

ZipCd = put(Zipcode, Z5.);

This is similar to the previous code.  Since ZIPCD is not previously defined SAS will guess that you want to define is as character with a length of 5.

City = scan(StateCd, 1, ',');

This will take the values of STATECD, what used to called CITYCD, up to the first comma and put it into CITY. If CITY does not already exist then SAS will guess that you want it to have the same length of STATECD does.

Inits = compress(Initsold,'.');

This line will remove any periods from INITSOLD, what used to be named INITS, and assign it to INITS.

  drop ID ID1 Zipcode Initsold GenderCode EthnicityCode RaceCode BirthYear BirthMonth BirthDay;
run;

And finally you list which variable not to keep in the target dataset.  Any other variables generated by the data step will be in the dataset.

 

If you just want to convert periods to hyphens you can use the TRANSLATE() function.

Inits = translate(Initsold,'-','.');

For more help post some of the data as TEXT (not attached binary files or photographs).

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 1632 views
  • 3 likes
  • 6 in conversation