BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Krysia24
Obsidian | Level 7

Hello!

 

Apologies for this basic question, as you can see I am a learner. I have a variable and I need all the responses to have an even amount of characters in them. For example, currently it looks like this: 

105

3

11

3105

20306

 

But what I want is to apply a format that adds a leading zero **only to the observations within the variabel that have an odd number of digits**. So what I want would look like this: 

0105

03

11

3105 

020306

 

As you can see, in my ideal world, the format is applied only to those variables that I have flagged as having an odd number of digits. I already have a separate variable that I have created that is binary, where 1 = odd number of digits. Using this, how can I do some sort of conditional formatting to only apply the format to these responses that have an odd number of characters? It's important to know that I need this retained as one variable in the end. Knowing if this is not possible is also useful for me to know. 

 

Thanks so much! 

1 ACCEPTED SOLUTION

Accepted Solutions
Cynthia_sas
SAS Super FREQ

Hi,

  And, as a comparison, here is how you would make new character variables in a program to make the new variables in a dataset.

cynthia

create_new_character_variables.png

 

View solution in original post

10 REPLIES 10
Cynthia_sas
SAS Super FREQ

Hi:
Is this question related to your previous question? https://communities.sas.com/t5/SAS-Analytics-U/Counting-number-of-characters-in-a-string-for-each-re... ?

  

The method you use to achieve your goal will depend on whether your desired output is in a report or in a dataset. With PROC REPORT, you can apply a format conditionally to only certain report rows using a CALL DEFINE statement for changing format in a COMPUTE block.

 

For example, if you want to have a REPORT, you can do this:conditional_formatting_report.png

 

The challenge will be if you want your conditional formatting in a dataset because a numeric variable can only have 1 format assigned in the descriptor portion of the SAS dataset -- so a numeric variable that used the Z5. format would have that width of 5 used for EVERY numeric value on EVERY row in the dataset. However, a character variable that is the character version of a number could have 2 leading zeroes on one row and no leading zeroes on another row. You could control the creation of the character variable with conditional logic.

 

So, as you see, the question you're asking needs to be clarified as to whether you want a report or a dataset with the differing number of leading zeroes based on the number of characters.

 

cynthia

Krysia24
Obsidian | Level 7

Hi Cynthia!

 

Thanks so much for pointing out that this needs additional clarification. 

 

I would need to do the conditional formatting in the dataset - since this is only the data management portion of my work before I get into analysis of the data. 

 

So it sounds like step one would be to convert my variable {the variable that needs the leading zero if odd number of characters} to a character string? And then apply some conditional logic? 

Cynthia_sas
SAS Super FREQ
The problem I'm having is that I'm not clear on what type of analysis there is that matters whether your numeric variable is formatted with leading zeroes or not. Astounding has provided you an example of a PROC FORMAT solution. but you need to understand that PROC FORMAT is only impacting the external display of the number. The "internal" number will NOT have leading zeroes for the purpose of the analysis. So although I now understand that you want this in the dataset, I'm not sure WHY you want this in a dataset because short of creating a character version of your number, for numeric variables, the "internally stored" number, without leading zeroes will be used for analysis (depending on your analysis).

cynthia
Krysia24
Obsidian | Level 7

I'll explain why I think this matters - sorry for leaving out key background.

 

So right now I am dealing with a dataset from a questionnaire and there is one variable that was a "select all that apply" variable. For this variable, let's call it Q1, there are twelve possible responses - and if someone checked off multiple, it all jumbled into one variable. 

So the potential responses for Q1 ranged from 1-12, but if someone checked off multiple reasons, you get a response like "1020812." 

 

My initial reaction was to just separate out responses with SUBSTRN and then combine responses of each into one variable - so if anyone had responded 1 for any of the separations, it would be combined into one variable. However, this approach neglected the fact that the first character in the string did not have a leading zero if it was a single digit response, while subsequent responses did have a leading zero if it was a single digit response. For example, in the above response - 1020812, it separates into 01, 02, 08, 12 because my code says to select the first digit of the string as a single variable and then to select by two's. Which is correct. However, if someone responded, 12, then it would incorrectly break into a response of 1 and 2 - which is incorrect. Here's my code if it helps: 

 

DATA example2;
SET example;
newQ1= SUBSTRN(Q1, 1,1);  
newQ1_1 = SUBSTRN(Q1,2,2);
newQ1_2 = SUBSTRN(Q1,4,2);
newQ1_3= SUBSTRN(Q1,6,2);
newQ1_4 = SUBSTRN (Q1,8,2);
newQ1_5 = SUBSTRN (Q1,10,2);
RUN;

 

and then for each of these new variables would do something like this to combine: 

DATA example3;
SET example2;
IF newQ1 =1 or newQ1_1= 01 or newQ1_2= 01 or newQ1_3 = 01 or newQ1_4 = 01 or newQ1_5 = 01 THEN Q1_new = 1; 
... and this this for each of the above 

 

Thus, I came up with the solution of adding a leading zero only to variables that had an odd number of digits, so that the separation between variables was equal. 

Krysia24
Obsidian | Level 7

Assuming it's possible to do what I am saying, the logic is that by adding a leading zero to observations with an odd # of characters, I could then just extract the first two digits of every observation and this issue would be resolved.

Astounding
PROC Star

The first step would be to evaluate whether your data is correct as is, or whether you need to go back and re-create your data set.

 

If a person could select all 12 answers, you would end up with a 23-digit number.  That is more than SAS can accurately store.  The limit is 15 digits (16 on some operating systems, but in your survey 15 would handle all 8-answer responses).  Do you have any sense of whether anyone actually selected 9 or more items on a single question?

Krysia24
Obsidian | Level 7

Luckily, the most a person selected was six responses, and since the first response was a single digit, the longest response was only 11 characters long. 

Astounding
PROC Star

OK, here's what I'm looking at as an approach then.  Assuming that Q1 contains up to 11 digits:

 

proc format;

value odd 0-99=[z2.]

100-9999=[z4.]

10000-999999=[z6.];

1000000-99999999=[z8.]

100000000-9999999999=[z10.]

10000000000-999999999999=[z12.]

run;

 

The way I'm interpreting the problem, Q1 indicates which parts should hold 1 instead of zero.  If that's correct, this program takes the existing Q1, and assigns 1 to some parts and 0 to the remaining parts:

 

data want;

set have;

length temp $ 12;

temp = put(Q1, odd.);

array q1_new {12} q1_new1 - q1_new12;

do i=1 to 12;

   q1_new{i} = 0;

end;

if q1 > 0 then do i=1 to length(temp) by 2;

   change_this_var = input(substr(temp, i), 2.);

   q1_new{change_this_var}=1;

end;

drop temp i change_this_var;

run;

 

It's untested but looks like it should work.

 

******************************************************

 

Edited to allow 12 "new" variables instead of 6 ... each with a value of 0 or 1.

 

Cynthia_sas
SAS Super FREQ

Hi,

  And, as a comparison, here is how you would make new character variables in a program to make the new variables in a dataset.

cynthia

create_new_character_variables.png

 

Astounding
PROC Star

It sounds like your variable is numeric, taking on integer values only.  From the example, it looks like you want left-hand justified strings printing, so here is how you would do that.  (If you want right-hand justified strings instead, only minor changes would be required but the code would be longer.)

 

proc format;

value odd 0-99=[z2.]

100-9999=[z4.]

10000-999999=[z6.];

run;

 

Then apply the ODD format to the variable in question.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 9687 views
  • 1 like
  • 3 in conversation