Write and run SAS programs in your web browser

Applying a format only to certain observations within a variable

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 13
Accepted Solution

Applying a format only to certain observations within a variable

[ Edited ]

Hello!

 

Apologies for this basic question, as you can see I am a learner. I have a variable and I need all the responses to have an even amount of characters in them. For example, currently it looks like this: 

105

3

11

3105

20306

 

But what I want is to apply a format that adds a leading zero **only to the observations within the variabel that have an odd number of digits**. So what I want would look like this: 

0105

03

11

3105 

020306

 

As you can see, in my ideal world, the format is applied only to those variables that I have flagged as having an odd number of digits. I already have a separate variable that I have created that is binary, where 1 = odd number of digits. Using this, how can I do some sort of conditional formatting to only apply the format to these responses that have an odd number of characters? It's important to know that I need this retained as one variable in the end. Knowing if this is not possible is also useful for me to know. 

 

Thanks so much! 


Accepted Solutions
Solution
‎07-12-2016 01:38 PM
SAS Super FREQ
Posts: 8,719

Re: Applying a format only to certain observations within a variable

Hi,

  And, as a comparison, here is how you would make new character variables in a program to make the new variables in a dataset.

cynthia

create_new_character_variables.png

 

View solution in original post


All Replies
SAS Super FREQ
Posts: 8,719

Re: Applying a format only to certain observations within a variable

Hi:
Is this question related to your previous question? https://communities.sas.com/t5/SAS-Analytics-U/Counting-number-of-characters-in-a-string-for-each-re... ?

  

The method you use to achieve your goal will depend on whether your desired output is in a report or in a dataset. With PROC REPORT, you can apply a format conditionally to only certain report rows using a CALL DEFINE statement for changing format in a COMPUTE block.

 

For example, if you want to have a REPORT, you can do this:conditional_formatting_report.png

 

The challenge will be if you want your conditional formatting in a dataset because a numeric variable can only have 1 format assigned in the descriptor portion of the SAS dataset -- so a numeric variable that used the Z5. format would have that width of 5 used for EVERY numeric value on EVERY row in the dataset. However, a character variable that is the character version of a number could have 2 leading zeroes on one row and no leading zeroes on another row. You could control the creation of the character variable with conditional logic.

 

So, as you see, the question you're asking needs to be clarified as to whether you want a report or a dataset with the differing number of leading zeroes based on the number of characters.

 

cynthia

Occasional Contributor
Posts: 13

Re: Applying a format only to certain observations within a variable

Hi Cynthia!

 

Thanks so much for pointing out that this needs additional clarification. 

 

I would need to do the conditional formatting in the dataset - since this is only the data management portion of my work before I get into analysis of the data. 

 

So it sounds like step one would be to convert my variable {the variable that needs the leading zero if odd number of characters} to a character string? And then apply some conditional logic? 

SAS Super FREQ
Posts: 8,719

Re: Applying a format only to certain observations within a variable

The problem I'm having is that I'm not clear on what type of analysis there is that matters whether your numeric variable is formatted with leading zeroes or not. Astounding has provided you an example of a PROC FORMAT solution. but you need to understand that PROC FORMAT is only impacting the external display of the number. The "internal" number will NOT have leading zeroes for the purpose of the analysis. So although I now understand that you want this in the dataset, I'm not sure WHY you want this in a dataset because short of creating a character version of your number, for numeric variables, the "internally stored" number, without leading zeroes will be used for analysis (depending on your analysis).

cynthia
Occasional Contributor
Posts: 13

Re: Applying a format only to certain observations within a variable

I'll explain why I think this matters - sorry for leaving out key background.

 

So right now I am dealing with a dataset from a questionnaire and there is one variable that was a "select all that apply" variable. For this variable, let's call it Q1, there are twelve possible responses - and if someone checked off multiple, it all jumbled into one variable. 

So the potential responses for Q1 ranged from 1-12, but if someone checked off multiple reasons, you get a response like "1020812." 

 

My initial reaction was to just separate out responses with SUBSTRN and then combine responses of each into one variable - so if anyone had responded 1 for any of the separations, it would be combined into one variable. However, this approach neglected the fact that the first character in the string did not have a leading zero if it was a single digit response, while subsequent responses did have a leading zero if it was a single digit response. For example, in the above response - 1020812, it separates into 01, 02, 08, 12 because my code says to select the first digit of the string as a single variable and then to select by two's. Which is correct. However, if someone responded, 12, then it would incorrectly break into a response of 1 and 2 - which is incorrect. Here's my code if it helps: 

 

DATA example2;
SET example;
newQ1= SUBSTRN(Q1, 1,1);  
newQ1_1 = SUBSTRN(Q1,2,2);
newQ1_2 = SUBSTRN(Q1,4,2);
newQ1_3= SUBSTRN(Q1,6,2);
newQ1_4 = SUBSTRN (Q1,8,2);
newQ1_5 = SUBSTRN (Q1,10,2);
RUN;

 

and then for each of these new variables would do something like this to combine: 

DATA example3;
SET example2;
IF newQ1 =1 or newQ1_1= 01 or newQ1_2= 01 or newQ1_3 = 01 or newQ1_4 = 01 or newQ1_5 = 01 THEN Q1_new = 1; 
... and this this for each of the above 

 

Thus, I came up with the solution of adding a leading zero only to variables that had an odd number of digits, so that the separation between variables was equal. 

Occasional Contributor
Posts: 13

Re: Applying a format only to certain observations within a variable

Assuming it's possible to do what I am saying, the logic is that by adding a leading zero to observations with an odd # of characters, I could then just extract the first two digits of every observation and this issue would be resolved.

Respected Advisor
Posts: 4,969

Re: Applying a format only to certain observations within a variable

The first step would be to evaluate whether your data is correct as is, or whether you need to go back and re-create your data set.

 

If a person could select all 12 answers, you would end up with a 23-digit number.  That is more than SAS can accurately store.  The limit is 15 digits (16 on some operating systems, but in your survey 15 would handle all 8-answer responses).  Do you have any sense of whether anyone actually selected 9 or more items on a single question?

Occasional Contributor
Posts: 13

Re: Applying a format only to certain observations within a variable

Luckily, the most a person selected was six responses, and since the first response was a single digit, the longest response was only 11 characters long. 

Respected Advisor
Posts: 4,969

Re: Applying a format only to certain observations within a variable

[ Edited ]

OK, here's what I'm looking at as an approach then.  Assuming that Q1 contains up to 11 digits:

 

proc format;

value odd 0-99=[z2.]

100-9999=[z4.]

10000-999999=[z6.];

1000000-99999999=[z8.]

100000000-9999999999=[z10.]

10000000000-999999999999=[z12.]

run;

 

The way I'm interpreting the problem, Q1 indicates which parts should hold 1 instead of zero.  If that's correct, this program takes the existing Q1, and assigns 1 to some parts and 0 to the remaining parts:

 

data want;

set have;

length temp $ 12;

temp = put(Q1, odd.);

array q1_new {12} q1_new1 - q1_new12;

do i=1 to 12;

   q1_new{i} = 0;

end;

if q1 > 0 then do i=1 to length(temp) by 2;

   change_this_var = input(substr(temp, i), 2.);

   q1_new{change_this_var}=1;

end;

drop temp i change_this_var;

run;

 

It's untested but looks like it should work.

 

******************************************************

 

Edited to allow 12 "new" variables instead of 6 ... each with a value of 0 or 1.

 

Solution
‎07-12-2016 01:38 PM
SAS Super FREQ
Posts: 8,719

Re: Applying a format only to certain observations within a variable

Hi,

  And, as a comparison, here is how you would make new character variables in a program to make the new variables in a dataset.

cynthia

create_new_character_variables.png

 

Respected Advisor
Posts: 4,969

Re: Applying a format only to certain observations within a variable

It sounds like your variable is numeric, taking on integer values only.  From the example, it looks like you want left-hand justified strings printing, so here is how you would do that.  (If you want right-hand justified strings instead, only minor changes would be required but the code would be longer.)

 

proc format;

value odd 0-99=[z2.]

100-9999=[z4.]

10000-999999=[z6.];

run;

 

Then apply the ODD format to the variable in question.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 391 views
  • 0 likes
  • 3 in conversation