Hi friends,
I have a panel dataset with weekly data for several brands across four years (250 observations per brand). I would appreciate your help with the following three things (I apologize in advance if I shouldn't ask multiple questions in one topic). Also, sorry that I cannot share a sample of my data due to an NDA.
1. My dataset includes a column named "brand_name". I want to create a new column ("id") which uniquely identifies each brand. So it should take on the same value for all the 250 rows associated with each brand.
2. I need a new variable which assigns a number (from 1 to 250) to each observation for each brand. My dataset includes two columns, "Year0" and "week". For each brand, my data starts at the 6th week of 2014 and runs through the 52nd week of 2014, then from week 0 of 2015 to week 52 of 2015, and so on. In other words, when "Year0" is 2014, "week" runs from 6 all the way to 52. Then for the next row, in which "Year0" is 2015, "week" is 0 (so the variable week starts over whenever Year0 changes). The new variable should be 1 when Year0 is 2014 and week is 6, and keep counting up to the last observation, for which it should be 250.
3. I have several variables that are displayed as percentages (they were calculated using avg() with format=percent7.2). I need to convert these to numbers. For example, 91.70% should be converted to 0.917.
Thanks so much in advance for your help.
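Assuming your data is sorted by brand_name, Year0, and week:
data want;
    set have;
    by brand_name;
    if first.brand_name then do;
        id + 1;      /* new brand: increment the brand id */
        weekId = 0;  /* restart the week counter for this brand */
    end;
    weekId + 1;      /* sum statement, so the value is retained across rows */
    format percentA percentB ....; /* Removes the format */
run;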
Removing the format brings the data representation to its original (fractional) form.
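If the data is not already in that order, the BY statement will stop with an error or the week counter will follow the wrong order. A minimal sketch of the sort that the step above assumes (the dataset name `have` is the same placeholder used in the answer):

```sas
/* Sort in place so that BY brand_name works and weeks are in calendar order */
proc sort data=have;
    by brand_name Year0 week;
run;
```

Sorting by Year0 and week as well as brand_name is what makes weekId count in chronological order within each brand.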
@PGStats Thanks so much.
The code that you provided elegantly generates the brand_name ID and week ID. However, it does not convert the percentage values to fractional form. For example, I have a variable named "pct_svm_price". Based on your code, I ran the following. However, the format is still percentage. Am I missing something here?
data Final3;
    set Final2;
    format pct_svm_price;
run;
I don't know if this is at all relevant, but as I said before, the percentage variables were originally generated using avg(). For example, pct_svm_price was calculated using the following code:
avg (case when svm_price= "yes" then 1 else 0 end) format=percent7.2 as pct_svm_price
Pay attention to details. You will find that @PGStats' code explicitly removes the PERCENT format, thus displaying the raw value (which is 0.917 for a value displayed as 91.7%).
@Kurt_Bremser I am so sorry, I read your message several times and went back to @PGStats's code but couldn't figure out what is going on. BTW, I don't know if this helps or not, but the variable pct_svm_price is a character variable.
Come on, there's even a comment in there that says "Removes the format"!
@Kurt_Bremser I see that. That's why I did the following:
data Final3;
    set Final2;
    format pct_svm_price;
run;
Am I missing something?
You want to remove the PERCENT format from the percent.... variables, so you have to name those in the FORMAT statement.
The variable is pct_svm_price, and I am naming it in the format statement (format pct_svm_price;).
Then please post the complete log of the step that creates this variable.
Use the </> button to post the log.
@Kurt_Bremser Below is the log:
proc sql;
    create table Final as
    select avg (case when svm_price= "yes" then 1 else 0 end) format=percent7.2 as pct_svm_price
    from Pre_Final;
quit;
@Kurt_Bremser My bad! That was a typo. I corrected it.
Then the variable pct_svm_price cannot (I repeat: CANNOT) be of type character, and you simply need to omit the format= option in the SQL to get the raw value:
data pre_final;
input svm_price $;
datalines;
yes
yes
yes
yes
yes
yes
yes
yes
yes
yes
yes
yes
yes
yes
yes
yes
yes
yes
no
;
proc sql;
create table Final as
select avg (case when svm_price= "yes" then 1 else 0 end) as pct_svm_price
from pre_final;
quit;
Result in dataset final:
0.9473684211
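To settle whether a variable is numeric or character, one option is to look at the dataset's metadata rather than at the displayed values. A minimal sketch, assuming the dataset is the `Final` table created above:

```sas
/* The "Type" column of the variable list shows Num or Char,
   and the "Format" column shows any format still attached */
proc contents data=Final;
run;
```

A numeric variable with a PERCENT format attached still prints with a percent sign, which can make it look like a string.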
To get from a character variable to a numeric variable, use the INPUT function with the PERCENT. informat:
data test;
pct_str = "91.7%";
pct_num = input(pct_str, percent.);
run;
proc print data=test; run;
Obs    pct_str    pct_num
  1    91.7%        0.917
Note that you cannot change the type of a variable, you have to create a new one.
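One common way to work around that while keeping the original variable name is to rename the character variable as it is read, build the numeric one under the old name, and drop the renamed copy. This is only a sketch: `have` and `want` are placeholder dataset names, `pct_svm_price_chr` is a made-up helper name, and it assumes the variable really is character with values such as "91.70%".

```sas
data want;
    /* Read the character column under a temporary name */
    set have(rename=(pct_svm_price=pct_svm_price_chr));
    /* Create a numeric variable with the original name */
    pct_svm_price = input(pct_svm_price_chr, percent12.);
    /* Discard the character version */
    drop pct_svm_price_chr;
run;
```

If the variable turns out to be numeric, this step is unnecessary and removing the format is enough.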
If you simply want to change the string, convert to a number and back to a string, with a different format:
data test;
pct_str = "91.7%";
pct_str = left(put(input(pct_str, percent.), best.));
run;
proc print data=test; run;
Obs    pct_str
  1    0.917