AlG
Quartz | Level 8

Hi friends,

I have a panel dataset with weekly data for several brands across four years (250 observations per brand). I would appreciate your help with the following three things (I apologize in advance if I shouldn't ask multiple questions under one topic). Sorry, but I cannot share a sample of my data because of an NDA.

1. My dataset includes a column named "brand_name". I want to create a new column ("id") that uniquely identifies each brand, so it takes the same value for all 250 rows associated with that brand.

2. I need a new variable that numbers the observations within each brand from 1 to 250. My dataset includes two columns, "Year0" and "week". For each brand, the data run from week 6 of 2014 through week 52 of 2014, then from week 0 of 2015 through week 52 of 2015, and so on. In other words, "week" starts over whenever "Year0" changes. The new variable should be 1 when Year0 is 2014 and week is 6, and increase by 1 with each row until the last observation, where it should be 250.

3. I have several variables displayed as percentages (they were calculated using avg() with format=percent7.2). I need to convert these to numbers. So, for example, 91.70% should be converted to 0.917.

Thanks so much in advance for your help.

14 REPLIES
PGStats
Opal | Level 21

Assuming your data is sorted by brand_name, Year0, and week:

data want;
set have;
by brand_name;
if first.brand_name then do;
   id + 1;
   weekId = 0;
   end;
weekId + 1;
format percentA percentB ....; /* Removes the format */
run;

Removing the format brings the data representation to its original (fractional) form.
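A minimal end-to-end sketch of the same approach, in case the data is not yet sorted. The toy dataset and the variable name pct_a are made up for illustration (they are not from the original data); only brand_name, Year0, and week come from the question.

```sas
/* Hypothetical toy data: two brands, three weeks each, deliberately unsorted */
data have;
   input brand_name $ Year0 week pct_a;
   format pct_a percent7.2;   /* stored as a fraction, displayed as a percentage */
   datalines;
B 2014 7 0.500
A 2014 6 0.917
A 2015 0 0.250
B 2014 6 0.100
A 2014 52 0.800
B 2015 0 0.300
;

/* BY-group processing in the DATA step needs sorted input */
proc sort data=have;
   by brand_name Year0 week;
run;

data want;
   set have;
   by brand_name;
   if first.brand_name then do;
      id + 1;        /* sum statement: id is retained across rows */
      weekId = 0;    /* restart the counter for each new brand */
   end;
   weekId + 1;       /* 1, 2, 3, ... within each brand */
   format pct_a;     /* no format name given: removes the PERCENT format */
run;

proc print data=want; run;
```

Because the sort order includes Year0 before week, the counter keeps increasing when week restarts at 0 in a new year. The stored values of pct_a never change; only the way they are displayed does.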

PG
AlG
Quartz | Level 8

@PGStats Thanks so much.

The code that you provided elegantly generates the brand_name ID and week ID. However, it does not convert the percentage values to fractional form. For example, I have a variable named "pct_svm_price". Based on your code, I ran the following, but the values are still displayed as percentages. Am I missing something here?

data Final3;
set Final2;
format pct_svm_price; run;

I don't know if this is at all relevant, but as I said before, the percentage variables were originally generated using avg(). For example, pct_svm_price was calculated using the following code:

 

avg (case when svm_price= "yes" then 1 else 0 end) format=percent7.2 as pct_svm_price

Kurt_Bremser
Super User

Pay attention to details. You will find that @PGStats' code explicitly removes the PERCENT format, thus displaying the raw value (which is 0.917 for a value displayed as 91.7%).

AlG
Quartz | Level 8

@Kurt_Bremser I am so sorry, I read your message several times and went back to @PGStats's code but couldn't figure out what is going on. BTW, I don't know if this helps or not, but the variable pct_svm_price is a character variable.

AlG
Quartz | Level 8

@Kurt_Bremser I see that. That's why I did the following:

 

data Final3;
set Final2;
format pct_svm_price; run;

Am I missing something?

AlG
Quartz | Level 8

@Kurt_Bremser 

The variable is pct_svm_price, and I am naming it in the format statement (format pct_svm_price;).

AlG
Quartz | Level 8

@Kurt_Bremser Below is the log

 

proc sql;
    create table Final as
    select avg (case when svm_price= "yes" then 1 else 0 end) format=percent7.2 as pct_svm_price
from Pre-Final
quit;
AlG
Quartz | Level 8

@Kurt_Bremser My bad! That was a typo. I corrected it.

Kurt_Bremser
Super User

Then the variable pct_svm_price cannot (I repeat: CANNOT) be of type character, and you simply need to omit the format= option in the SQL to get the raw value:

data pre_final;
input svm_price $;
datalines;
yes
yes
yes
yes
yes
yes
yes
yes
yes
yes
yes
yes
yes
yes
yes
yes
yes
yes
no
;

proc sql;
create table Final as
  select avg (case when svm_price= "yes" then 1 else 0 end) as pct_svm_price
  from pre_final;
quit;

Result in dataset final:

0.9473684211
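
One way to settle whether a variable is numeric or character is to ask SAS directly. The sketch below is only an illustration: it rebuilds a tiny version of the table (three made-up rows, with the format= option kept on purpose) and then checks the type with PROC CONTENTS and the VTYPE function.

```sas
/* Small stand-in for the real input table (made-up rows) */
data pre_final;
   input svm_price $;
   datalines;
yes
yes
no
;

proc sql;
   create table Final as
   select avg(case when svm_price = "yes" then 1 else 0 end)
          format=percent7.2 as pct_svm_price
   from pre_final;
quit;

/* The Type column of the output lists Num or Char for each variable */
proc contents data=Final; run;

/* VTYPE returns 'N' for numeric and 'C' for character variables */
data _null_;
   set Final;
   type = vtype(pct_svm_price);
   put type=;
run;
```

Since avg() returns a number, the log should show type=N here even though the value is displayed with a percent sign; the format affects display only, not the variable type.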
PGStats
Opal | Level 21

To get from a character variable to a numeric variable, use the INPUT function with the PERCENT. informat:

data test;
pct_str = "91.7%";
pct_num = input(pct_str, percent.);
run;

proc print data=test; run;
Obs. 	pct_str 	pct_num
1 	91.7% 	0.917

Note that you cannot change the type of a variable, you have to create a new one.
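
If the numeric version should end up under the original name, a common pattern is to rename the character variable on input, create the numeric one, and drop the old one. This is a sketch under the assumption that the source variable really is character; the dataset names have and want and the temporary name pct_char are made up for illustration.

```sas
/* Hypothetical input with a character percentage */
data have;
   pct_svm_price = "91.7%";
run;

data want;
   /* Rename the character variable so its name is free for the numeric one */
   set have (rename=(pct_svm_price = pct_char));
   pct_svm_price = input(pct_char, percent.);   /* new numeric variable */
   drop pct_char;                               /* discard the character version */
run;

proc print data=want; run;
```

The result has a single variable named pct_svm_price, now numeric, holding the fractional value.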

 

 

If you simply want to change the string, convert it to a number and back to a string with a different format:

data test;
pct_str = "91.7%";
pct_str = left(put(input(pct_str, percent.), best.));
run;

proc print data=test; run;
Obs. 	pct_str
1 	0.917
PG
