DATA Step, Macro, Functions and more

Create a new variable by calculating values from a single column?

Accepted Solution Solved
Reply
Contributor
Posts: 21
Accepted Solution

Create a new variable by calculating values from a single column?

Hi everyone,

My data is in this format:

category_1     category_2     result

1     a     10

2     a     20

1     b     30

2     b     40

I would like to create a variable that would give me a/b for each category_1. For instance if this new variable was called new_variable, the dataset would now look like this:

category_1     category_2     result     new_variable

1     a     10     0.333333

2     a     20     0.5

1     b     30     .

2     b     40     .ariable


Accepted Solutions
Solution
‎06-25-2013 08:17 PM
Super User
Super User
Posts: 7,050

Re: Create a new variable by calculating values from a single column?

Posted in reply to charles_pignon1

Sounds like a simple merge will do what you want.  Just take the observation for the control group and merge it onto the rest of the observations. 

category_1     category_2     result

1     a     10

2     a     20

1     b     30

2     b     40

CATEGORY_1 is your BY group for the merge. CATEGORY_2 is  your treatment group and will be used to subset the observations .

data want ;

   merge have (where=(category_2 ne 'b') )

             have (rename=(category_2=trt result=control) where=(trt='b') )

  ;

  by category_1 ;

  new_variable = result / control ;

run;

View solution in original post


All Replies
Super Contributor
Posts: 543

Re: Create a new variable by calculating values from a single column?

Posted in reply to charles_pignon1

Hi.

this code should get you somehow started.

It is a bit confusing to me how you want the new_variable to be displayed at the end, how did you chose it to show on the first two rows?

Anyway, take a look.

data in;

input category_1     category_2 $    result;

datalines;

1     a     10

2     a     20

1     b     30

2     b     40

;

run;

proc sort data = in;by category_1 category_2;

proc transpose data = in out = want(drop = _NAME_);

    by category_1 ;id category_2;

    var result;

run;

data really_want;    

     set want;

     new_variable = a/b;

run;

data final;

    merge in really_want(rename = (a = result) drop = b);

    by category_1 result;

run;

Good luck!

Anca.

Super User
Posts: 19,805

Re: Create a new variable by calculating values from a single column?

Posted in reply to charles_pignon1

Given what you've explained a proc sql step will work, but order of the data matters and the a/b are hardcoded so Anca solution may be preferable, or you need to provide additional details.

data have;

input category_1    category_2 $    result;

cards;

1     a     10

2     a     20

1     b     30

2     b     40

;

run;

proc sql;

    create table want as

    select a.*, a.result/b.result as new_variable

    from have a

    left join have b

    on a.category_1=b.category_1

    and a.category_2='a'

    and b.category_2='b'

    order by category_2, category_1;

quit;

Contributor
Posts: 21

Re: Create a new variable by calculating values from a single column?

Hi,

To answer your question Anca, I would want new_variable to be a percentage of a single variable type (basically the control). I guess here's a simpler way to present it:

category_1     treatment     result

1     a     10

2     a     20

1     control     30

2     control     40

1     c     50

2     c     60

The resulting dataset I would like to have would have percentages of the control for each category_1, so a/control and c/control for both 1 and 2:

category_1     treatment     result     new_variable

1     a     10     .333333

2     a     20     .5

1     control     30     1

2     control     40     1

1     c     50     1.666667

2     c     60     1.5

I tried your code and was able to rearrange the data, but not obtain the new variable (% of control)

Contributor
Posts: 21

Re: Create a new variable by calculating values from a single column?

This code worked, but I ran into problems when using it on my real dataset, in which both category_1 and category_2 are numerical variables (instead of category_2 having letters like a and b). Replacing a and b by the actual 1 and 2 seems to cause a bunch of errors. Is there an option that will read numbers as well as letters?

Here is what the code now looks like, where a is replaced by 3 and b is replaced by 4

data have;

input category_1    category_2 $    result;

cards;

1     3     10

2     3     20

1     4     30

2     4     40

;

run;

proc sql;

    create table want as

    select 3.*, 3.result/4.result as new_variable

    from have 3

    left join have 4

    on 3.hour=4.hour

    and 3.category_2='3'

    and 4.category_2='4'

    order by category_2, category_1;

quit;

Super User
Posts: 19,805

Re: Create a new variable by calculating values from a single column?

Posted in reply to charles_pignon1

Remove the quotation marks from the numbers.

Contributor
Posts: 21

Re: Create a new variable by calculating values from a single column?

I tried removing the quotation marks, but the problem seems to appear earlier, at the select 3.* phase  (line 3 of proc sql). Here a.* is processed just fine but 3.* is not.

Super User
Posts: 19,805

Re: Create a new variable by calculating values from a single column?

Posted in reply to charles_pignon1

those a and b don't relate to the data, they're table alias, a shortcut way to reference tables. A/B are just convenient, though in this case confusing Smiley Happy

proc sql;

    create table want as

    select a.*, a.result/b.result as new_variable

    from have a

    left join have b

    on a.hour=b.hour

    and a.category_2=3

    and b.category_2=4

    order by category_2, category_1;

quit;

Contributor
Posts: 21

Re: Create a new variable by calculating values from a single column?

Great! I just have one last question: how can I modify this code to work with more variables in category_2: for instance instead of 3 and 4 if I head 3 4 5 and 6 but still wanted the result of all of them divided by the result for variable 4? I imagine I would need to add:

and c.category_2=5

and d.category_2=6

but I can't get the syntax before that to work...

Respected Advisor
Posts: 4,923

Re: Create a new variable by calculating values from a single column?

Posted in reply to charles_pignon1

If you want to include the values where category_2 = "b" in your output with missing values for new_variable, the SQL solution must resort to the rarely used full join :

data have;
input category_1    category_2 $    result;
cards;
1     a     10
2     a     20
1     b     30
2     b     40
;

proc sql;
create table want as
select
     a.*,
     a.result/(b.result*(a.category_2 ne "b")) as new_variable
from
     have as a full join
     have as b on a.category_1=b.category_1
where b.category_2 = "b"
order by a.category_2, a.category_1;

select * from want;
quit;

PG

Message was edited by: PG Changed  = "a"  to ne "b"

PG
Solution
‎06-25-2013 08:17 PM
Super User
Super User
Posts: 7,050

Re: Create a new variable by calculating values from a single column?

Posted in reply to charles_pignon1

Sounds like a simple merge will do what you want.  Just take the observation for the control group and merge it onto the rest of the observations. 

category_1     category_2     result

1     a     10

2     a     20

1     b     30

2     b     40

CATEGORY_1 is your BY group for the merge. CATEGORY_2 is  your treatment group and will be used to subset the observations .

data want ;

   merge have (where=(category_2 ne 'b') )

             have (rename=(category_2=trt result=control) where=(trt='b') )

  ;

  by category_1 ;

  new_variable = result / control ;

run;

Contributor
Posts: 21

Re: Create a new variable by calculating values from a single column?

This works perfectly! Thanks!

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 328 views
  • 7 likes
  • 5 in conversation