BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.

Hi everyone,

My data is in this format:

category_1     category_2     result

1     a     10

2     a     20

1     b     30

2     b     40

I would like to create a variable that would give me a/b for each category_1. For instance if this new variable was called new_variable, the dataset would now look like this:

category_1     category_2     result     new_variable

1     a     10     0.333333

2     a     20     0.5

1     b     30     .

2     b     40     .ariable

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Sounds like a simple merge will do what you want.  Just take the observation for the control group and merge it onto the rest of the observations. 

category_1     category_2     result

1     a     10

2     a     20

1     b     30

2     b     40

CATEGORY_1 is your BY group for the merge. CATEGORY_2 is  your treatment group and will be used to subset the observations .

data want ;

   merge have (where=(category_2 ne 'b') )

             have (rename=(category_2=trt result=control) where=(trt='b') )

  ;

  by category_1 ;

  new_variable = result / control ;

run;

View solution in original post

11 REPLIES 11
AncaTilea
Pyrite | Level 9

Hi.

this code should get you somehow started.

It is a bit confusing to me how you want the new_variable to be displayed at the end, how did you chose it to show on the first two rows?

Anyway, take a look.

data in;

input category_1     category_2 $    result;

datalines;

1     a     10

2     a     20

1     b     30

2     b     40

;

run;

proc sort data = in;by category_1 category_2;

proc transpose data = in out = want(drop = _NAME_);

    by category_1 ;id category_2;

    var result;

run;

data really_want;    

     set want;

     new_variable = a/b;

run;

data final;

    merge in really_want(rename = (a = result) drop = b);

    by category_1 result;

run;

Good luck!

Anca.

Reeza
Super User

Given what you've explained a proc sql step will work, but order of the data matters and the a/b are hardcoded so Anca solution may be preferable, or you need to provide additional details.

data have;

input category_1    category_2 $    result;

cards;

1     a     10

2     a     20

1     b     30

2     b     40

;

run;

proc sql;

    create table want as

    select a.*, a.result/b.result as new_variable

    from have a

    left join have b

    on a.category_1=b.category_1

    and a.category_2='a'

    and b.category_2='b'

    order by category_2, category_1;

quit;

charles_pignon1
Calcite | Level 5

Hi,

To answer your question Anca, I would want new_variable to be a percentage of a single variable type (basically the control). I guess here's a simpler way to present it:

category_1     treatment     result

1     a     10

2     a     20

1     control     30

2     control     40

1     c     50

2     c     60

The resulting dataset I would like to have would have percentages of the control for each category_1, so a/control and c/control for both 1 and 2:

category_1     treatment     result     new_variable

1     a     10     .333333

2     a     20     .5

1     control     30     1

2     control     40     1

1     c     50     1.666667

2     c     60     1.5

I tried your code and was able to rearrange the data, but not obtain the new variable (% of control)

charles_pignon1
Calcite | Level 5

This code worked, but I ran into problems when using it on my real dataset, in which both category_1 and category_2 are numerical variables (instead of category_2 having letters like a and b). Replacing a and b by the actual 1 and 2 seems to cause a bunch of errors. Is there an option that will read numbers as well as letters?

Here is what the code now looks like, where a is replaced by 3 and b is replaced by 4

data have;

input category_1    category_2 $    result;

cards;

1     3     10

2     3     20

1     4     30

2     4     40

;

run;

proc sql;

    create table want as

    select 3.*, 3.result/4.result as new_variable

    from have 3

    left join have 4

    on 3.hour=4.hour

    and 3.category_2='3'

    and 4.category_2='4'

    order by category_2, category_1;

quit;

Reeza
Super User

Remove the quotation marks from the numbers.

charles_pignon1
Calcite | Level 5

I tried removing the quotation marks, but the problem seems to appear earlier, at the select 3.* phase  (line 3 of proc sql). Here a.* is processed just fine but 3.* is not.

Reeza
Super User

those a and b don't relate to the data, they're table alias, a shortcut way to reference tables. A/B are just convenient, though in this case confusing Smiley Happy

proc sql;

    create table want as

    select a.*, a.result/b.result as new_variable

    from have a

    left join have b

    on a.hour=b.hour

    and a.category_2=3

    and b.category_2=4

    order by category_2, category_1;

quit;

charles_pignon1
Calcite | Level 5

Great! I just have one last question: how can I modify this code to work with more variables in category_2: for instance instead of 3 and 4 if I head 3 4 5 and 6 but still wanted the result of all of them divided by the result for variable 4? I imagine I would need to add:

and c.category_2=5

and d.category_2=6

but I can't get the syntax before that to work...

PGStats
Opal | Level 21

If you want to include the values where category_2 = "b" in your output with missing values for new_variable, the SQL solution must resort to the rarely used full join :

data have;
input category_1    category_2 $    result;
cards;
1     a     10
2     a     20
1     b     30
2     b     40
;

proc sql;
create table want as
select
     a.*,
     a.result/(b.result*(a.category_2 ne "b")) as new_variable
from
     have as a full join
     have as b on a.category_1=b.category_1
where b.category_2 = "b"
order by a.category_2, a.category_1;

select * from want;
quit;

PG

Message was edited by: PG Changed  = "a"  to ne "b"

PG
Tom
Super User Tom
Super User

Sounds like a simple merge will do what you want.  Just take the observation for the control group and merge it onto the rest of the observations. 

category_1     category_2     result

1     a     10

2     a     20

1     b     30

2     b     40

CATEGORY_1 is your BY group for the merge. CATEGORY_2 is  your treatment group and will be used to subset the observations .

data want ;

   merge have (where=(category_2 ne 'b') )

             have (rename=(category_2=trt result=control) where=(trt='b') )

  ;

  by category_1 ;

  new_variable = result / control ;

run;

charles_pignon1
Calcite | Level 5

This works perfectly! Thanks!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1293 views
  • 7 likes
  • 5 in conversation