DATA Step, Macro, Functions and more

Multiply variables across data sets using a condition

Reply
Occasional Contributor
Posts: 6

Multiply variables across data sets using a condition

Dear Members,

 

I have data set A and data set B located in sasuser library. I want to multiply column A of data set A to column B of data set B to create a new variable column C. The multiply must  take place when SS# is equal in both data sets. Data set A has more obs and columns than data set B. When I run the code below with an if condition, it does not work. I got 0 observation in new_data.

 

data new_data;

set sasuser.columnA sasuser.columnB;

If SS# = SS# then

column C = column A * column B;

run;

My log error is: Statement is not valid or s used out of proper order.

 

Not working so far. Would where a better option? Please help.

 

Thanks

 

Respected Advisor
Posts: 3,247

Re: Multiply variables across data sets using a condition

Posted in reply to patrickchanou

If that's your actual code, then you can't use # in a variable name.

 

If that's not your actual code, then we need to see your actual code or we can't help you.

--
Paige Miller
Occasional Contributor
Posts: 6

Re: Multiply variables across data sets using a condition

Posted in reply to PaigeMiller

data new_data;

set sasuser.columnA sasuser.columnB;

If SS = SS then

column C = column A * column B;

run;

# is not the issue

Super User
Posts: 23,928

Re: Multiply variables across data sets using a condition

[ Edited ]
Posted in reply to patrickchanou

A SET statement will append the two data sets, but to do your calculation you need the data side by side, more commonly referred to as a MERGE.

 

There's an entire chapter in the SAS Docs about how to combine data;

http://documentation.sas.com/?docsetId=lrcon&docsetTarget=p15jvywi5avt3cn1bee8r6c33ux1.htm&docsetVer...

 

It's probably more complicated but should look something like this to start. However, you also need to be careful if there are duplicate entries for each SS and you do need to sort the data first.

 

 

data new_data;

merge data1 data2;

by SS;

columnC = ColumnA * ColumnB;

run;

Another link that may be helpful is a YouTube tutorial on merges:

 

http://video.sas.com/detail/video/4572997800001/merging-sas-tables-in-a-data-step?autoStart=true&q=m...

 

If you're using SAS Studio or EG, consider using the GUI tools/Tasks or Snippets and then looking at the code generated. 

 

 

Occasional Contributor
Posts: 6

Re: Multiply variables across data sets using a condition

Yes there are duplicates. data set A has 60000 observations but data set B has 6000.

Super User
Posts: 23,928

Re: Multiply variables across data sets using a condition

Posted in reply to patrickchanou

patrickchanou wrote:

Yes there are duplicates. data set A has 60000 observations but data set B has 6000.


I think my answer above should get you started. If you need further help please post some sample data - obviously not your real data, but similar enough to explain the issue. It should include both what you have and what you want. Include your current code and the log that shows whatever errors  you're encountering.

 

 

Here's an example of how that looks:

https://communities.sas.com/t5/Base-SAS-Programming/Fast-way-to-remove-overlapping-date-spans/m-p/41...

 

If you ask a generic question, you get generic responses. If you need further help, you need to be more detailed. 

 

Good Luck. 

Occasional Contributor
Posts: 6

Re: Multiply variables across data sets using a condition

data new_data;

set sasuser.dataA sasuser.dataB;

If SS = SS then

column C = column A (from dataA) * column B (from dataB);

run;

 

 

dataA has 2 numeric variable columns grade and SS with 60000 obs

dataB has 2 numeric variable columns weight with SS with 6000 obs

SS is the column we want to use to sort and do the multiplication by. There are lots of duplicates in the column SS

 

Let me know if that explains better.

 

Super User
Posts: 13,868

Re: Multiply variables across data sets using a condition

Posted in reply to patrickchanou

This is an example of taking all the records from column A data set and if there is a matching SSN in the column B dataset add the column b value to the data and multiply as described.

 

proc sql;
   create table want as
   select a.ssn, a.columna, b.columnb, a.columna*b.columnb as columnc
   from sasuser.columna as a
        left join
        sasuser.columnb as b
        on a.ssn=b.ssn
   ;
quit;

The a. and b. are ways that Proc SQL uses as shorthand for the data sets the assignment of the short hand is in the FROM clause with optional as .

 

 

You did not indicate if any of the SSN values might be duplicated in either set or what to do with the duplicates. This multiplies all of the matches if there are duplicates. If the SSN in the ColumnA data set only appears once but is in ColumnB multiple times there will be one record for each match in ColumnB.

Occasional Contributor
Posts: 6

Re: Multiply variables across data sets using a condition

Million thanks ballardw. They are many duplicates. I want them multiplied also. I will run the code and advise further based on the results.

Occasional Contributor
Posts: 6

Re: Multiply variables across data sets using a condition

I did run the code. It generated 1 error. It is not finding Sampling_weight or columnb but Sampling_weight is located in data set b in sasuser library. See below.

 

proc sql;
create table want as
select a.CHIC, a.net_payment_error, b.Sampling_weight, a.net_payment_error * b.Sampling_weight as extra_payment_error
 
from sasuser.net_payment_error as a
         left join
        sasuser.Sampling_weight as b
on a.CHIC = b.CHIC;
quit;
ERROR: File SASUSER.SAMPLING_WEIGHT.DATA does not exist.
Super User
Posts: 23,928

Re: Multiply variables across data sets using a condition

Posted in reply to patrickchanou

It's saying you don't have a dataset named sample_weight, check for a typo in the name. 

If you're certain you have it, please display a proc contents.

 

proc contents data=sasuser.sampling_weights;run;
Ask a Question
Discussion stats
  • 10 replies
  • 407 views
  • 0 likes
  • 4 in conversation