BookmarkSubscribeRSS Feed
patrickchanou
Calcite | Level 5

Dear Members,

 

I have data set A and data set B located in sasuser library. I want to multiply column A of data set A to column B of data set B to create a new variable column C. The multiply must  take place when SS# is equal in both data sets. Data set A has more obs and columns than data set B. When I run the code below with an if condition, it does not work. I got 0 observation in new_data.

 

data new_data;

set sasuser.columnA sasuser.columnB;

If SS# = SS# then

column C = column A * column B;

run;

My log error is: Statement is not valid or s used out of proper order.

 

Not working so far. Would where a better option? Please help.

 

Thanks

 

10 REPLIES 10
PaigeMiller
Diamond | Level 26

If that's your actual code, then you can't use # in a variable name.

 

If that's not your actual code, then we need to see your actual code or we can't help you.

--
Paige Miller
patrickchanou
Calcite | Level 5

data new_data;

set sasuser.columnA sasuser.columnB;

If SS = SS then

column C = column A * column B;

run;

# is not the issue

Reeza
Super User

A SET statement will append the two data sets, but to do your calculation you need the data side by side, more commonly referred to as a MERGE.

 

There's an entire chapter in the SAS Docs about how to combine data;

http://documentation.sas.com/?docsetId=lrcon&docsetTarget=p15jvywi5avt3cn1bee8r6c33ux1.htm&docsetVer...

 

It's probably more complicated but should look something like this to start. However, you also need to be careful if there are duplicate entries for each SS and you do need to sort the data first.

 

 

data new_data;

merge data1 data2;

by SS;

columnC = ColumnA * ColumnB;

run;

Another link that may be helpful is a YouTube tutorial on merges:

 

http://video.sas.com/detail/video/4572997800001/merging-sas-tables-in-a-data-step?autoStart=true&q=m...

 

If you're using SAS Studio or EG, consider using the GUI tools/Tasks or Snippets and then looking at the code generated. 

 

 

patrickchanou
Calcite | Level 5

Yes there are duplicates. data set A has 60000 observations but data set B has 6000.

Reeza
Super User

@patrickchanou wrote:

Yes there are duplicates. data set A has 60000 observations but data set B has 6000.


I think my answer above should get you started. If you need further help please post some sample data - obviously not your real data, but similar enough to explain the issue. It should include both what you have and what you want. Include your current code and the log that shows whatever errors  you're encountering.

 

 

Here's an example of how that looks:

https://communities.sas.com/t5/Base-SAS-Programming/Fast-way-to-remove-overlapping-date-spans/m-p/41...

 

If you ask a generic question, you get generic responses. If you need further help, you need to be more detailed. 

 

Good Luck. 

patrickchanou
Calcite | Level 5

data new_data;

set sasuser.dataA sasuser.dataB;

If SS = SS then

column C = column A (from dataA) * column B (from dataB);

run;

 

 

dataA has 2 numeric variable columns grade and SS with 60000 obs

dataB has 2 numeric variable columns weight with SS with 6000 obs

SS is the column we want to use to sort and do the multiplication by. There are lots of duplicates in the column SS

 

Let me know if that explains better.

 

ballardw
Super User

This is an example of taking all the records from column A data set and if there is a matching SSN in the column B dataset add the column b value to the data and multiply as described.

 

proc sql;
   create table want as
   select a.ssn, a.columna, b.columnb, a.columna*b.columnb as columnc
   from sasuser.columna as a
        left join
        sasuser.columnb as b
        on a.ssn=b.ssn
   ;
quit;

The a. and b. are ways that Proc SQL uses as shorthand for the data sets the assignment of the short hand is in the FROM clause with optional as .

 

 

You did not indicate if any of the SSN values might be duplicated in either set or what to do with the duplicates. This multiplies all of the matches if there are duplicates. If the SSN in the ColumnA data set only appears once but is in ColumnB multiple times there will be one record for each match in ColumnB.

patrickchanou
Calcite | Level 5

Million thanks ballardw. They are many duplicates. I want them multiplied also. I will run the code and advise further based on the results.

patrickchanou
Calcite | Level 5

I did run the code. It generated 1 error. It is not finding Sampling_weight or columnb but Sampling_weight is located in data set b in sasuser library. See below.

 

proc sql;
create table want as
select a.CHIC, a.net_payment_error, b.Sampling_weight, a.net_payment_error * b.Sampling_weight as extra_payment_error
 
from sasuser.net_payment_error as a
         left join
        sasuser.Sampling_weight as b
on a.CHIC = b.CHIC;
quit;
ERROR: File SASUSER.SAMPLING_WEIGHT.DATA does not exist.
Reeza
Super User

It's saying you don't have a dataset named sample_weight, check for a typo in the name. 

If you're certain you have it, please display a proc contents.

 

proc contents data=sasuser.sampling_weights;run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 6836 views
  • 0 likes
  • 4 in conversation