11-29-2017 02:13 PM
I have data set A and data set B located in sasuser library. I want to multiply column A of data set A to column B of data set B to create a new variable column C. The multiply must take place when SS# is equal in both data sets. Data set A has more obs and columns than data set B. When I run the code below with an if condition, it does not work. I got 0 observation in new_data.
set sasuser.columnA sasuser.columnB;
If SS# = SS# then
column C = column A * column B;
My log error is: Statement is not valid or s used out of proper order.
Not working so far. Would where a better option? Please help.
11-29-2017 02:17 PM
If that's your actual code, then you can't use # in a variable name.
If that's not your actual code, then we need to see your actual code or we can't help you.
11-29-2017 02:21 PM - edited 11-29-2017 02:26 PM
A SET statement will append the two data sets, but to do your calculation you need the data side by side, more commonly referred to as a MERGE.
There's an entire chapter in the SAS Docs about how to combine data;
It's probably more complicated but should look something like this to start. However, you also need to be careful if there are duplicate entries for each SS and you do need to sort the data first.
data new_data; merge data1 data2; by SS; columnC = ColumnA * ColumnB; run;
Another link that may be helpful is a YouTube tutorial on merges:
If you're using SAS Studio or EG, consider using the GUI tools/Tasks or Snippets and then looking at the code generated.
11-29-2017 02:38 PM
Yes there are duplicates. data set A has 60000 observations but data set B has 6000.
I think my answer above should get you started. If you need further help please post some sample data - obviously not your real data, but similar enough to explain the issue. It should include both what you have and what you want. Include your current code and the log that shows whatever errors you're encountering.
Here's an example of how that looks:
If you ask a generic question, you get generic responses. If you need further help, you need to be more detailed.
11-29-2017 04:29 PM
set sasuser.dataA sasuser.dataB;
If SS = SS then
column C = column A (from dataA) * column B (from dataB);
dataA has 2 numeric variable columns grade and SS with 60000 obs
dataB has 2 numeric variable columns weight with SS with 6000 obs
SS is the column we want to use to sort and do the multiplication by. There are lots of duplicates in the column SS
Let me know if that explains better.
11-29-2017 04:08 PM
This is an example of taking all the records from column A data set and if there is a matching SSN in the column B dataset add the column b value to the data and multiply as described.
proc sql; create table want as select a.ssn, a.columna, b.columnb, a.columna*b.columnb as columnc from sasuser.columna as a left join sasuser.columnb as b on a.ssn=b.ssn ; quit;
The a. and b. are ways that Proc SQL uses as shorthand for the data sets the assignment of the short hand is in the FROM clause with optional as .
You did not indicate if any of the SSN values might be duplicated in either set or what to do with the duplicates. This multiplies all of the matches if there are duplicates. If the SSN in the ColumnA data set only appears once but is in ColumnB multiple times there will be one record for each match in ColumnB.
11-29-2017 05:34 PM
Million thanks ballardw. They are many duplicates. I want them multiplied also. I will run the code and advise further based on the results.
11-30-2017 08:47 AM
I did run the code. It generated 1 error. It is not finding Sampling_weight or columnb but Sampling_weight is located in data set b in sasuser library. See below.
11-30-2017 10:43 AM
It's saying you don't have a dataset named sample_weight, check for a typo in the name.
If you're certain you have it, please display a proc contents.
proc contents data=sasuser.sampling_weights;run;