Datasets with 5k records. i have to find all workers in the 3rd quartile of the amount they earned And keep only lastname,total earning and their ID.
This is the raw data of both dataset
Earning data set, there are spendingid, customerid, totalprice
Earning_id Worker_id totalearning
1212112 12311111 1900
989898 9908009 2980
3i31030 3376247 20000
233233 3332323 6000
Worker dataset, there are customerid and firstname
Worker_id lastname
12311111 Ellen
9908009 JOhn
3376247 Jay
My code:
data work.merged;
set assign.earnings assign.workers
keep worker_id totalearning lastname;
run;
proc print data=work.merged;
run;
Q) So basically i need to merge the two dataset first because the lastname does not contain in the earnings dataset and then showing the Q3 of those customer earnings.
Result:
worker_id totalearnings lastname
121112 1900
989898 2980
I don;t see any part in your code where you merge the data, the first datastep "sets" the two dataset one under the other - thats not merging. Try:
proc sql; create table MERGED as select A.*, B.LASTNAME from EARNING A left join WORKER B on A.WORKER_ID=B.WORKER_ID; quit;
This will add lastname onto your data, you could of course do the same in datastep with the merge rather than set statement.
data work.merged;
set assign.earnings assign.workers
keep worker_id totalearning lastname;
run;
// this is not merged? cause i setting both datasets together as one?
How do i show the data from Q3 onwards, i have 5k records so probably 4k records onwards, i would like to display. Thanks
"// this is not merged? cause i setting both datasets together as one?"
No, its not merged it is appeneded.
Set - this is the the term used in SAS to mean to append one dataset underneath the previous dataset. So workers dataset starts appearing after the end of earnings dataset.
Merge - this is the term used in SAS to mean the combination of matches on the given by variables. So records from workers get merged to records of earnings based on the by groups given.
"How do i show the data from Q3 onwards, i have 5k records so probably 4k records onwards, i would like to display. Thanks"
I do not know what you mean by this. There is no mention in any of your provided test examples that show a quarter. You have simply provided one dataset with employees and earnings and another with last names, which merge together based on worker_id. I seen nothing about quarter, or in fact any date or information to that effect. Please clarify your question, providing exact test data, in the form of datasteps, and what you expect to see out.
thanks for explaining the differences. So, after proc sql i did a proc rank which creates another variable to indicate the quartile range for each worker. for example, 1,2,3. My question is to display worker that falls under value 3 and keep only the 3 columns i mentioned above. Any advise on continuing the code?
Well, from your text there:
data want;
set data_from_rank (where=(rank_value=3) keep=worker_id totalearnings lastname);
run;
That assumes of course that you have merged on lastname to your data at some point previously. But I still think your problem is the set part. What it will do is put the data from table 1 and then the data from table 2, so last name will always be blank for the data your interested in:
Earning_id Worker_id totalearning Lastname
1212112 12311111 1900
989898 9908009 2980
3i31030 3376247 20000
233233 3332323 6000
/* start of second dataset */
12311111 Ellen
9908009 JOhn
3376247 Jay
"Merge" implies that records are combined such as matching an customer address from one dataset to an amount billed from another on an account number, you have appended data sets. There is a MERGE statement one would expect in a datastep if merging is going on, not set.
Your data description is incomplete and may require multiple steps. First is the data that you have one record for the period of interest with the total or is it multiple records per worker? If multiple records then you need to combine them to get the total per worker.
Or does you data need to have earning information matched to the worker identifcation first?
It helps to provide some examples of what your data actually looks like. Best would be to use the instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... to create some datasteps of your data (or a subset) or dummy data that looks like your data but does not contain any sensitive information. You only need to include the variables needed.
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.