DATA Step, Macro, Functions and more

SQL Q3

Reply
Contributor Boa
Contributor
Posts: 21

SQL Q3

[ Edited ]
 

Datasets with 5k records. i have to find all workers in the 3rd quartile of the amount they earned  And keep only lastname,total earning and their ID.

 

This is the raw data of both dataset

Earning data set, there are spendingid, customerid, totalprice 

Earning_id                  Worker_id                 totalearning            

  1212112                         12311111                        1900                      

  989898                           9908009                        2980                   

  3i31030                           3376247                          20000                    

  233233                            3332323                          6000

Worker dataset, there are customerid and firstname

  Worker_id                 lastname

  12311111                        Ellen

  9908009                         JOhn

  3376247                         Jay

 

 My code:

 

 data work.merged;

 set assign.earnings assign.workers 

 keep worker_id totalearning lastname;

 run;

  

proc print data=work.merged;

run;

  

Q) So basically i need to merge the two dataset first because the lastname does not contain in the earnings dataset and then showing the Q3 of those customer earnings.

Result:

worker_id  totalearnings lastname

121112       1900      

989898      2980

Super User
Super User
Posts: 7,988

Re: Merge help with displaying Q3 onwards

I don;t see any part in your code where you merge the data, the first datastep "sets" the two dataset one under the other - thats not merging.  Try:

proc sql;
  create table MERGED as
  select A.*,
         B.LASTNAME
  from   EARNING A
  left join WORKER B
  on     A.WORKER_ID=B.WORKER_ID;
quit;

This will add lastname onto your data, you could of course do the same in datastep with the merge rather than set statement.

 

 

Contributor Boa
Contributor
Posts: 21

Re: Merge help with displaying Q3 onwards

data work.merged;

 set assign.earnings assign.workers 

 keep worker_id totalearning lastname;

 run;

 

// this is not merged? cause i setting both datasets together as one?

How do i show the data from Q3 onwards, i have 5k records so probably 4k records onwards, i would like to display. Thanks

Super User
Super User
Posts: 7,988

Re: Merge help with displaying Q3 onwards

"// this is not merged? cause i setting both datasets together as one?"

No, its not merged it is appeneded.

Set - this is the the term used in SAS to mean to append one dataset underneath the previous dataset.  So workers dataset starts appearing after the end of earnings dataset.

 

Merge - this is the term used in SAS to mean the combination of matches on the given by variables.  So records from workers get merged to records of earnings based on the by groups given.

 

"How do i show the data from Q3 onwards, i have 5k records so probably 4k records onwards, i would like to display. Thanks"

I do not know what you mean by this.  There is no mention in any of your provided test examples that show a quarter.  You have simply provided one dataset with employees and earnings and another with last names, which merge together based on worker_id.  I seen nothing about quarter, or in fact any date or information to that effect.  Please clarify your question, providing exact test data, in the form of datasteps, and what you expect to see out.

Contributor Boa
Contributor
Posts: 21

Re: Merge help with displaying Q3 onwards

thanks for explaining the differences. So, after proc sql i did a proc rank which creates another variable to indicate the quartile range for each worker. for example, 1,2,3. My question is to display worker that falls under value 3 and keep only the 3 columns i mentioned above. Any advise on continuing the code?

Super User
Super User
Posts: 7,988

Re: Merge help with displaying Q3 onwards

Well, from your text there:

data want;
  set data_from_rank (where=(rank_value=3) keep=worker_id totalearnings lastname);
run;

That assumes of course that you have merged on lastname to your data at some point previously.  But I still think your problem is the set part.  What it will do is put the data from table 1 and then the data from table 2, so last name will always be blank for the data your interested in:

Earning_id                  Worker_id                 totalearning               Lastname            

  1212112                         12311111                        1900                      

  989898                           9908009                        2980                   

  3i31030                           3376247                          20000                    

  233233                            3332323                          6000

/* start of second dataset */

                                           12311111                                                 Ellen

                                            9908009                                                 JOhn

                                            3376247                                                 Jay

 

Super User
Posts: 11,343

Re: Merge help with displaying Q3 onwards

"Merge" implies that records are combined such as matching an customer address from one dataset to an amount billed from another on an account number, you have appended data sets. There is a MERGE statement one would expect in a datastep if merging is going on, not set.

 

Your data description is incomplete and may require multiple steps. First is the data that you have one record for the period of interest with the total or is it multiple records per worker? If multiple records then you need to combine them to get the total per worker.

Or does you data need to have earning information matched to the worker identifcation first?

 

It helps to provide some examples of what your data actually looks like. Best would be to use the instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... to create some datasteps of your data (or a subset) or dummy data that looks like your data but does not contain any sensitive information. You only need to include the variables needed.

Ask a Question
Discussion stats
  • 6 replies
  • 288 views
  • 1 like
  • 3 in conversation