Solved: Create a new column based on specific sitution

ybz12003 · Posted 05-02-2018 12:57 PM

Hello:

I have a template dataset which have 500 obs. The sample dataset is shown below.

data template; 
      infile datalines dlm='|'; 
  input Product : $20. Class : $50.; 
datalines;                  
BENADRYL |  ETHANOLAMINE |   
UNISOM TABLETS |  ETHANOLAMINE |   
PHENERGAN |  PHENOTHIAZINE |  
PROMETHAZINE |  PHENOTHIAZINE | 
KEFLEX |  CEPH/FIRST GENERATION |   
NOVOCAINE |  LOCAL ANESTHETICS |   
PREDNISONE |  ADRENALS |  
PULMICORT |  ADRENALS |  
RHINOCORT |  ADRENALS | 
; 

Proc sort; by class; run;

And I have a dataset with 10K obs, below is only my sample dataset.

data have; 
  format ShipDate mmddyy10.;
  infile datalines dlm='|'; 
  input ID $10. Product : $20.  shipdate : mmddyy10.; 
datalines;                  
100012556 | BENADRYL |  01/08/2010 |   
600012956 | UNISOM TABLETS |  09/02/2017 |   
500012586 | PHENERGAN |  06/24/2006 |  
500052586 | PROMETHAZINE |  05/03/2007 | 
200052576 | KEFLEX |  07/31/2003 |   
600052577 | NOVOCAINE |  03/25/2001 |   
900052578 | PREDNISONE |  06/17/2018 |  
800012578 | PULMICORT |  09/12/2002 |  
600012878 | RHINOCORT |  12/14/2016 | 
;

So I would like to create a new "Class" column in "Have" dataset based on the "Template" dataset condition. Below is the result I want. Please let me know how to approach this, thanks.

data want; 
	format ShipDate mmddyy10.;
    infile datalines dlm='|'; 
    input ID $10. Product : $20. Class : $50. shipdate : mmddyy10.; 
datalines;                  
100012556 | BENADRYL |  ETHANOLAMINE |  01/08/2010 
600012956 | UNISOM TABLETS |  ETHANOLAMINE | 09/02/2017  
500012586 | PHENERGAN |  PHENOTHIAZINE | 06/24/2006 
500052586 | PROMETHAZINE |  PHENOTHIAZINE | 05/03/2007
200052576 | KEFLEX |  CEPH/FIRST GENERATION | 07/31/2003  
600052577 | NOVOCAINE |  LOCAL ANESTHETICS |  03/25/2001 
900052578 | PREDNISONE |  ADRENALS | 06/17/2018 
800012578 | PULMICORT |  ADRENALS |  09/12/2002
600012878 | RHINOCORT |  ADRENALS | 12/14/2016
; 

Proc sort; by class; run;

Astounding · Posted 05-02-2018 01:22 PM

SAS has many ways of combining tables, based on finding a match. Which are you most comfortable with? DATA step MERGE? SQL join? Formats? Hashing? Pick an approach you understand, so you can build on what you already know.

View solution in original post

Astounding · Posted 05-02-2018 01:22 PM

SAS has many ways of combining tables, based on finding a match. Which are you most comfortable with? DATA step MERGE? SQL join? Formats? Hashing? Pick an approach you understand, so you can build on what you already know.

ybz12003 · Posted 05-02-2018 01:24 PM

SQL join

Kurt_Bremser · Posted 05-02-2018 01:27 PM

With just 500 obs, a format is the way to go. No sorting needed.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Reeza · Posted 05-02-2018 01:30 PM

@ybz12003 From your previous questions and posts you should be able to do a SQL join. Can you please post what you've tried and the log indicating any errors you're getting or a descriptions of what the issue you're having?

Reeza · Posted 05-02-2018 01:23 PM

This is a pretty straight look up or join/merge on the names. Or a format.

Which approach are you looking to use, and what have you tried so far?

@ybz12003 wrote:

Hello:

I have a template dataset which have 500 obs. The sample dataset is shown below.

data template; 
      infile datalines dlm='|'; 
  input Product : $20. Class : $50.; 
datalines;                  
BENADRYL |  ETHANOLAMINE |   
UNISOM TABLETS |  ETHANOLAMINE |   
PHENERGAN |  PHENOTHIAZINE |  
PROMETHAZINE |  PHENOTHIAZINE | 
KEFLEX |  CEPH/FIRST GENERATION |   
NOVOCAINE |  LOCAL ANESTHETICS |   
PREDNISONE |  ADRENALS |  
PULMICORT |  ADRENALS |  
RHINOCORT |  ADRENALS | 
; 

Proc sort; by class; run;

And I have a dataset with 10K obs, below is only my sample dataset.

data have; 
  format ShipDate mmddyy10.;
  infile datalines dlm='|'; 
  input ID $10. Product : $20.  shipdate : mmddyy10.; 
datalines;                  
100012556 | BENADRYL |  01/08/2010 |   
600012956 | UNISOM TABLETS |  09/02/2017 |   
500012586 | PHENERGAN |  06/24/2006 |  
500052586 | PROMETHAZINE |  05/03/2007 | 
200052576 | KEFLEX |  07/31/2003 |   
600052577 | NOVOCAINE |  03/25/2001 |   
900052578 | PREDNISONE |  06/17/2018 |  
800012578 | PULMICORT |  09/12/2002 |  
600012878 | RHINOCORT |  12/14/2016 | 
;

So I would like to create a new "Class" column in "Have" dataset based on the "Template" dataset condition. Below is the result I want. Please let me know how to approach this, thanks.

data want; 
	format ShipDate mmddyy10.;
    infile datalines dlm='|'; 
    input ID $10. Product : $20. Class : $50. shipdate : mmddyy10.; 
datalines;                  
100012556 | BENADRYL |  ETHANOLAMINE |  01/08/2010 
600012956 | UNISOM TABLETS |  ETHANOLAMINE | 09/02/2017  
500012586 | PHENERGAN |  PHENOTHIAZINE | 06/24/2006 
500052586 | PROMETHAZINE |  PHENOTHIAZINE | 05/03/2007
200052576 | KEFLEX |  CEPH/FIRST GENERATION | 07/31/2003  
600052577 | NOVOCAINE |  LOCAL ANESTHETICS |  03/25/2001 
900052578 | PREDNISONE |  ADRENALS | 06/17/2018 
800012578 | PULMICORT |  ADRENALS |  09/12/2002
600012878 | RHINOCORT |  ADRENALS | 12/14/2016
; 

Proc sort; by class; run;

ybz12003 · Posted 05-02-2018 01:32 PM

One important thing I would like to point out is the actual template data is 500 obs, while the actual dataset is s10K which is 20 time larger than the template data. Thanks.

Kurt_Bremser · Posted 05-02-2018 01:37 PM

@ybz12003 wrote:

One important thing I would like to point out is the actual template data is 500 obs, while the actual dataset is s10K which is 20 time larger than the template data. Thanks.

And? 500 obs are still 500 obs and can very easily be handled with a format.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Reeza · Posted 05-02-2018 01:38 PM

Even if it was a million in this case it wouldn't really affect the code or results and probably not even the timing. We're talking about less than a minute for most of this on any decent machine (8GB RAM).

@ybz12003 wrote:

One important thing I would like to point out is the actual template data is 500 obs, while the actual dataset is s10K which is 20 time larger than the template data. Thanks.

Create a new column based on specific sitution

Re: Create a new column based on specific sitution

Re: Create a new column based on specific sitution

Re: Create a new column based on specific sitution

Re: Create a new column based on specific sitution

Re: Create a new column based on specific sitution

Re: Create a new column based on specific sitution

Re: Create a new column based on specific sitution

Re: Create a new column based on specific sitution

Re: Create a new column based on specific sitution

Registration is open

SAS Training: Just a Click Away