BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ybz12003
Rhodochrosite | Level 12

Hello:

 

I have a template dataset which have 500 obs.  The sample dataset is shown below.

 

 

data template; 
      infile datalines dlm='|'; 
  input Product : $20. Class : $50.; 
datalines;                  
BENADRYL |  ETHANOLAMINE |   
UNISOM TABLETS |  ETHANOLAMINE |   
PHENERGAN |  PHENOTHIAZINE |  
PROMETHAZINE |  PHENOTHIAZINE | 
KEFLEX |  CEPH/FIRST GENERATION |   
NOVOCAINE |  LOCAL ANESTHETICS |   
PREDNISONE |  ADRENALS |  
PULMICORT |  ADRENALS |  
RHINOCORT |  ADRENALS | 
; 

Proc sort; by class; run;

 

And I have a dataset with 10K obs,  below is only  my sample dataset.

 

data have; 
  format ShipDate mmddyy10.;
  infile datalines dlm='|'; 
  input ID $10. Product : $20.  shipdate : mmddyy10.; 
datalines;                  
100012556 | BENADRYL |  01/08/2010 |   
600012956 | UNISOM TABLETS |  09/02/2017 |   
500012586 | PHENERGAN |  06/24/2006 |  
500052586 | PROMETHAZINE |  05/03/2007 | 
200052576 | KEFLEX |  07/31/2003 |   
600052577 | NOVOCAINE |  03/25/2001 |   
900052578 | PREDNISONE |  06/17/2018 |  
800012578 | PULMICORT |  09/12/2002 |  
600012878 | RHINOCORT |  12/14/2016 | 
; 

 

So I would like to create a new "Class" column in "Have" dataset based on the "Template" dataset condition.   Below is the result I want.   Please let me know how to approach this, thanks.

 

data want; 
	format ShipDate mmddyy10.;
    infile datalines dlm='|'; 
    input ID $10. Product : $20. Class : $50. shipdate : mmddyy10.; 
datalines;                  
100012556 | BENADRYL |  ETHANOLAMINE |  01/08/2010 
600012956 | UNISOM TABLETS |  ETHANOLAMINE | 09/02/2017  
500012586 | PHENERGAN |  PHENOTHIAZINE | 06/24/2006 
500052586 | PROMETHAZINE |  PHENOTHIAZINE | 05/03/2007
200052576 | KEFLEX |  CEPH/FIRST GENERATION | 07/31/2003  
600052577 | NOVOCAINE |  LOCAL ANESTHETICS |  03/25/2001 
900052578 | PREDNISONE |  ADRENALS | 06/17/2018 
800012578 | PULMICORT |  ADRENALS |  09/12/2002
600012878 | RHINOCORT |  ADRENALS | 12/14/2016
; 

Proc sort; by class; run;

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
Opal | Level 21

SAS has many ways of combining tables, based on finding a match.  Which are you most comfortable with?  DATA step MERGE?  SQL join?  Formats?  Hashing?  Pick an approach you understand, so you can build on what you already know.

View solution in original post

8 REPLIES 8
Astounding
Opal | Level 21

SAS has many ways of combining tables, based on finding a match.  Which are you most comfortable with?  DATA step MERGE?  SQL join?  Formats?  Hashing?  Pick an approach you understand, so you can build on what you already know.

ybz12003
Rhodochrosite | Level 12
SQL join
Reeza
Super User

@ybz12003 From your previous questions and posts you should be able to do a SQL join. Can you please post what you've tried and the log indicating any errors you're getting or a descriptions of what the issue you're having?

Reeza
Super User

This is a pretty straight look up or join/merge on the names. Or a format. 

 

Which approach are you looking to use, and what have you tried so far?

 


@ybz12003 wrote:

Hello:

 

I have a template dataset which have 500 obs.  The sample dataset is shown below.

 

 

data template; 
      infile datalines dlm='|'; 
  input Product : $20. Class : $50.; 
datalines;                  
BENADRYL |  ETHANOLAMINE |   
UNISOM TABLETS |  ETHANOLAMINE |   
PHENERGAN |  PHENOTHIAZINE |  
PROMETHAZINE |  PHENOTHIAZINE | 
KEFLEX |  CEPH/FIRST GENERATION |   
NOVOCAINE |  LOCAL ANESTHETICS |   
PREDNISONE |  ADRENALS |  
PULMICORT |  ADRENALS |  
RHINOCORT |  ADRENALS | 
; 

Proc sort; by class; run;

 

And I have a dataset with 10K obs,  below is only  my sample dataset.

 

data have; 
  format ShipDate mmddyy10.;
  infile datalines dlm='|'; 
  input ID $10. Product : $20.  shipdate : mmddyy10.; 
datalines;                  
100012556 | BENADRYL |  01/08/2010 |   
600012956 | UNISOM TABLETS |  09/02/2017 |   
500012586 | PHENERGAN |  06/24/2006 |  
500052586 | PROMETHAZINE |  05/03/2007 | 
200052576 | KEFLEX |  07/31/2003 |   
600052577 | NOVOCAINE |  03/25/2001 |   
900052578 | PREDNISONE |  06/17/2018 |  
800012578 | PULMICORT |  09/12/2002 |  
600012878 | RHINOCORT |  12/14/2016 | 
; 

 

So I would like to create a new "Class" column in "Have" dataset based on the "Template" dataset condition.   Below is the result I want.   Please let me know how to approach this, thanks.

 

data want; 
	format ShipDate mmddyy10.;
    infile datalines dlm='|'; 
    input ID $10. Product : $20. Class : $50. shipdate : mmddyy10.; 
datalines;                  
100012556 | BENADRYL |  ETHANOLAMINE |  01/08/2010 
600012956 | UNISOM TABLETS |  ETHANOLAMINE | 09/02/2017  
500012586 | PHENERGAN |  PHENOTHIAZINE | 06/24/2006 
500052586 | PROMETHAZINE |  PHENOTHIAZINE | 05/03/2007
200052576 | KEFLEX |  CEPH/FIRST GENERATION | 07/31/2003  
600052577 | NOVOCAINE |  LOCAL ANESTHETICS |  03/25/2001 
900052578 | PREDNISONE |  ADRENALS | 06/17/2018 
800012578 | PULMICORT |  ADRENALS |  09/12/2002
600012878 | RHINOCORT |  ADRENALS | 12/14/2016
; 

Proc sort; by class; run;

 

 


 

ybz12003
Rhodochrosite | Level 12

One important thing I would like to point out is the actual template data is 500 obs, while the actual dataset is s10K which is 20 time larger than the template data.   Thanks.

Kurt_Bremser
Super User

@ybz12003 wrote:

One important thing I would like to point out is the actual template data is 500 obs, while the actual dataset is s10K which is 20 time larger than the template data.   Thanks.


And? 500 obs are still 500 obs and can very easily be handled with a format.

Reeza
Super User

Even if it was a million in this case it wouldn't really affect the code or results and probably not even the timing. We're talking about less than a minute for most of this on any decent machine (8GB RAM).

 


@ybz12003 wrote:

One important thing I would like to point out is the actual template data is 500 obs, while the actual dataset is s10K which is 20 time larger than the template data.   Thanks.


 

SAS INNOVATE 2024

Innovate_SAS_Blue.png

Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.

If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website. 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Get the $99 certification deal.jpg

 

 

Back in the Classroom!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 635 views
  • 2 likes
  • 4 in conversation