BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ybz12003
Rhodochrosite | Level 12

Hello:

 

I have a template dataset which have 500 obs.  The sample dataset is shown below.

 

 

data template; 
      infile datalines dlm='|'; 
  input Product : $20. Class : $50.; 
datalines;                  
BENADRYL |  ETHANOLAMINE |   
UNISOM TABLETS |  ETHANOLAMINE |   
PHENERGAN |  PHENOTHIAZINE |  
PROMETHAZINE |  PHENOTHIAZINE | 
KEFLEX |  CEPH/FIRST GENERATION |   
NOVOCAINE |  LOCAL ANESTHETICS |   
PREDNISONE |  ADRENALS |  
PULMICORT |  ADRENALS |  
RHINOCORT |  ADRENALS | 
; 

Proc sort; by class; run;

 

And I have a dataset with 10K obs,  below is only  my sample dataset.

 

data have; 
  format ShipDate mmddyy10.;
  infile datalines dlm='|'; 
  input ID $10. Product : $20.  shipdate : mmddyy10.; 
datalines;                  
100012556 | BENADRYL |  01/08/2010 |   
600012956 | UNISOM TABLETS |  09/02/2017 |   
500012586 | PHENERGAN |  06/24/2006 |  
500052586 | PROMETHAZINE |  05/03/2007 | 
200052576 | KEFLEX |  07/31/2003 |   
600052577 | NOVOCAINE |  03/25/2001 |   
900052578 | PREDNISONE |  06/17/2018 |  
800012578 | PULMICORT |  09/12/2002 |  
600012878 | RHINOCORT |  12/14/2016 | 
; 

 

So I would like to create a new "Class" column in "Have" dataset based on the "Template" dataset condition.   Below is the result I want.   Please let me know how to approach this, thanks.

 

data want; 
	format ShipDate mmddyy10.;
    infile datalines dlm='|'; 
    input ID $10. Product : $20. Class : $50. shipdate : mmddyy10.; 
datalines;                  
100012556 | BENADRYL |  ETHANOLAMINE |  01/08/2010 
600012956 | UNISOM TABLETS |  ETHANOLAMINE | 09/02/2017  
500012586 | PHENERGAN |  PHENOTHIAZINE | 06/24/2006 
500052586 | PROMETHAZINE |  PHENOTHIAZINE | 05/03/2007
200052576 | KEFLEX |  CEPH/FIRST GENERATION | 07/31/2003  
600052577 | NOVOCAINE |  LOCAL ANESTHETICS |  03/25/2001 
900052578 | PREDNISONE |  ADRENALS | 06/17/2018 
800012578 | PULMICORT |  ADRENALS |  09/12/2002
600012878 | RHINOCORT |  ADRENALS | 12/14/2016
; 

Proc sort; by class; run;

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

SAS has many ways of combining tables, based on finding a match.  Which are you most comfortable with?  DATA step MERGE?  SQL join?  Formats?  Hashing?  Pick an approach you understand, so you can build on what you already know.

View solution in original post

8 REPLIES 8
Astounding
PROC Star

SAS has many ways of combining tables, based on finding a match.  Which are you most comfortable with?  DATA step MERGE?  SQL join?  Formats?  Hashing?  Pick an approach you understand, so you can build on what you already know.

ybz12003
Rhodochrosite | Level 12
SQL join
Reeza
Super User

@ybz12003 From your previous questions and posts you should be able to do a SQL join. Can you please post what you've tried and the log indicating any errors you're getting or a descriptions of what the issue you're having?

Reeza
Super User

This is a pretty straight look up or join/merge on the names. Or a format. 

 

Which approach are you looking to use, and what have you tried so far?

 


@ybz12003 wrote:

Hello:

 

I have a template dataset which have 500 obs.  The sample dataset is shown below.

 

 

data template; 
      infile datalines dlm='|'; 
  input Product : $20. Class : $50.; 
datalines;                  
BENADRYL |  ETHANOLAMINE |   
UNISOM TABLETS |  ETHANOLAMINE |   
PHENERGAN |  PHENOTHIAZINE |  
PROMETHAZINE |  PHENOTHIAZINE | 
KEFLEX |  CEPH/FIRST GENERATION |   
NOVOCAINE |  LOCAL ANESTHETICS |   
PREDNISONE |  ADRENALS |  
PULMICORT |  ADRENALS |  
RHINOCORT |  ADRENALS | 
; 

Proc sort; by class; run;

 

And I have a dataset with 10K obs,  below is only  my sample dataset.

 

data have; 
  format ShipDate mmddyy10.;
  infile datalines dlm='|'; 
  input ID $10. Product : $20.  shipdate : mmddyy10.; 
datalines;                  
100012556 | BENADRYL |  01/08/2010 |   
600012956 | UNISOM TABLETS |  09/02/2017 |   
500012586 | PHENERGAN |  06/24/2006 |  
500052586 | PROMETHAZINE |  05/03/2007 | 
200052576 | KEFLEX |  07/31/2003 |   
600052577 | NOVOCAINE |  03/25/2001 |   
900052578 | PREDNISONE |  06/17/2018 |  
800012578 | PULMICORT |  09/12/2002 |  
600012878 | RHINOCORT |  12/14/2016 | 
; 

 

So I would like to create a new "Class" column in "Have" dataset based on the "Template" dataset condition.   Below is the result I want.   Please let me know how to approach this, thanks.

 

data want; 
	format ShipDate mmddyy10.;
    infile datalines dlm='|'; 
    input ID $10. Product : $20. Class : $50. shipdate : mmddyy10.; 
datalines;                  
100012556 | BENADRYL |  ETHANOLAMINE |  01/08/2010 
600012956 | UNISOM TABLETS |  ETHANOLAMINE | 09/02/2017  
500012586 | PHENERGAN |  PHENOTHIAZINE | 06/24/2006 
500052586 | PROMETHAZINE |  PHENOTHIAZINE | 05/03/2007
200052576 | KEFLEX |  CEPH/FIRST GENERATION | 07/31/2003  
600052577 | NOVOCAINE |  LOCAL ANESTHETICS |  03/25/2001 
900052578 | PREDNISONE |  ADRENALS | 06/17/2018 
800012578 | PULMICORT |  ADRENALS |  09/12/2002
600012878 | RHINOCORT |  ADRENALS | 12/14/2016
; 

Proc sort; by class; run;

 

 


 

ybz12003
Rhodochrosite | Level 12

One important thing I would like to point out is the actual template data is 500 obs, while the actual dataset is s10K which is 20 time larger than the template data.   Thanks.

Kurt_Bremser
Super User

@ybz12003 wrote:

One important thing I would like to point out is the actual template data is 500 obs, while the actual dataset is s10K which is 20 time larger than the template data.   Thanks.


And? 500 obs are still 500 obs and can very easily be handled with a format.

Reeza
Super User

Even if it was a million in this case it wouldn't really affect the code or results and probably not even the timing. We're talking about less than a minute for most of this on any decent machine (8GB RAM).

 


@ybz12003 wrote:

One important thing I would like to point out is the actual template data is 500 obs, while the actual dataset is s10K which is 20 time larger than the template data.   Thanks.


 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 764 views
  • 2 likes
  • 4 in conversation