Hello:
I have a template dataset which have 500 obs. The sample dataset is shown below.
data template;
infile datalines dlm='|';
input Product : $20. Class : $50.;
datalines;
BENADRYL | ETHANOLAMINE |
UNISOM TABLETS | ETHANOLAMINE |
PHENERGAN | PHENOTHIAZINE |
PROMETHAZINE | PHENOTHIAZINE |
KEFLEX | CEPH/FIRST GENERATION |
NOVOCAINE | LOCAL ANESTHETICS |
PREDNISONE | ADRENALS |
PULMICORT | ADRENALS |
RHINOCORT | ADRENALS |
;
Proc sort; by class; run;
And I have a dataset with 10K obs, below is only my sample dataset.
data have;
format ShipDate mmddyy10.;
infile datalines dlm='|';
input ID $10. Product : $20. shipdate : mmddyy10.;
datalines;
100012556 | BENADRYL | 01/08/2010 |
600012956 | UNISOM TABLETS | 09/02/2017 |
500012586 | PHENERGAN | 06/24/2006 |
500052586 | PROMETHAZINE | 05/03/2007 |
200052576 | KEFLEX | 07/31/2003 |
600052577 | NOVOCAINE | 03/25/2001 |
900052578 | PREDNISONE | 06/17/2018 |
800012578 | PULMICORT | 09/12/2002 |
600012878 | RHINOCORT | 12/14/2016 |
;
So I would like to create a new "Class" column in "Have" dataset based on the "Template" dataset condition. Below is the result I want. Please let me know how to approach this, thanks.
data want;
format ShipDate mmddyy10.;
infile datalines dlm='|';
input ID $10. Product : $20. Class : $50. shipdate : mmddyy10.;
datalines;
100012556 | BENADRYL | ETHANOLAMINE | 01/08/2010
600012956 | UNISOM TABLETS | ETHANOLAMINE | 09/02/2017
500012586 | PHENERGAN | PHENOTHIAZINE | 06/24/2006
500052586 | PROMETHAZINE | PHENOTHIAZINE | 05/03/2007
200052576 | KEFLEX | CEPH/FIRST GENERATION | 07/31/2003
600052577 | NOVOCAINE | LOCAL ANESTHETICS | 03/25/2001
900052578 | PREDNISONE | ADRENALS | 06/17/2018
800012578 | PULMICORT | ADRENALS | 09/12/2002
600012878 | RHINOCORT | ADRENALS | 12/14/2016
;
Proc sort; by class; run;
SAS has many ways of combining tables, based on finding a match. Which are you most comfortable with? DATA step MERGE? SQL join? Formats? Hashing? Pick an approach you understand, so you can build on what you already know.
SAS has many ways of combining tables, based on finding a match. Which are you most comfortable with? DATA step MERGE? SQL join? Formats? Hashing? Pick an approach you understand, so you can build on what you already know.
With just 500 obs, a format is the way to go. No sorting needed.
@ybz12003 From your previous questions and posts you should be able to do a SQL join. Can you please post what you've tried and the log indicating any errors you're getting or a descriptions of what the issue you're having?
This is a pretty straight look up or join/merge on the names. Or a format.
Which approach are you looking to use, and what have you tried so far?
@ybz12003 wrote:
Hello:
I have a template dataset which have 500 obs. The sample dataset is shown below.
data template; infile datalines dlm='|'; input Product : $20. Class : $50.; datalines; BENADRYL | ETHANOLAMINE | UNISOM TABLETS | ETHANOLAMINE | PHENERGAN | PHENOTHIAZINE | PROMETHAZINE | PHENOTHIAZINE | KEFLEX | CEPH/FIRST GENERATION | NOVOCAINE | LOCAL ANESTHETICS | PREDNISONE | ADRENALS | PULMICORT | ADRENALS | RHINOCORT | ADRENALS | ; Proc sort; by class; run;
And I have a dataset with 10K obs, below is only my sample dataset.
data have; format ShipDate mmddyy10.; infile datalines dlm='|'; input ID $10. Product : $20. shipdate : mmddyy10.; datalines; 100012556 | BENADRYL | 01/08/2010 | 600012956 | UNISOM TABLETS | 09/02/2017 | 500012586 | PHENERGAN | 06/24/2006 | 500052586 | PROMETHAZINE | 05/03/2007 | 200052576 | KEFLEX | 07/31/2003 | 600052577 | NOVOCAINE | 03/25/2001 | 900052578 | PREDNISONE | 06/17/2018 | 800012578 | PULMICORT | 09/12/2002 | 600012878 | RHINOCORT | 12/14/2016 | ;
So I would like to create a new "Class" column in "Have" dataset based on the "Template" dataset condition. Below is the result I want. Please let me know how to approach this, thanks.
data want; format ShipDate mmddyy10.; infile datalines dlm='|'; input ID $10. Product : $20. Class : $50. shipdate : mmddyy10.; datalines; 100012556 | BENADRYL | ETHANOLAMINE | 01/08/2010 600012956 | UNISOM TABLETS | ETHANOLAMINE | 09/02/2017 500012586 | PHENERGAN | PHENOTHIAZINE | 06/24/2006 500052586 | PROMETHAZINE | PHENOTHIAZINE | 05/03/2007 200052576 | KEFLEX | CEPH/FIRST GENERATION | 07/31/2003 600052577 | NOVOCAINE | LOCAL ANESTHETICS | 03/25/2001 900052578 | PREDNISONE | ADRENALS | 06/17/2018 800012578 | PULMICORT | ADRENALS | 09/12/2002 600012878 | RHINOCORT | ADRENALS | 12/14/2016 ; Proc sort; by class; run;
One important thing I would like to point out is the actual template data is 500 obs, while the actual dataset is s10K which is 20 time larger than the template data. Thanks.
@ybz12003 wrote:
One important thing I would like to point out is the actual template data is 500 obs, while the actual dataset is s10K which is 20 time larger than the template data. Thanks.
And? 500 obs are still 500 obs and can very easily be handled with a format.
Even if it was a million in this case it wouldn't really affect the code or results and probably not even the timing. We're talking about less than a minute for most of this on any decent machine (8GB RAM).
@ybz12003 wrote:
One important thing I would like to point out is the actual template data is 500 obs, while the actual dataset is s10K which is 20 time larger than the template data. Thanks.
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.