BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
robertrao
Quartz | Level 8

PGstats helped me with a solution in Sql. Could anyone help me writing it in Macro or other basic steps??

I have an Excel spread sheet(X) and has 3 tabs in it(A B C) which are to be imported into 3 datasets in SAS.

The 3 datasets created will be our master datasets.

We need to update latest information from Excel to our master datasets

Our Excel sheet will have previous information too, BUT ONLY THE CURRENT QUARTER INFORMATION NEEDS TO BE APPENDED TO OUR MASTER DATASETS..

Could someone help me build the code in Sas 9.2.

Thanks in advance

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

Karun,

It is probably easiest to answer your questions with an example that only uses sas datasets so that you can see what the data looks like in a data step.

First, here are two datasteps that create files which I think are like the ones you have, other than that the second one is really a tab within an Excel workbook:

data myDatasetA;

  informat myDate yyq6.;

  format myDate yyq6.;

  input myDate x;

  cards;

2010Q1 1

2010Q2 1

2010Q3 1

2010Q4 1

;

run;

data XL_A;

  informat myDate yyq6.;

  format myDate yyq6.;

  input myDate x;

  cards;

2010Q1 2

2010Q3 2

2010Q4 2

2011Q1 2

2011Q2 2

2011Q3 2

2011Q4 2

;

run;

/*next, here is the code, but modified to work with 2 sas datasets, rather than expecting one to be from Excel */

data want (drop=maxdate);

  retain maxdate;

  set myDatasetA (in=inA) XL_A;

  if inA then maxdate=max(maxdate,myDate);

  else if myDate gt maxdate;

run;

/* the resulting file will look like:

         myDate    x

         2010Q1    1

         2010Q2    1

         2010Q3    1

         2010Q4    1

         2011Q1    2

         2011Q2    2

         2011Q3    2

         2011Q4    2

Thus, to answer your questions:

1. The data in myDatasetA doesn't get changed .. only new records are added to it

2. Before file XL_A is even read, maxdate was set to equal the maximum date present in the file myDatasetA.  It was retained so that it would be available at that time when the records from XL_A were read

3. The format of myDate shouldn't matter as long as it was input correctly to represent a SAS date

4. Yes, you appear to understand the first part.  The second part is a SAS statement that selects records from XL_A if, and only if, their value for myDate is greater than maxdate.  An easy way to see that work is with another example.  e.g.:

*/

data test;

set sashelp.class;

if sex eq "M";

run;

/* That code will ONLY select records that have a value of "M" for the variable called sex */

View solution in original post

25 REPLIES 25
art297
Opal | Level 21

Replicating PG's logic in a datastep:

libname XL Excel "MyExcelFile.xlsx";

data want (drop=maxdate);

  retain maxdate;

  set myLib.myDatasetA (in=inA) XL.'A$'n;

  if inA then maxdate=max(maxdate,myDate);

  else if myDate gt maxdate;

run;

PGStats
Opal | Level 21

Nice and compact! I think the logic calls for maxdate = max(myDate, maxdate) instead of maxdate=max(myDate, lag(myDate)) which would only give you the maximum of the two last dates in the dataset. - PG

PG
art297
Opal | Level 21

PG: Agreed and modified the code accordingly.  It worked on my sample data as it was in date order.

robertrao
Quartz | Level 8

Hi PGstats,

Thanks for making corrections to Arthurs code.

Since I am a novice, I have some difficulty understanding the code. Please correct me if i am wrong in understanding it!

1)we are reading our Excel file with the libname statement.

2)data want (drop=maxdate);

what is data want?why are we dropping maxdate??????and again retaining it????i dont want to create another dataset. just wanted to update

datasetA datasetB and datasetC from EXCEL (excel is updated every quarter)!!!

3)set myLib.myDatasetA (in=inA) XL.'A$'n;

To the DatasetA present in our library we are setting the information present in our Excel Tab1;(n specifies the tab)!!!

4) if inA maxdate = max(myDate, maxdate)

else if myDate gt maxdate;

What is this final step doing>??????????

THANKS A TON

robertrao
Quartz | Level 8

Hi Arthur,

Firstly Thanks for your effort in putting up the code.

Since I am a novice, I have some difficulty understanding the code. Please correct me if i am wrong in understanding it!

1)we are reading our Excel file with the libname statement.

2)data want (drop=maxdate);

what is data want?why are we dropping maxdate??????and again retaining it????i dont want to create another dataset. just wanted to update

datasetA datasetB and datasetC from EXCEL (excel is updated every quarter)!!!

3)set myLib.myDatasetA (in=inA) XL.'A$'n;

To the DatasetA present in our library we are setting the information present in our Excel Tab1;(n specifies the tab)!!!

4) if inA maxdate = max(myDate, maxdate)

else if myDate gt maxdate;

What is this final step doing>??????????

THANKS A TON

art297
Opal | Level 21

The code was intended to mimic what PG's SQL code had done:

1. Yes, we are reading the Excel file with the libname statement

2. The datastep is creating a file called "want".  You can give it whatever name you want.  For examples, I usually call the input data 'have' and the output file 'want'.  Maxdate was dropped as I didn't think you needed it beyond using it to determine which Excel rows you wanted to add.  The code was only written to update one dataset based on a tab within an Excel workbook.  Of course, it could easily be expanded to deal with multiple tabs and/or sas dataset updates.

Maxdate is retained so that its value holds, from one record to the next, unless the program changes it.  That way, the value continues to be available while you are reading the data from the workbook.

3. In PG's code he assumed a tab labeled 'A' which, written as a name literal (in his code) was expressed as 'A$'n

4. The program is first reading and rewriting all of the records that are in the current sas dataset and, by the time it gets to the last record, having determined what the latest date was that was in the file.  Then, when it reads the tab in the workbook, it ONLY selects those rows that have a value for myDate that is more recent than maxdate.

Does that answer all of your questions?

robertrao
Quartz | Level 8

Dear Arthur,

Some more questions

1)Apart from the latest data being updated in myLib.myDatasetA the same data  is seen in the WANT dataset we are creating here?

2)if mydate is the variable containing the dates in the dataset? then which part of ouir code is comparing our dates in the dataset with the dates present in Excel????

3)if my date is in the format 2011Q3 and so on this code will still hold good right?

4)if inA then maxdate=max(maxdate,myDate);   /*holding the max date in our dataset*/

  else if myDate gt maxdate;                              /*what is this step doing? "else if myDate gt maxdate" then what should it do? is there  anything missing here*/

art297
Opal | Level 21

Karun,

It is probably easiest to answer your questions with an example that only uses sas datasets so that you can see what the data looks like in a data step.

First, here are two datasteps that create files which I think are like the ones you have, other than that the second one is really a tab within an Excel workbook:

data myDatasetA;

  informat myDate yyq6.;

  format myDate yyq6.;

  input myDate x;

  cards;

2010Q1 1

2010Q2 1

2010Q3 1

2010Q4 1

;

run;

data XL_A;

  informat myDate yyq6.;

  format myDate yyq6.;

  input myDate x;

  cards;

2010Q1 2

2010Q3 2

2010Q4 2

2011Q1 2

2011Q2 2

2011Q3 2

2011Q4 2

;

run;

/*next, here is the code, but modified to work with 2 sas datasets, rather than expecting one to be from Excel */

data want (drop=maxdate);

  retain maxdate;

  set myDatasetA (in=inA) XL_A;

  if inA then maxdate=max(maxdate,myDate);

  else if myDate gt maxdate;

run;

/* the resulting file will look like:

         myDate    x

         2010Q1    1

         2010Q2    1

         2010Q3    1

         2010Q4    1

         2011Q1    2

         2011Q2    2

         2011Q3    2

         2011Q4    2

Thus, to answer your questions:

1. The data in myDatasetA doesn't get changed .. only new records are added to it

2. Before file XL_A is even read, maxdate was set to equal the maximum date present in the file myDatasetA.  It was retained so that it would be available at that time when the records from XL_A were read

3. The format of myDate shouldn't matter as long as it was input correctly to represent a SAS date

4. Yes, you appear to understand the first part.  The second part is a SAS statement that selects records from XL_A if, and only if, their value for myDate is greater than maxdate.  An easy way to see that work is with another example.  e.g.:

*/

data test;

set sashelp.class;

if sex eq "M";

run;

/* That code will ONLY select records that have a value of "M" for the variable called sex */

robertrao
Quartz | Level 8

WOW ARTHUR,

Thanks for your patience and Time. PGstats  and you deserve a Lunch. I am going to save this example and explanation for the future so that I will not take your valuable time for the same task again.

QUESTION: I am confused also on to whoi  among u both should be given a RIGHT ANSWER!!! There should have been a BOTH OPTION FOR THAT........

WeLL Done

art297
Opal | Level 21

Karun:  Yes, it would be nice to be able to assign multiple correct answers, but it is more important that you assign at least one of them as being correct so that others won't use up their time trying to answer a question they don't think has been answered yet.

You can assign one correct answer and, I think, up to 3 other posts as being helpful.  Assigning those credits serves two purposes: (1) it makes it easier for people to find the correct answer when they have a similar question and (2) it is the only reward we users get for helping other users.  Not much of an award as it can't be used for anything, but at least it does seem to motivate people to answer posts.  FWIW, a correct answer gets 4 points and each helpful answer gets 3 points.

robertrao
Quartz | Level 8

Hi Arthur,

The code u built keeps updating the values from Excel to our dataset BUT WITH A DIFFERENT NAME(WANT).

WANT has the information now.

Is there a way to do the same process without creating WANT dataset?????????

i want data from Excel to be added to myDatasetA which is already present in a particular library?????

Regards

robertrao
Quartz | Level 8

*WILL THIS WORK?

changed want to data myLib.myDatasetA??????????????

libname XL Excel "MyExcelFile.xlsx";

data myLib.myDatasetA (drop=maxdate);

  retain maxdate;

  set myLib.myDatasetA (in=inA) XL.'A$'n;

  if inA then maxdate=max(maxdate,myDate);

  else if myDate gt maxdate;

run;

art297
Opal | Level 21

Karun,  Yes!  As far as I can tell, that is precisely how it should look.

However, just to play it safe, you may want to make a copy of mylib.myDatasetA, before running the code.  The code will physically modify mylib.myDatasetA and, if it isn't exactly how you want it, without having a backup copy you will be totally out of luck.

Have I ever mentioned the importance of maintaining backup copies of one's datasets?

robertrao
Quartz | Level 8

Thanks. I recently joined this community. I dint have a chance to hear about the backup maintainance

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 25 replies
  • 2205 views
  • 6 likes
  • 4 in conversation