BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Sir_Highbury
Quartz | Level 8

Dear experts,

 

I would like to calculate the delta for all the numeric variable in 2 dataset with exactly the same structure.

 

Input data:

data have1;
input reference_period: $12. segment: $12. revenue: comma5. expenses: comma5.;
datalines;
jan low 1000 500
jan medium 1500 600
jan high 2000 800
;
data have2;
input reference_period: $12. segment: $12. revenue: comma5. expenses: comma5.;
datalines;
feb low 1100 550
feb medium 1400 650
feb high 2100 750
;
proc append base=have1 data=have2 ;run;

 

what I would like to get, in bold the part that should be automatized (i.e. it is just a semplification, I have a lot of variables to compare)

 

proc sql; create table want as select
"delta" as reference_period,
t0.segment,
(t0.revenue-t1.revenue) as revenue,
(t0.expenses-t1.expenses) as expenses
from have1 t0
inner join have1 t1 on t0.SEGMENT=t1.SEGMENT
where t0.reference_period in ("feb")
and t1.reference_period in ("jan")
;quit;

 

thanks in advance. BRs, SH

1 ACCEPTED SOLUTION
13 REPLIES 13
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Its not really clear to me from your given test data what you want.  Is there only two months, what about dec-jan?  Are they all in one dataset - if so then just post the one dataset with your have dataset.  Do the months always appear sequentially, and is the logic that it is current month merged to +1?  If so then the main problem you have is the "reference_period", just creating a proper date out of it with default parts, enable you to merge on intnx values:

data have;
  input reference_period: $12. segment: $12. revenue: comma5. expenses: comma5.;
  merge_dt=input(cats("01",upcase(reference_period),"2016"),date9.);
  format merge_dt date9.;
datalines;
jan low 1000 500
jan medium 1500 600
jan high 2000 800
feb low 1100 550
feb medium 1400 650
feb high 2100 750
;
run;

proc sql; 
  create table WANT as 
  select  "delta" as REFERENCE_PERIOD,
          T0.SEGMENT,
          (T0.REVENUE-T1.REVENUE) as REVENUE,
          (T0.EXPENSES-T1.EXPENSES) as EXPENSES
  from    HAVE T0
  left join HAVE T1
  on      intnx("month",T0.MERGE_DT,1)=T1.MERGE_DT
    and   T0.SEGMENT=T1.SEGMENT;
quit;
Sir_Highbury
Quartz | Level 8

dear RW9,

 

thanks for your attention, here some clarifications:

- I highlighted in bold what is my crucial point, I would like to automatize this part, not just for revenue and expenses but for n variables, my issue is not december! If jan and feb is misleading you can overwrite with peaches and lemons or x and y, the point is that I would like to subtract all the numeric variables of one data set to the ones of another one, assuming that both have an identical structure (exactly the same variables). E.g. like a matrix operation in python.

- It does not make a difference if I have two data set with the identical structure or two data sets with the identical structure appended with two different key to distinguish the two of them. You can consider your data set or what you got as have1 running my script as the starting point, as you wish. Still my point is not to write my have1 in a better way but how to get from it to want, i.e. to automatize what I highlighted in bold.

Thanks again for your support. BRs, SH.

 

 

 

 

 

 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Ah, I see.  Then this is a question that comes up a lot.  Basically its a structural problem.  For some reason it is necessary to try to program with data where the data goes across the page.  This is not optimal from a programming point of view.  If you have ever had to do DB programming, or programming with a normalised structure you will understand.  For a programming taks you want a fixed data structure, so you know what columns there is, and then any amount of data elements (rows) to process.  So lets consider what your data is, ignore any output report requirement (as that is not the "programming interest").  In your data you have several numeric data items, which can be labeled per test.  So look at the structure:

REFERENCE_PERIOD   SEGMENT   PARAM      RESULT

jan                                   low               revenue     1000

jan                                   low               expenses   500

jan                                   medium        revenue     1500

...

 

What does the above give us, well my programming now does not need to know what each of the numeric column names are - thus we eliminate all the code associated with trying to find this out.  The merge does not need to change, its still ref-ref, seg-seg, but we now also add param-param on.  Then the "calculation" is simply base.result - compare.result.  This layout is far simpler to work with from a programming standpoint, and maintenance - well we can add or remove any number of paramters without any change to the code, so zero maintenance.  If you need the result for an output file, then a simple proc tranpose before the report step gives you the visual result you look for.  

Sir_Highbury
Quartz | Level 8

Dear Reeza,

 

I looked at the proc compare but I did not find anything that makes what I want, i.e. an ease calculation of the difference between two data base where the variables are numeric. 

Are you sure that the proc compare does it? Did you ever try it or are you gessing? I would like to be sure that playing with this proc compare is not a wast of time. Unfortunately due to some restriction I cannot integrate r in SAS and therefore I have to stick to the sas procedures.

Sir_Highbury
Quartz | Level 8

well... it seems that there is not anything better than this funny proc compare: I will try to figure out how. Either there are not better, more elegant and compact solutions or everyone ignore them.

Kurt_Bremser
Super User

If all numeric variables are to be compared:

data numericvars (keep=name);
set sashelp.vcolumn;
where upcase(libname) = "WORK" and upcase(memname) = "HAVE1" and type = 'num';
run;

data _null_;
set numericvars end = done;
if _n_ = 1 then call execute('
  proc sql;
  create table want as select
  "delta" as reference_period,
  t0.segment,
');
call execute('(t0.'!!name!!'-t1.'!!name!!') as '!!name!!' ');
if not done
then call execute(',');
else call execute("
  from have1 t0
  inner join have1 t1 on t0.SEGMENT=t1.SEGMENT
  where t0.reference_period in ('&month2')
  and t1.reference_period in ('&month1')
  ;
  quit;
");
run;
Sir_Highbury
Quartz | Level 8

Dear Kurt,

 

thanks for your suggestion, I think that you really got the poin but I still have some concerns due to the fact that you did not apply the code on the test case I provided and when I apply your code to the my test case:

 

data _null_;
set have1 end = done;
if _n_ = 1 then call execute('
proc sql;
create table want as select
"delta" as reference_period,
t0.segment,
');
call execute('(t0.'!!name!!'-t1.'!!name!!') as '!!name!!' ');
if not done
then call execute(',');
else call execute("
from have1 t0
inner join have1 t1 on t0.SEGMENT=t1.SEGMENT
where t0.reference_period in ('jan')
and t1.reference_period in ('feb')
;
quit;
");
run;

 

I did not get what I expect. Any suggestion?

Kurt_Bremser
Super User

Take a good look at my code. The first step creates a dataset with the names of the numeric variables in have1. The data _null_ then creates the proc sql from that. Running the data _null_ from have1 will not work.

ballardw
Super User

Post example input data and the expected results for that input.

This link https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn your data set(s), or subsets of the data, into data step code so we have some we can test.

 

You need to provide an example of what the expected output should be as posting code that doesn't work does not tell use what the actual output should.

 

Also be aware the Proc SQL is constantly reordering data so it may be difficult to do any operations that rely on the order of the records.

Sir_Highbury
Quartz | Level 8

again?! please refer to the data set have1 and the data set want of my initial post!

I got what I wanted and I did it on my own.. 😉

ballardw
Super User

Very good that you learned how to do what you wanted. It would be very helpful to others in future to post what you did to resolve your issue.

 

Without a link to the original post, data in another post is kind of pointless, especially since you don't even mention that this was a continuation of a previous topic. I have no desire to guess that data might exist in another post or spend time searching for it.

 

 

Reeza
Super User

@Sir_Highbury wrote:

again?! please refer to the data set have1 and the data set want of my initial post!

I got what I wanted and I did it on my own.. 😉


Your not very nice. Remember everyone on here is a volunteer. If you'd like to boss someone around hire a consultant or tutor. 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 4295 views
  • 4 likes
  • 5 in conversation