BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Ruslan
Calcite | Level 5

Hi All,

 

I have the following question which I think is relatively easy to solve, but I find a difficulty in writing a code. I have data that looks like:

 

Company        Date         Var1          Var2

   1                     1             10             0

   1                     1             15             1000

   1                     1             17             80

   1                     1             19             40

   1                     1             22             2000

   1                     1             25             100

   1                     2             9               10

   1                     2             15             200

   1                     2             16             100

   1                     2             20             500

   1                     2             22             900

   1                     2             23             400

   1                     2             26             250

   1                     3             11             100

   1                     3             13             80

   1                     3             18             0

   1                     3             20             0

   1                     3             22             0

   1                     3             24             0

   1                     3             26             420

   1                     3             28             5

   2                     1             20             300

   2                     1             22             200

   2                     1             25             0

   2                     2             18             2

   2                     2             21             350

   2                     2             25             40

   2                     2             27             70

 

So, for each company and for each date I have data on Var1 and Var2. I need to estimate new variable for each company and each date. Please find the estimation of the variable in word file attached.

 

To put it in words, I need to estimate value-weighted absolute deviation of Var1 around value-weighted Var1 for each date and each company. Weights are the fraction of each Var2 observation over the sum of all Var2 observations that are available for each date and each company. Then, obtained number has to be divided by value-weighted Var1.

 

Any help would be hugely appreciated. 

 

Kind regards,

Ruslan

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Thanks for your quick reply.

 

Here's the modified data step solution:

data want;
do until(last.date);
  set have;
  by company date;
  s2=sum(s2,var2);
end;
do until(last.date);
  set have;
  by company date;
  s1=sum(s1,var2/s2*var1);
end;
do until(last.date);
  set have;
  by company date;
  s3=sum(s3,var2/s2*abs(var1-s1));
  m2=sum(m2,(var2>0));
end;
Var3=s3/s1;
if m2<4 then var3=.;
keep Company Date Var3;
run;

For completeness, please find below the modified PROC SQL solution:

proc sql;
create table want as
select company, date, case when sum(var2>0)>=4 then sum(var2/s2*abs(var1-s1))/s1
                                               else .
                      end as Var3
from (select *, 
      sum(var2/s2*var1) as s1
      from (select *, sum(var2) as s2
            from have
            group by company, date)
      group by company, date)
group by company, date;
quit;

View solution in original post

9 REPLIES 9
FreelanceReinh
Jade | Level 19

Hi @Ruslan,

 

Many thanks for providing such precise specifications.

 

Here is a first solution using a data step with triple DOW loop:

data have;
input Company Date Var1 Var2;
cards;
1 1 10 0
1 1 15 1000
1 1 17 80
1 1 19 40
1 1 22 2000
1 1 25 100
1 2 9 10
1 2 15 200
1 2 16 100
1 2 20 500
1 2 22 900
1 2 23 400
1 2 26 250
1 3 11 100
1 3 13 80
1 3 18 0
1 3 20 0
1 3 22 0
1 3 24 0
1 3 26 420
1 3 28 5
2 1 20 300
2 1 22 200
2 1 25 0
2 2 18 2
2 2 21 350
2 2 25 40
2 2 27 70
;

data want;
do until(last.date);
  set have;
  by company date;
  s2=sum(s2,var2);
end;
do until(last.date);
  set have;
  by company date;
  s1=sum(s1,var2/s2*var1);
end;
do until(last.date);
  set have;
  by company date;
  s3=sum(s3,var2/s2*abs(var1-s1));
end;
Var3=s3/s1;
keep Company Date Var3;
run;
FreelanceReinh
Jade | Level 19

And here is another solution using PROC SQL:

 

proc sql;
create table want as
select company, date, sum(var2/s2*abs(var1-s1))/s1 as Var3
from (select *, 
      sum(var2/s2*var1) as s1
      from (select *, sum(var2) as s2
            from have
            group by company, date)
      group by company, date)
group by company, date;
quit;

If you need the aggregated Var3 values in each of the original observations, please replace select company, date by select * in the above PROC SQL step.

 

Edit: In the latter situation the sort order within the BY groups is likely to differ from the original sort order in dataset HAVE, as there is no key available. If this was an issue, you could create a sequential number in HAVE and use it as a sort key.

Ruslan
Calcite | Level 5

Thanks a lot for your solution. That works perfectly.

 

I need one more small thing:

 

As you can see, var2 can have zero values. I want to put a restriction such that there should be at least four var2 observations with non-zero values in order to estimate var3. Otherwise, I need to put missing value for var3 on a certain day.

 

Look forward to your reply!

FreelanceReinh
Jade | Level 19

You're welcome.

 

The modification you're asking for is no problem either. Can there be negative values of Var2 in your data? Or is "non-zero" equivalent to ">0" for Var2?

Ruslan
Calcite | Level 5

No, the values are either zero or positive. So I need to put restriction only on non-zero observations.

FreelanceReinh
Jade | Level 19

Thanks for your quick reply.

 

Here's the modified data step solution:

data want;
do until(last.date);
  set have;
  by company date;
  s2=sum(s2,var2);
end;
do until(last.date);
  set have;
  by company date;
  s1=sum(s1,var2/s2*var1);
end;
do until(last.date);
  set have;
  by company date;
  s3=sum(s3,var2/s2*abs(var1-s1));
  m2=sum(m2,(var2>0));
end;
Var3=s3/s1;
if m2<4 then var3=.;
keep Company Date Var3;
run;

For completeness, please find below the modified PROC SQL solution:

proc sql;
create table want as
select company, date, case when sum(var2>0)>=4 then sum(var2/s2*abs(var1-s1))/s1
                                               else .
                      end as Var3
from (select *, 
      sum(var2/s2*var1) as s1
      from (select *, sum(var2) as s2
            from have
            group by company, date)
      group by company, date)
group by company, date;
quit;
Ruslan
Calcite | Level 5

Thanks again for your prompt response. 

 

I tried your code and got a missing value for company 1, day 3. However, I have 4 non-zero values on this day, so I should calculate Var3. How can I change your code?

FreelanceReinh
Jade | Level 19

Are you sure? I obtain the following PROC PRINT output for dataset WANT with both the data step and the PROC SQL solution:

Obs    Company    Date      Var3

 1        1         1     0.15746
 2        1         2     0.10107
 3        1         3     0.27080
 4        2         1      .
 5        2         2     0.08629

Did you change dataset HAVE in the meantime?

Ruslan
Calcite | Level 5

Yes, sorry, I got confused. Wanted to tell you "feel free to ignore my stupid question", but you replied before that 🙂

 

Everything works perfectly. Many thanks for your prompt responses and solutions.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 2787 views
  • 0 likes
  • 2 in conversation