Hi Everyone,
I am a new student, new to SAS, and new to SAS community. I am hoping that you guys will be able to help me with my data problem. I have companies information that I need help with. Here is a sample of the data:
CoID | Year | Profit | R&DEx |
---|---|---|---|
123 | 1998 | -100 | 15 |
123 | 1999 | -100 | 10 |
123 | 2000 | 150 | 15 |
123 | 2001 | 200 | 20 |
456 | 1999 | 50 | 10 |
456 | 2000 | 10 | . |
456 | 2001 | -25 | 15 |
789 | 1999 | 150 | . |
789 | 2000 | -100 | . |
789 | 2001 | 50 | . |
789 | 2002 | 100 | . |
In this table, the first variable "CoID" is company id which is a character in SAS and I want to keep it that way. The second column is the year, which is in number format. The third column is the profit (which can be negative) and the fourth column is Research and Development Expense.
What I want to do with the data is following things:
1) Create a variable called "R&DExAvg" which is the average of the R&D Expense for each company.
2) Create a variable called "R&DMwAvg" which will be filled the missing value of R&D Expense with the average of that variable with respect to that company. If every R&D Expense value is missing, then keep that variable missing as well. If R&D Expense value is filled, then keep that variable same as R&DEx.
3) Create a variable called "Lossin2K" which will be filled with 1 for every value of that company if the profit was below zero during the year 2000 otherwise it will be 0.
4) Create a variable called "Lossporin2K2" which will be filled with 1 for every value of that company if the profit was below zero two years in a row or more during the year 2000 or before otherwise it will be 0.
This is what I need to have after the above mentioned programming done:
CoID | Year | Profit | R&DEx | R&DExAvg | R&DMwAvg | Lossin2K | Lossporin2K2 |
---|---|---|---|---|---|---|---|
123 | 1998 | -100 | 15 | 15 | 15 | 0 | 1 |
123 | 1999 | -100 | 10 | 15 | 10 | 0 | 1 |
123 | 2000 | 150 | 15 | 15 | 15 | 0 | 1 |
123 | 2001 | 200 | 20 | 15 | 20 | 0 | 1 |
456 | 1999 | 50 | 10 | 12.5 | 10 | 0 | 0 |
456 | 2000 | 10 | . | 12.5 | 12.5 | 0 | 0 |
456 | 2001 | -25 | 15 | 12.5 | 15 | 0 | 0 |
789 | 1999 | 150 | . | . | . | 1 | 0 |
789 | 2000 | -100 | . | . | . | 1 | 0 |
789 | 2001 | 50 | . | . | . | 1 | 0 |
789 | 2002 | 100 | . | . | . | 1 | 0 |
I know it is lot to ask but please help me out. I would appreciate if you can help me out with separate data statement for each variable and keeping it simple (please no macro!).
Once again thanks,
The Rock
"if the profit was below zero two years in a row or more during the year 2000 "
You mean year less than and equal 2000 ?
data have;
length coid $4 year profit rndex 8;
input coid year profit rndex;
datalines;
123 1998 -100 15
123 1999 -100 10
123 2000 150 15
123 2001 200 20
456 1999 50 10
456 2000 10 .
456 2001 -25 15
789 1999 150 .
789 2000 -100 .
789 2001 50 .
789 2002 100 .
;
run;
proc sql;
create table want as
select *,mean(rndex) as rndex_avg,
coalesce(rndex,calculated rndex_avg) as rndex_Mwavg,
case
when (select profit from have where coid=a.coid and year=2000) lt 0 then 1
else 0
end as Lossin2K,
case
when (select count(*) from have where coid=a.coid and year le 2000 and profit lt 0) gt 1 then 1
else 0
end as Lossporin2K2
from have as a
group by coid ;
quit;
Hi,
This is not the whole answer but it will get you started.
data co1;
length coid $4 year profit rndex 8;
input coid year profit rndex;
datalines;
123 1998 -100 15
123
1999 -100 10
123 2000 150 15
123 2001 200 20
456 1999 50 10
456 2000 10 .
456 2001 -25 15
789 1999 150 .
789 2000 -100 .
789 2001 50 .
789 2002 100 .
;
run;
proc sql;
create table co2 as
select *, mean(rndex) as rndexavg
from co1
group by coid;
quit;
You can't use an ampersand in a SAS variable name.
"if the profit was below zero two years in a row or more during the year 2000 "
You mean year less than and equal 2000 ?
data have;
length coid $4 year profit rndex 8;
input coid year profit rndex;
datalines;
123 1998 -100 15
123 1999 -100 10
123 2000 150 15
123 2001 200 20
456 1999 50 10
456 2000 10 .
456 2001 -25 15
789 1999 150 .
789 2000 -100 .
789 2001 50 .
789 2002 100 .
;
run;
proc sql;
create table want as
select *,mean(rndex) as rndex_avg,
coalesce(rndex,calculated rndex_avg) as rndex_Mwavg,
case
when (select profit from have where coid=a.coid and year=2000) lt 0 then 1
else 0
end as Lossin2K,
case
when (select count(*) from have where coid=a.coid and year le 2000 and profit lt 0) gt 1 then 1
else 0
end as Lossporin2K2
from have as a
group by coid ;
quit;
Yes I mean less than or equal to.
Thank you so much for the help. It worked perfectly.
I have never used proc SQL before and I need to learn about it. Do you have any guidance on where to start?
Once again, Thank you!!!
SAS documentation about SQL is a good start . Search it at support.sas.com
Here's a link to the SQL doc: SAS(R) 9.4 SQL Procedure User's Guide, Second Edition
Thanks to both Xia Keshan and EM@sas on info about the SAS SQL documentation.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.