Hi,
I have the following dataset:
date Company
May05 McDonald
May05 McDonald
May05 McDonald
May05 Apple
May05 Apple
May05 Apple
May05 Apple
May05 Coca Cola
May05 .
May05 McDonald
June05 Coca-Cola
June05 Coca-Cola
June05 Coca-Cola
June05 Coca-Cola
June05 Coca-Cola
June05 Apple
June05 Apple
June05 Apple
June05 Apple
June05 McDonald
Now, there are three main things I want to do here
1: First of all. I want to create a variable that assigns a value of 1 for each of the observation in "Company" and then sums it all up by different dates. So for May05 my new count variable (Var1) will equal 9 (doesn't count the missing) and 10 for when date is June05 as follows
Date var2
May05 9
Jun05 10
2: I want to create another variable that only outputs the Company observations that is mentioned more than twice. So my new variable (Var2) will equal the following:
Date var2
May05 McDonald
May05 McDonald
May05 McDonald
May05 Apple
May05 Apple
May05 Apple
May05 Apple
Jun05 Coca-Cola
Jun05 Coca-Cola
Jun05 Coca-Cola
Jun05 Coca-Cola
Jun05 Coca-Cola
Jun05 Apple
Jun05 Apple
Jun05 Apple
Jun05 Apple
3: Finally, I want to do another step similar to number 2, except here, I would want Var3 to output Company observations that is mentioned more than twice, but that also should appera on both dates. So in my case, the observation that appears more than twice on both dates is Apple: The data should look as follows:
Date var3
May05 Apple
May05 Apple
May05 Apple
May05 Apple
Jun05 Apple
Jun05 Apple
Jun05 Apple
Jun05 Apple
Please note that the actual dataset I have includes hundred if not thousands of different Company names.
What is your thought? What have you tried? you can't just throw this homework at us
Honestly, I have thought a lot about it. I saw some things I could use through proc sql, but I am not that familiar with it and don't want to screw it up. I also think some sort of do loop would work here, but I just don't have enough sas expertise to actually try anything. I am sorry for looking and sounding like someone who is just try to push the work onto others, but I just really have no clue. And no this is not homework..it's for a research paper I want to write.
If you don't have enough SAS expertise, you can never be confident of the answers you receive here. Even if they seem to work, there are many posted answers that work for one batch of data that would not work for another batch of data. You will have to develop some SAS expertise along the way and take responsibility yourself for having a working program.
That being said, here's a piece of what you asked about for question 1. It won't get you all the way there, but it will push you in the right direction:
proc freq data=have;
tables date;
where company > ' ';
run;
See if you can embellish that to get closer. In particular, see if you can figure out how PROC FREQ can create an output data set.
Good luck.
Thanks for your help. After some more snooping around, I found out that I can use proc means to do part two and possibly part three. But I am trying to figure out if there is a way i can use conditional statements to do that without resorting to doing another data step after I have done the count using proc means. Let me clarify a little bit more: I use the following code:
PROC MEANS DATA=count N ;
CLASS Companyr ;
VAR date;
RUN;
Here I can then use an out statement to output the result in a seperate dataset. From there, then I could just do a data step to include companies that are only mentioned more than twice (see below) However, is there a way to do that step within proc means without having to go through another data step?
data _null_;
set NewCount;
if N le 2 then delete;
run;
Can I somehow do the above data step within the proc means step?
You can subset within PROC MEANS, using WHERE on the output data set:
proc means data=count nway;
class company;
var date;
output out=want (keep=company n where=(n > 2)) n=n;
run;
When you add a CLASS statement, you will need to know how NWAY works as well.
Good luck.
Please let us know all the approaches that you have tried because the problem seems to be very basic and it does not require much of SAS expertise. Then only we can help you.
I tried using proc SQL for the first part: see below: But I don't want to delve into SQL world as I am not familiar with it at all.
proc sql;
select count(Company) as Count
from datasetnew
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.