BookmarkSubscribeRSS Feed
mustafghan
Calcite | Level 5

Hi,

I have the following dataset:

date               Company

May05            McDonald

May05            McDonald

May05            McDonald

May05            Apple

May05            Apple

May05            Apple

May05            Apple

May05           Coca Cola

May05           .

May05           McDonald

June05          Coca-Cola

June05          Coca-Cola

June05          Coca-Cola

June05          Coca-Cola

June05          Coca-Cola

June05          Apple

June05          Apple

June05          Apple         

June05          Apple              

June05          McDonald


Now, there are three main things I want to do here

1: First of all. I want to create a variable that assigns a value of 1 for each of the observation in "Company" and then sums it all up by different dates. So for May05 my new count variable (Var1) will equal 9 (doesn't count the missing) and 10 for when date is June05 as follows


Date                var2

May05               9

Jun05               10

2: I want to create another variable that only outputs the Company observations that is mentioned more than twice. So my new variable (Var2) will equal the following:


Date                var2

May05          McDonald

May05          McDonald

May05          McDonald

May05          Apple

May05          Apple

May05          Apple

May05          Apple

Jun05          Coca-Cola

Jun05          Coca-Cola

Jun05          Coca-Cola

Jun05          Coca-Cola

Jun05          Coca-Cola

Jun05          Apple

Jun05          Apple         

Jun05          Apple

Jun05          Apple


3: Finally, I want to do another step similar to number 2, except here, I would want Var3 to output Company observations that is mentioned more than twice, but that also should appera on both dates. So in my case, the observation that appears more than twice on both dates is Apple: The data should look as follows:


Date                var3

May05          Apple

May05          Apple         

May05          Apple

May05          Apple

Jun05          Apple

Jun05          Apple         

Jun05          Apple

Jun05          Apple


Please note that the actual dataset I have includes hundred if not thousands of different Company names.

7 REPLIES 7
Haikuo
Onyx | Level 15

What is your thought? What have you tried? you can't just throw this homework at us Smiley Happy

mustafghan
Calcite | Level 5

Honestly, I have thought a lot about it. I saw some things I could use through proc sql, but I am not that familiar with it and don't want to screw it up. I also think some sort of do loop would work here, but I just don't have enough sas expertise to actually try anything. I am sorry for looking and sounding like someone who is just try to push the work onto others, but I just really have no clue. And no this is not homework..it's for a research paper I want to write.

Astounding
PROC Star

If you don't have enough SAS expertise, you can never be confident of the answers you receive here.  Even if they seem to work, there are many posted answers that work for one batch of data that would not work for another batch of data.  You will have to develop some SAS expertise along the way and take responsibility yourself for having a working program.

That being said, here's a piece of what you asked about for question 1.  It won't get you all the way there, but it will push you in the right direction:

proc freq data=have;

  tables date;

  where company > ' ';

run;

See if you can embellish that to get closer.  In particular, see if you can figure out how PROC FREQ can create an output data set.

Good luck.

mustafghan
Calcite | Level 5

Thanks for your help. After some more snooping around, I found out that I can use proc means to do part two and possibly part three. But I am trying to figure out if there is a way i can use conditional statements to do that without resorting to doing another data step after I have done the count using proc means. Let me clarify a little bit more: I use the following code:


PROC MEANS DATA=count N ;

  CLASS Companyr ;

  VAR date;

RUN;

Here I can then use an out statement to output the result in a seperate dataset. From there, then I could just do a data step to include companies that are only mentioned more than twice (see below) However, is there a way to do that step within proc means without having to go through another data step?

data _null_;

set NewCount;

if N le 2 then delete;

run;

Can I somehow do the above data step within the proc means step?




Astounding
PROC Star

You can subset within PROC MEANS, using WHERE on the output data set:

proc means data=count nway;

class company;

var date;

output out=want (keep=company n where=(n > 2)) n=n;

run;

When you add a CLASS statement, you will need to know how NWAY works as well.

Good luck.

Paari
Calcite | Level 5

Please let us know all the approaches that you have tried because the problem seems to be very basic and it does not require much of SAS expertise. Then only we can help you. Smiley Happy

mustafghan
Calcite | Level 5

I tried using proc SQL for the first part: see below: But I don't want to delve into SQL world as I am not familiar with it at all.
proc sql;
  select count(Company) as Count
  from datasetnew

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1210 views
  • 3 likes
  • 4 in conversation