I have a SAS data set I read in:
libname mysas "path to datafile location"; data mysas.MyData; set mysas.datafile; run;
There is a column named "SPECIES NAME" that has many different types of species. I want to get a count of how many species there are, and here is what I did:
proc sql; select count(distinct 'SPECIES NAME') as Number_of_Species from mysas.MyData; quit;
However, it tells me there is only 1 Number_of_Species. Looking at the data set, this is definitely not accurate. How can I fix this?
Because the 2nd (and 3rd I guess) thing I want to do is for each of the species, I want to get a count on how many years have been the data recorded for each species. Conversely, I also want to create an ordered list of all the years, then for each year, give the number of species that corresponds to that year.
There is a column in MyData called "YEAR" next to "SPECIES NAME" and it looks a bit like this:
YEAR SPECIES NAME
2021 SPECIES A
2022 SPECIES A
2023 SPECIES A
2020 SPECIES B
2022 SPECIES B
I am not quite sure how I would approach that in proc sql.
Using liternal name 'SPECIES NAME'n to represent this variable, otherwise sas would take 'SPECIES NAME' (without 'n') as a string ,therefore you would get ONLY ONE level.
option validvarname=any; proc sql; select count(distinct 'SPECIES NAME'n ) as Number_of_Species from mysas.MyData; quit;
I tried this, but it looks like I am getting an error: ERROR: The following columns were not found in the contributing tables: "SPECIES NAME"n
However, when I remove the n, it runs again giving me the output of 1. I also added
option validvarname=any;
before the proc sql.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.