dear community,
I have a quesiton if you would like to select only specific parks (ACAD and ARCH) and count them, and the other parks not. What is wrong in this code below? pls advise.
regards,
Karolina
As Paige has explained first. or last. on a by statement returns a 0 or 1 indicating true or false. If your objective is to count the records as you are reading the data set you can use the code below. However if your objective is to get counts of each by group then you can use any of the procs that SAS has.
data new_ds;
set species_sort end=last;
by parkcode;
retain count;
if first.parkcode then do; count=0; end;
count+1;
if last.parkcode then do;
put parkcode= count=;
if last then do;
put _n_=;
end;
run;
if first.parkcode in ("ACAD","ARCH") then ...
will not work, because the special variables that begin with first. are always either 0 or 1. They are zero when it is not the first observation of a BY variable, and it is 1 if it is the first obsesrvation of a BY variable. So whether it is 0 or 1, it will never be in ("ACAD","ARCH")
What you really want to do is this:
proc freq data=pg2.np_species (where=(parkcode in ('ACAD','ARCH')));
tables parkcode;
run;
Advice: don't try to write your own DATA step code for common activities like counting, computing sums and means and standard deviations, finding minimums and maximums, and so forth. SAS has written many PROCs where they have already programmed these things, then they debug the code, verify that it works, and these PROCs have been proven in a bazillion and 14 real world applications. If you write your own DATA step code, then you have to vouch for the correctness of the results, and as you can see, you also have to overcome the difficulties of programming these things in a DATA step.
From now on, please do not post your code as a screen capture, post your code as text, in the code box as I have done, by clicking on the "little running man" icon. Also, please, from now on, when your code doesn't work, show us the entire LOG for the step that isn't working (not selected parts of the log) as text, by clicking on the </> icon and pasting your log text into that window.
As Paige has explained first. or last. on a by statement returns a 0 or 1 indicating true or false. If your objective is to count the records as you are reading the data set you can use the code below. However if your objective is to get counts of each by group then you can use any of the procs that SAS has.
data new_ds;
set species_sort end=last;
by parkcode;
retain count;
if first.parkcode then do; count=0; end;
count+1;
if last.parkcode then do;
put parkcode= count=;
if last then do;
put _n_=;
end;
run;
Thank you all for your advices. I have now code that works, it counts the occurences for parks ACAD and ARCH and for other parks the count is zero.
data count;
set species_sort;
by parkcode;
if first.parkcode and parkcode in ('ACAD', 'ARCH') then
count=0;
count+1;
if parkcode not in ('ACAD', 'ARCH') then count=0;
if last.parkcode then output;
run;
@touwen_k again I recommend you use PROC FREQ here instead of struggling with DATA step code.
Karolina,
You don't need to sort the data or count the parks you are looking for in a data step to get totals for (ACAD and ARCH) when you use reporting procedures and include a where clause you have all the info. PROC FREQ maybe easier as shown in PaigeMiller reply.
I like to use PROC TABULATE. What I did was to make a small data set with park names and ran the tabulate:
PROC TABULATE DATA=parks (WHERE=(park IN("ACAD" "ARCH"))) NOSEPS MISSING; CLASS park; TABLE PARK ALL , N /RTS=20 ROW=FLOAT MISSTEXT=' '; RUN;
OUTPUT:
I am a Tabulate and FORMAT geek so I also created the following code that includes all parks; results for ACAD, ARCH, and all other parks combined:
PROC FORMAT;
VALUE $PARK (MULTILABEL NOTSORTED)
'ARCH', 'ACAD' = 'TOTAL for ARCH and ACAD'
'ARCH' = 'ARCH'
'ACAD' = 'ACAD'
OTHER = 'OTHER PARKS'
;
PICTURE pct(round)
low -<0 = '009.00%' (PREFIX='-')
0 -high = '009.00%' (PREFIX=' ')
;
RUN;
PROC TABULATE DATA=parks NOSEPS MISSING;
CLASS park/MLF ORDER=DATA;
FORMAT park $PARK.;
TABLE PARK ALL
,
N
PCTN = '%'*F=pct.
/RTS=30 ROW=FLOAT MISSTEXT=' ';
RUN;
OUTPUT:
Full code should be attached.
Jonas V. Bilenas
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.