Hello,
I need to write a count macro which is calculate distinct value by variable and populate the table. For example, in test1 dataset, I do have total 5 observations but out of 5, 4 are unique. If I run the macro, I need to have final table listing as below.
data test1;
input var1 var2 $;
datalines;
001
001
002
003
004
;
run;
data test2;
input var1 var2 $;
datalines;
001
001
002
002
004
;
run;
First, make sure that any data step you provide for example data runs correctly. Both of your examples throw errors. Are you sure there should be an input for Var2 at all? Note that the number of observations is not matching your claim of "For example, in test1 dataset, I do have total 5 observations but out of 5, 4 are unique" but only 2 are in the output because of the error.
56 data test1; 57 input var1 var2 $; 58 datalines; NOTE: LOST CARD. RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+-- 64 ; var1=4 var2= _ERROR_=1 _N_=3 NOTE: SAS went to a new line when INPUT statement reached past the end of a line. NOTE: The data set WORK.TEST1 has 2 observations and 2 variables. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.01 seconds 64 ; 65 run; 66 67 data test2; 68 input var1 var2 $; 69 datalines; NOTE: LOST CARD. RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+-- 75 ; var1=4 var2= _ERROR_=1 _N_=3 NOTE: SAS went to a new line when INPUT statement reached past the end of a line. NOTE: The data set WORK.TEST2 has 2 observations and 2 variables. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.01 seconds 75 ; 76 run;
Consider this code:
data test1; input var1; datalines; 001 001 002 003 004 ; run; data test2; input var1; datalines; 001 001 002 002 004 ; run; data helpful; set test1 test2 indsname=source; table_name=source; run; proc freq data=helpful nlevels; ods output nlevels=want; by notsorted table_name; tables var1; run;
This reads the data in a manner that seems to match your description.
The data step combines the data and adds a variable to the data that has the original dataset name. If you don't like the library part that is an exercise for the interested reader to remove that from the Table_name variable. The Proc Freq has an option, Nlevels that actually counts the unique levels of each variable. The BY means that there is an output record in the set Want for each table_name value. Renaming the NLevels variable to Count and dropping the other variables from the Want set are also nearly trivial exercises.
Note that if your data sets actually have more than one variable you add them to the Tables statement and you will have a count for each.
Note, no macro code needed at all.
No macro needed, and probably very inefficient if you did write one.
Combine all your data sets into one long data set. Then run PROC FREQ.
data test1;
input var1 $;
datalines;
001
001
002
003
004
;
data test2;
input var1 $;
datalines;
001
001
002
002
004
;
data combined;
set test1 test2 indsname=indsname;
table=indsname;
run;
proc freq data=combined nlevels;
ods output nlevels=nlevels;
by table notsorted;
tables var1/noprint;
run;
The data set NLEVELS is what you want.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.