Hi,
My objective is to get NOBS=variable and the values of dim(char) and dim(numeric) from one dataset as use the same as array subscript in another datastep or datasteps(plural) or if possible even in the same datastep. Although, I understand the concept, i just don't know how to technically write a statement to get that to nobs and dim from let's say dataset1.
I'd appreciate if anyone can demonstrate an example.
Please and Thanks,
Charlotte
For use in many DATA steps, macro language would be appropriate:
data _null_;
set first_dataset nobs=_total_obs_;
array nums {*} _numeric_;
array chars {*} _character_;
call symputx('n_obs', _total_obs_);
call symputx('n_nums', dim(nums));
call symputx('n_chars', dim(chars));
stop;
run;
That gives you three macro variables to work with in later steps &N_OBS, &N_NUMS, and &N_CHARS.
You can get these values in the same DATA step, without macro language:
data want;
if 0 then do;
set first_datset nobs=_total_obs_;
array nums {*} _numeric_;
array chars {*} _character_;
end;
set second_dataset;
*** Add statements to process second_dataset;
run;
This gives you three DATA step numbers to work with: _total_obs_, dim(nums), and dim(chars). Note that the contents of the arrays depend only on the first data set, and the second SET statement does not impact the arrays.
Good luck.
Please describe a bit more about what you wish to achieve (and why).
Also, some sample input/output data, and some pseudo code might help us follow your thoughts.
Yes, perhaps more information. You can get obs from sashelp.vtable - variable nobs plus various other methods. You could also query sashelp.vcolumn with a count function to get number of array eg.
proc sql;
select count(1)
into :NUM_VARS
from SASHELP.VCOLUMN
where LIBNAME="WORK"
and MEMNAME="ABC"
and substr(NAME,1,3)="COL";
quit;
Will give you a number in the macro variable of how many columns would have COL prefix.
Let us take a small example.
Have the Data Set, CLASS, from SASHELP.CLASS.
data class;
set sashelp.class;
run;
We wish to count the Frequency of AGE. We assume that the Maximum Age of the CLASS is 16. (This assumption is not needed as we could find the Max and Min Age in the previous step).
We use an array K with a dimension of 16 as no AGE can exceed it in the data set.
We contrive a trivial way to use the number observations of the data set so that we read one observation at a time using I as a counter until the counter(I) does not exceed NUM which is the total number of observations of the Data Set. When I = NUM, the do-loop ends and the next do-loop,
do i = 1 to dim(k);
is carried out. The Subscript of the array K takes values from 1 to dim(k) which happens to be 1 to 16. In this do-loop, if the array-cell contains a value > 0, then it is output. The Output data set is shown below.
Hope this helps you to understand what you seek.
data want;
array k[16] _temporary_;
do i = 1 by 1 while ( i <= num);
set class nobs = num;
k[Age] + 1;
end;
do i = 1 to dim(k);
if k > 0 then do;
Age = i;
Freq = k;
output;
end;
end;
keep Age Freq;
run;
Obs Age Freq
1 11 2
2 12 5
3 13 3
4 14 4
5 15 4
6 16 1
For use in many DATA steps, macro language would be appropriate:
data _null_;
set first_dataset nobs=_total_obs_;
array nums {*} _numeric_;
array chars {*} _character_;
call symputx('n_obs', _total_obs_);
call symputx('n_nums', dim(nums));
call symputx('n_chars', dim(chars));
stop;
run;
That gives you three macro variables to work with in later steps &N_OBS, &N_NUMS, and &N_CHARS.
You can get these values in the same DATA step, without macro language:
data want;
if 0 then do;
set first_datset nobs=_total_obs_;
array nums {*} _numeric_;
array chars {*} _character_;
end;
set second_dataset;
*** Add statements to process second_dataset;
run;
This gives you three DATA step numbers to work with: _total_obs_, dim(nums), and dim(chars). Note that the contents of the arrays depend only on the first data set, and the second SET statement does not impact the arrays.
Good luck.
Beautiful and neat, Thank you Sir,
In your second example wouldn't the contents of the PDV include variable descriptors of both datasets with missing values to variables of dataset1?
If yes, dropping or keeping them be necessary later?
/*coz i thought if 0 then set would still fully compile the entire descriptive portions occupying more memory but doing it one pass is awesome too, may i ask what do you reckon would be most efficient?*/
Please accept my apologies if i am wrong and silly.
@hkuo @datsp and RW, Many thanks for the quick response. I really appreciate it
Regards,
Charlotte from England!
Yes, you're correct about the PDV. You could add this statement:
drop _numeric_ _character_;
Just be sure to add it before the second SET statement ... and test it of course. But the definition of _numeric_ and _character_ in a DATA step is supposed to mean all the numeric (or character) variables defined so far. So this should work.
This?
PROC SQL NOPRINT;
SELECT NOBS, NUM_CHARACTER, NUM_NUMERIC INTO:NOBS TRIMMED, :NUM_CHARACTER TRIMMED, :NUM_NUMERIC TRIMMED
FROM DICTIONARY.TABLES WHERE LIBNAME='SASHELP' AND MEMNAME='CLASS';
QUIT;
%PUT NOBS=&NOBS NUM_CHARACTER=&NUM_CHARACTER NUM_NUMERIC=&NUM_NUMERIC;
Haikuo
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.