I have a dataset that produces the following responses from a farmer Survey via the SAS dataset variable Yield_Constraint or Yield_Impediment
There are three possible responses to Yield Constraint (1) Access to Credit (2) Equipment Leasing (3) Improve Security/Stability. The order of the responses in the frequency distribution above indicates the most challenging cause of low yield from left to right.
As one of the ways to analyze this variable, I wish to simply count for each of the three causes of low yield, how many farmers provide a particular response, regardless of ranking. So for example, how many farmers respond "Access to Credit" regardless of order?
I have used an external file to do this, using the code below. Please the questions are:
(a) Is there an easier way to do achieve the result below?
(b) How can I use a SAS dataset instead to achieve the same result instead of an external file?
Thanks for any help
Method Used:
===========
I simply determine how many responses each farmer provides and then output all responses to an external file - giving more rows than I have responding farmers. Use output dataset from PROC UNIVARIATE to determine the denominator i.e total farmers responding. Then use a PRO FREQ to determine the raw counts for each response from the external file. Finally, use the Counts from the PROC FREQ and the Total Farmers from the PROC UNIVARIATE to determine the correct percentage of each response. PROC PRINT produces the final output (please see below)
Data _null_;
file "C:\..........\Yield_Constraints.txt";
data FarmerAnalysis;
set INDATA ;
Farmers=1;
/*DETERMINE NUMBER OF CHALLENGES EACH FARMER HAS */
numYieldConstraint=countw(Yield_Constraint,,'f');
if numYieldConstraint gt 0 then
do;
do i = 1 to numYieldConstraint;
outputvalue=scan(Yield_Constraint,i,' ');
file "C:\............\Yield_Constraints.txt" mod;
put outputvalue;
end;
end;
proc univariate noprint data=FarmerAnalysisResearch;
/* use for weighted counts */
var Farmers;
output out=count sum=total;
run;
data indata_challenges;
infile "C:\.........\Yield_Constraints.txt" truncover;
input
@0001 Yield_Constraint $30.
;
proc freq data=indata_Challenges; tables Yield_Constraint/missing noprint out=YieldConstraint;
data YieldConstraint;
format FARMERS TOTAL comma20.;
set YieldConstraint (rename=(count=FARMERS));
if _n_ = 1 then set count; /*Append total number of Farmers from Proc Univariate*/.
Percent=round(FARMERS/Total*100,0.1);
if Yield_Constraint eq "" then delete; /*remove missing*/
proc print data=YieldConstraint (rename=(total=TOTAL_FARMERS));
var Yield_Constraint FARMERS TOTAL_FARMERS PERCENT;
The result:
Sure , you can do basically the same thing not using an external file. Your approach seems reasonable to me.
if numYieldConstraint gt 0 then
do;
do i = 1 to numYieldConstraint;
outputvalue=scan(Yield_Constraint,i,' ');
output;
end;
end;
keep outputvalue (... other things ...);
Using the `output` statement you can output multiple rows. You might want to output to a different dataset if you also want a one-row-per dataset (the FarmerAnalysis dataset you have now).
Sure , you can do basically the same thing not using an external file. Your approach seems reasonable to me.
if numYieldConstraint gt 0 then
do;
do i = 1 to numYieldConstraint;
outputvalue=scan(Yield_Constraint,i,' ');
output;
end;
end;
keep outputvalue (... other things ...);
Using the `output` statement you can output multiple rows. You might want to output to a different dataset if you also want a one-row-per dataset (the FarmerAnalysis dataset you have now).
Thanks so much snoopy369.
The problem I am having using datasets, which is why I went with the inefficient external file is that I only want to output the variable "outputvalue" in the "utility" data set "outputchallenges". So I tried using the code below, which does what I want:
However the "Keep" Statement affects the dataset "FarmerAnalysis" and I need to keep a much larger subset of variables for other computations. Please is it possible to specify which variables are kept for each dataset? or do I simply need two passes of the inputdataset to achieve what I want. Thanks
data FarmerAnalysis outputchallenges;
set Inputdataset;
Farmers=1;
numYieldConstraint=countw(Yield_Constraint,,'f');
if numYieldConstraint gt 0 then
do;
do i = 1 to numYieldConstraint;
outputvalue=scan(Yield_Constraint,i,' ');
keep outputvalue ;
output outputchallenges;
end;
end;
......
run;
The sas dataset outputchallenges will contain the single variable outputvalue.
Thanks again Snoopy 369
I figured out how to do the individual "keep" statements in a single datastep:
data FarmerAnalysisResearch (keep=Farmers Yield_Constraint) outputYieldChallenges (keep=Yield_Constraints);
With strategically placed output statements for each dataset, it works like a dream, so much more efficient than using an external file.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.