cassylovescats
Calcite | Level 5

Hello,

 

This is my current code:

 

libname Example "~/my_courses/Homework/FinalHomework";

data Sugar;
length DistrictGroup $ 30;
infile '~/my_courses/Homework/FinalHomework/CaneData2.csv/' dsd firstobs=2;
input District $ DistrictGroup $ DistrictPosition $ SoilID SoilName $ Area Variety $ Ratoon $ Age HarvestMonth HarvestDuration TonnHect Fibre Sugar Jul96 Aug96 Sep96 Oct96 Nov96 Dec96 Jan97 Feb97 Mar97 Apr97 May97 Jun97 Jul97 Aug97 Sep97 Oct97 Nov97 Dec97;
;
run;

data SugarLong; 
set Sugar;
array mon{*} _numeric_;
do _n_=9 to dim(mon);
Month=vname(mon[_n_]);
Count=mon[_n_];
output;
end;
drop Month DistrictGroup SoilID SoilName Area Variety Ratoon Age HarvestMonth HarvestDuration TonnHect Fibre Sugar Jul96 Aug96 Sep96 Oct96 Nov96 Dec96 Jan97 Feb97 Mar97 Apr97 May97 
Jun97 Jul97 Aug97 Sep97 Oct97 Nov97 Dec97;
run;

data SugarLongResult;
  set SugarLong;
  /* flag each district-month as Yes (positive count) or No */
  select;
    when (Count > 0) Result='Yes';
    otherwise Result='No';
  end;
run;

proc print data=SugarLongResult (obs=50);
  var District DistrictPosition Result;
run;

proc sort data=SugarLongResult out=SugarLongResultDupe NODUPKEY;
  by District DistrictPosition Result;
run;

proc freq data=SugarLongResultDupe;
tables DistrictPosition* Result / out=SugarLongResultFinal;
run;

proc print data=SugarLongResultFinal;
run;

 

There are only 15 districts, but the counts in the output currently add up to 20. The problem seems to be with S, W, and C: it looks like districts counted under No are also being counted under Yes.

 

The correct numbers are below:

 

DistrictPosition  Result  Count
N                 Yes     2
N                 No      0
E                 Yes     2
E                 No      0
S                 Yes     0
S                 No      2
W                 Yes     4
W                 No      2
C                 Yes     2
C                 No      1

 

This is the current output, which is almost correct:

[Image: Outputs.PNG]

Accepted solution

Tom
Super User

Why are you making an array named MON that includes variables like:

SoilID Area Age HarvestMonth HarvestDuration TonnHect Fibre Sugar

It looks to me like the reason you are seeing 20 instead of 15 is that, for some DISTRICTPOSITION values, you have some observations with YES and some with NO. Is it possible that the same DISTRICTPOSITION value appears in more than one DISTRICT value?

 

Do you want to calculate the YES/NO rule so that the values are the same for all observations from the same district? Perhaps you want to SUM the COUNT variable over all of the months, or take the MAX over all of the months? If so, there was no need to transpose the data at all.
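A minimal sketch of the MAX idea, working directly on the wide data set Sugar from the question. It assumes that Jul96 through Dec97 are stored next to each other (the INPUT statement reads them in that order, so the name range list Jul96--Dec97 covers exactly the 18 months), that each district has a single DistrictPosition, and that a district should count as Yes when any month on any of its rows is positive. SugarFlag, RowMax, and DistrictResult are hypothetical names chosen for this example.

```sas
/* Sketch only: assumes Jul96-Dec97 are consecutive variables in SUGAR
   and that "Yes" means at least one positive monthly value. */
data SugarFlag;
  set Sugar;
  /* largest monthly value on this row (missing values are ignored) */
  RowMax = max(of Jul96--Dec97);
  keep District DistrictPosition RowMax;
run;

/* collapse to one row per district: Yes if any row/month is positive */
proc sql;
  create table DistrictResult as
    select District, DistrictPosition,
           case when max(RowMax) > 0 then 'Yes' else 'No' end as Result
      from SugarFlag
      group by District, DistrictPosition;
quit;

/* one count per district, plus a chi-square test of Result by position */
proc freq data=DistrictResult;
  tables DistrictPosition*Result / chisq;
run;
```

Because DistrictResult has one row per district, the PROC FREQ counts should add up to the number of districts (15 here), and the CHISQ option requests the chi-square test without any transpose. Using sum() instead of max() gives the SUM variant; for non-negative monthly values both produce the same Yes/No flag. With only 15 districts, PROC FREQ will probably warn that some expected cell counts are small; adding an `exact fisher;` statement requests Fisher's exact test as an alternative.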

 


7 replies
ballardw
Super User

You should really show the code with messages from the log when getting unexpected output.

 

 

cassylovescats
Calcite | Level 5

Apologies.

 

I believe this is where the error is (Observations = 20):

[Image: LogOutput.PNG]

 

ballardw
Super User

@cassylovescats wrote:

Apologies.

 

I believe this is where the error is. (Observations = 20)

 

[Image: LogOutput.PNG]

 


Then you likely need to look very closely at your data before the Proc Sort step and afterwards.

 

Or share your full data set SugarLong.

 

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.
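One way to take that close look, sketched under the assumption that the data set names match the question's code: because Result is one of the BY variables in the NODUPKEY sort, PROC SORT keeps one row for each District, DistrictPosition, and Result combination, so a district with some positive months and some zero months survives twice. This query lists those districts.

```sas
/* Districts that appear under more than one Result value after the
   NODUPKEY sort; each one adds an extra row to the PROC FREQ counts. */
proc sql;
  select District, DistrictPosition, count(*) as nResults
    from SugarLongResultDupe
    group by District, DistrictPosition
    having count(*) > 1;
quit;
```

If each district has a single DistrictPosition, 20 rows for 15 districts means the query should return five districts, and those are the extra rows in the frequency table.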

PaigeMiller
Diamond | Level 26

To properly diagnose this, we would need to see data set SugarLongResultDupe.

 

You show us some data, but it's not clear what data you are showing us.

--
Paige Miller
Tom
Super User

[Reply reproduced above as the accepted solution.]

 

cassylovescats
Calcite | Level 5

That is exactly the issue I am finding.

I needed to transpose because I need to run a chi-square test. I have the MON array because I copied the code from another forum post, and it worked. I am awful at SAS and just need to get this done for a final presentation.

I built this code one step at a time until it produced this result, and I am not really sure how to backtrack at this point.

art297
Opal | Level 21

Post the first file you used, namely '~/my_courses/Homework/FinalHomework/CaneData2.csv/'.

 

Art, CEO, AnalystFinder.com

 


Discussion stats
  • 7 replies
  • 2014 views
  • 0 likes
  • 5 in conversation