<P>Hi! I am running a chi-square test to determine whether deaths are distributed equally between two populations by age group. However, I get the warning below, which doesn't surprise me because some of the age groups had 0 deaths or fewer than 5. How do I proceed?</P><P> </P><P>Title "Chi-Square of Deaths by Age";<BR />PROC FREQ data=AgeDeath;<BR />TABLE age*wave / CHISQ EXPECTED DEVIATION NOROW NOCOL NOPERCENT;<BR />format age agegrp.;<BR />RUN;</P>
<P>Hi Steve,</P><P>Thank you. Do you have any references or suggestions on how to code it based on the cutoff date?</P>
Data set 1 would not have identical observations within itself. I need to identify observations that are 100% identical between the two data sets. For example, I need to know whether there is an observation in data set 1 that is also in data set 2, or vice versa.
<P>I have two data sets with the same variables, but different observations. I need to know if any observations in data set 1 are in data set 2. How do I do this? Do I merge them first?</P>
Now we are getting somewhere lol. Since my observation IDs are numbers that follow no specific order, rhyme, or reason, it would be difficult to create a dummy variable from them. However, what separates my two populations is a date range. My next question is: how do you create an indicator variable using a date range? For example, group 1 is defined as Nov 1999-Jan 2000, and group 2 is defined as Feb 2000-April 2000. How would that look in code?
<P>Hi Rick,</P><P>I think I understand what you are saying, but when I look at that document, it shows two chi-square tests (one for each region). I only want to do one chi-square test, based on two different groups. I have attached a document that shows the frequency of sex (Females and Males) in the Case group, as well as the frequency of sex3 (Females and Males) in the Case3 group. Since they are in two different data sets, how would I code this?</P><P> </P><P>*I will exclude the Unknown values before analysis, but right now I am more interested in how to set up the code.</P>
I need to do a chi-square test of homogeneity. I know how to do the chi-square test of independence; my issue is that I have 2 data sets: one for the first population and another (with the same variables) for the second population. Do I need to have these in the same data set to do the test?
<P>The mistake was on my end. I did not know that a merge was case-sensitive. The data sets have now merged; however, I am getting this warning:</P><P>WARNING: Multiple lengths were specified for the BY variable County by input data sets. This might<BR />cause unexpected results.<BR />NOTE: There were 159 observations read from the data set WORK.ONE.<BR />NOTE: There were 778604 observations read from the data set WORK.TWO.<BR />NOTE: The data set WORK.COMBINED has 778604 observations and 17 variables.<BR />NOTE: DATA statement used (Total process time):<BR />real time 0.49 seconds<BR />cpu time 0.46 seconds</P><P> </P><P>When I switch the order of the data sets, the warning goes away, but additional observations are added to my data set. (See below)</P><P>NOTE: There were 778604 observations read from the data set WORK.TWO.<BR />NOTE: There were 159 observations read from the data set WORK.ONE.<BR />NOTE: The data set WORK.COMBINED has 778618 observations and 17 variables.<BR />NOTE: DATA statement used (Total process time):<BR />real time 0.48 seconds<BR />cpu time 0.46 seconds</P>
<P>Here are some observations from my two data sets (separated by the yellow column). The data set on the left has a FIPS code and a 2013 code assigned to each county. The data set on the right has multiple observations per county, and what I need to do is assign the FIPS code and 2013 code to each county on the right. For example, all observations for Bartow county should also have a FIPS code of 13015 and a 2013 code of 2.</P>
<P>I tried the code you suggested and got this:</P><P>NOTE: The data set WORK.SORTED has 0 observations and 4 variables.<BR />NOTE: DATA statement used (Total process time):<BR />real time 0.03 seconds<BR />cpu time 0.01 seconds</P><P> </P><P>I should note that there are not multiple states; there is only one state. One data set has multiple variables, including the state and county, and the other data set also has the state and county. I need to match the FIPS Code and the 2013 Code to each county listed in my first data set. Does this make more sense?</P>
<P>When I merge them, some values are blank, and I know that is incorrect because the blank ones have a county listed.</P><P>proc sort data=Z out=One;<BR />by County;<BR />run;<BR />proc sort data=X out=Two;<BR />by County;<BR />run;<BR />data Combined;<BR />merge One (in=a) Two (in=b);<BR />by County;<BR />run;</P>
<P>I have two Excel files I need to merge. I need to add the FIPS Code and 2013 Code to my data set. The variables they have in common are County and State. However, they each have a different number of observations, so when I try to merge them, the result is distorted. How do I add the FIPS Code and the 2013 Code to each County in my data set?</P>
<P>When I do them separately, it runs correctly. It's only when I put them together in the same statement that I get the error.</P>
<P>I'm trying to keep only certain observations in my data set, but when I use the IF statement, the result has no observations. Basically, I only want to keep observations that are considered confirmatory or presumptive, but I get the SAS error below. Why is it converting the column to numeric?</P><P> </P><P>data All;<BR />set xx;<BR />If Highest_Evidence eq 'Confirmatory' or 'Presumptive';<BR />run;</P>
<P>Hi!</P><P>I had an issue with SAS converting my numeric variables to character variables when importing from Excel. To get around that, I saved the file as a CSV and then imported it. Now I am running PROC FREQ on a character variable, and SAS is duplicating its values. See the picture below! On my spreadsheet, I only have Female, Male, and Unknown, so why are they being duplicated?</P><P>P.S. How can I get UNKNOW to display the full name (UNKNOWN)?</P>