Solved: Best practice for finding abnormal trends in data

chrisdunlap · Posted 05-19-2022 07:40 PM

Hello,

My apologies if I have formatted anything in this support request incorrectly or missed a technical document that covers this topic. This is my first time working with SAS, so I may not have searched for the right terms. I am working on my dissertation, which uses secondary data to look at opinions on smoking in vehicles. Two of the questions follow the path below, with counts in parentheses:

Q1: Inside a car when other people are present, do you think that smoking should: (n=132811)

(1) Always be allowed (n=4770)

(2) Be allowed under some conditions (n=25805)

(3) Never be allowed (n=102236)

If they answer (3), skip logic takes them past the next question. If they answer (1) or (2), they are given the follow-up question:

Q2: If children are present inside the car, do you think smoking should: (n=31023)

(1) Always be allowed (n=1189)

(2) Be allowed under some conditions (n=4592)

(3) Never be allowed (n=25242)

We decided that, for the analysis of Q2, respondents who answered (3) "never be allowed" on Q1 should be included with Q2's answer (3) "never be allowed": if smoking is never allowed in the first scenario, it is also never allowed in the second. However, when I do this, the new counts for Q2 look like this (the total and the count for (3) are the values that changed):

NEWQ2: If children are present inside the car, do you think smoking should: (n=133259)

(1) Always be allowed (n=1189)

(2) Be allowed under some conditions (n=4592)

(3) Never be allowed (n=127478)

I do not know where to start in working out where the additional counts come from once I add the new condition. The code I'm using should be pretty simple: PEK6h is Q1, PEK6h2 is Q2, and ATSMCARO and ATSMCARC are my recodes of them.

/* Recode of Q1 (PEK6h) */
If PEK6h=1 Then ATSMCARO=1;                      /* Always allowed */
Else If PEK6h=2 Then ATSMCARO=2;                 /* Be allowed under some conditions */
Else If PEK6h=3 Then ATSMCARO=3;                 /* Never be allowed */

/* Recode of Q2 (PEK6h2), carrying Q1 "never" into Q2 "never" */
If PEK6h2=1 Then ATSMCARC=1;                     /* Always allowed */
Else If PEK6h2=2 Then ATSMCARC=2;                /* Be allowed under some conditions */
Else If PEK6h2=3 or PEK6h=3 Then ATSMCARC=3;     /* Never be allowed */

How would you recommend I go about figuring out where those new values are coming from? NEWQ2 now has 448 more responses than Q1, even though Q2 was given only to respondents who did not answer (3) on Q1. Presumably Q2 should have fewer or equal responses, not more, even after adding in the "nevers" from Q1. Thank you, and my apologies if this is not the correct channel for this question.

Thanks!

mkeintz · Posted 05-20-2022 12:23 AM

Before you included the Q1 "never be allowed" responses in the Q2 "never be allowed" responses, your Q2 total response had N=31,023.

But the Q2 question was asked only of the Q1 "always be allowed" (N=4,770) and Q1 "under some conditions" (N=25,805). Those are presumably the only respondents given the second question, but that only adds up to 30,575. So how is it that the Q2 sample exceeds that total?

Is it possible that some Q1 "never be allowed" respondents had already slipped into the Q2 sample? If so, any subsequent addition of all the Q1 "never be allowed" respondents to Q2 would generate double counting of some respondents.
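
The counts posted in this thread can be checked against each other before touching the data. Below is a minimal sketch of that arithmetic as a DATA step. It uses only the counts quoted above; the variable names are made up for illustration.

```sas
/* Consistency check using only the counts quoted in the thread. */
/* All variable names are hypothetical.                          */
data _null_;
  q1_total      = 4770 + 25805 + 102236;   /* 132811: everyone who answered Q1   */
  q2_eligible   = 4770 + 25805;            /* 30575: Q1 answers (1) and (2)      */
  q2_total      = 1189 + 4592 + 25242;     /* 31023: everyone who answered Q2    */
  newq2_total   = 1189 + 4592 + 127478;    /* 133259: total after the recode     */

  q2_excess     = q2_total - q2_eligible;  /* 448: Q2 answers beyond those eligible */
  newq2_excess  = newq2_total - q1_total;  /* 448: NEWQ2 answers beyond Q1 total    */
  never_overlap = (25242 + 102236) - 127478; /* 0: the two "never" groups do not overlap */

  put q2_excess= newq2_excess= never_overlap=;
run;
```

Both differences come out to 448, and the NEWQ2 "never" count equals the plain sum of the Q1 and Q2 "never" counts. Because the posted recode assigns each respondent a single ATSMCARC value, it cannot count anyone twice. If the counts are as posted, this pattern suggests the extra 448 are respondents with a valid Q2 answer but no valid Q1 answer, rather than Q1 "never" respondents who also answered Q2. A crosstab of the raw variables is still needed to confirm that.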

Quentin · Posted 05-20-2022 07:24 AM

Suggest running:

proc freq data=have;
  tables Q1*Q2*NewQ2 /missing list;
run;

That should give you a nice table of counts, allowing you to trace how each combination of Q1 and Q2 values maps to your derived NewQ2 value.
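
With the variable names from the question, the suggestion might look like the sketch below. It assumes that the recode has already been run, that the variables are numeric, and that "have" stands in for the actual dataset name. The second step is an optional narrowing that is not part of the original suggestion: it keeps only respondents who gave a valid Q2 answer without a Q1 answer of (1) or (2), which is the group the skip logic should have excluded.

```sas
/* Full trace: every combination of Q1, Q2 and the derived NEWQ2.   */
/* "have" is a placeholder dataset name; variables assumed numeric. */
proc freq data=have;
  tables PEK6h*PEK6h2*ATSMCARC / missing list;
run;

/* Narrower view (an addition, not from the thread): respondents     */
/* with a valid Q2 answer whose Q1 value is not 1 or 2. In a SAS     */
/* WHERE expression a missing PEK6h also satisfies NOT IN (1, 2).    */
proc freq data=have;
  where PEK6h2 in (1, 2, 3) and PEK6h not in (1, 2);
  tables PEK6h*PEK6h2 / missing list;
run;
```

The MISSING option keeps rows with missing values in the table, and LIST prints one row per combination instead of a nested crosstab, so any unexpected combination shows up as its own line with a count.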

chrisdunlap · Posted 05-23-2022 08:02 PM

This worked, thank you! It turned out that, for the two options other than "never be allowed", more people answered the follow-up question about children than answered the original question, and that was what was throwing me off.
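
If the analysis is meant to cover only respondents with a valid Q1 answer, one possible variant of the recode is sketched below. This is an assumption about the analytic goal, not something the thread settles on; "have" and "want" are placeholder dataset names, and the variables are assumed to be numeric.

```sas
/* Hypothetical variant: derive NEWQ2 only from respondents whose   */
/* Q1 answer is valid, so NEWQ2 can never outnumber Q1.             */
data want;
  set have;
  if PEK6h = 3 then ATSMCARC = 3;            /* Q1 "never" carries over to Q2   */
  else if PEK6h in (1, 2) and PEK6h2 in (1, 2, 3)
    then ATSMCARC = PEK6h2;                  /* otherwise use the Q2 answer     */
  /* Everyone else keeps ATSMCARC missing, including respondents who */
  /* answered Q2 without a valid Q1 answer.                          */
run;
```

Whether those extra Q2 respondents should be dropped or kept is a judgment call that depends on the survey documentation, so it is worth checking how the source data codes nonresponse before choosing.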
