Introduction
Health information avoidance—defined as any behavior designed to prevent or delay access to available but potentially unwanted health information (Howell et al. (2020); Sweeny et al. (2010))—remains a significant barrier to realizing the public health benefits of personalized risk communication. Despite the rapid expansion of access to medical information in the Information Age, many people still choose not to learn about their personal health risks (Gigerenzer & Garcia-Retamero (2017); Ho et al. (2021); Kelly & Sharot (2021)). Health information avoidance is not a uniform behavior, but can take multiple forms, varying in its duration and degree of intentionality. Individuals may avoid health information by delaying the decision to learn about their screening results or they may avoid it completely, choosing to never know. Information avoidance may also manifest through both active and passive means. People may explicitly ask others not to disclose information, physically remove themselves from situations were information might be revealed, or passively refrain from questions that could reveal unwanted knowledge. These avoidance behaviors may manifest as refusing cancer screening, delaying medical care, or not requesting test results (Sweeny et al. (2010)).
Research indicates that health information avoidance is common. For example, approximately 15% of US adults avoid personalized health risk information across various contexts (Meese et al., 2022), and nearly 39% express a reluctance to learn about their cancer risk (Emanuel et al. (2015)). Similar avoidance rates (e.g., approximately 40%) have been observed for non-personalized health information (i.e., general health information not relevant to personal risk) (Chae et al. (2019); Orom et al. (2020)).
Such widespread avoidance underscores the need to understand how and why individuals avoid health information. Some individuals may avoid health information to protect themselves from unpleasant emotions, prevent exposure to information that conflicts with their worldview or creates an obligation to act. Even when the information may be critical to health,avoidance eliminates the discomfort of decision-making and the emotional burden of confronting potential illness (Sweeny et al. (2010)). Previous research has identified various psychological and cognitive factors underlying this avoidance phenomenon and explored potential explanations for this behavior. O’Brien et al. (2024) found that self-perceptions of health, such as low perceived risk, engagement in healthy behaviors, and demographic characteristics, often guide people’s decisions to avoid learning about their health risks. Other studies have shown that information overload can increase anxiety and cognitive dissonance, leading to avoidance behavior (Dattilo et al. (2022); Song et al. (2021); Soroya & Faiola (2023)). Furthermore, heightened risk perceptions can exacerbate anxiety and sadness, which may further hinder people from seeking health information (Sultana et al. (2023); Zhao & Cai (2009)).
However, few studies have attempted to use machine learning methods to identify predictors of health information avoidance. This study applies predictive modeling and machine learning methods to examine patterns of cancer screening avoidance (“health avoiders”) using sociodemographic and psychological data. By integrating behavioral and belief factors, this study aimed to better understand the complex dynamics that lead to health information avoidance. The key finding is that while many predictors showed statistical significance, none provided meaningful predictive power. This suggests that the key determinants leading people to avoid health information remain unidentified.
Review of Machine Learning Methods
Machine learning (ML) methods were applied to identify patterns associated with health information avoidance. This study focuses on three main models: Logistic Regression, Random Forest, and Multivariate Adaptive Regression Splines (MARS). Each approach offers different advantages and trade-offs, summarized in Table 1 below.
Table 1. Comparison of Machine Learning Models
| Model | Description | Advantages | Limitations | Interpretability |
|---|---|---|---|---|
| Linear Regression | Models the relationship between predictors and a continuous outcome using a straight-line equation. | Simple, interpretable, and efficient to train; effective for linearly related data; allows understanding of variable relationships. | Assumes linearity and independence among variables; sensitive to outliers; limited for complex or nonlinear data. | High |
| Logistic Regression | Estimates the probability of a binary outcome based on a linear combination of predictors. | Simple, interpretable, good baseline; easy to assess predictor importance. | Assumes linearity; struggles with nonlinear or high-dimensional data. | High |
| Random Forest | Ensemble of decision trees using bootstrapped samples and random feature selection. | Handles nonlinear and complex data; reduces overfitting; provides feature importance. | Less interpretable; slower with large datasets; harder to explain model logic. | Moderate |
| MARS (Multivariate Adaptive Regression Splines) | Builds flexible regression models using piecewise linear splines. | Captures nonlinear relationships; performs automatic feature selection; minimal preprocessing needed. | Computationally intensive; interpretation can be challenging with correlated predictors. | Moderate - High |
Linear / Logistic Regression served as a baseline interpretable model, Random Forest captured complex nonlinear interactions, and MARS modeled adaptive spline-based relationships between predictors and cancer avoidance. By comparing these models, this study aimed to identify the most effective approach to predict health information avoidance while maintaining interpretability.
Method
Data Source
This study used the de-identified Health Avoiders dataset provided through Cloud Research in collaboration with Dr. Heather Orom (University at Buffalo). The dataset includes sociodemographic, psychological, and behavioral variables collected by Cloud Research. All analyses were conducted in R, and reproducibility was ensured through a README file and a Quarto documentation workflow.
Outcome Variable
The outcome variable, Cancer_Avoidance_Mean, represents the average score across 8 items (Avoid_Cancer_)measuring participants’ avoidance of cancer-related health information. Because this variable exhibited non-normality, both logarithmic and square-root transformations were tested; however, these transformations did not improve model performance or interpretability, so the untransformed variable was retained for analysis.
Predictor Models
Heather Orom (University at Buffalo) developed a theoretical framework to group predictors as a health psychologist studying health information avoidance.
Demographic Model (
demo_data) —Ethnicity,Political_Party,Gender4,Job_Classification,Education_Level,Age,Income,Race, andMacArthur_Numeric.Media Use Model (
media_data) —Social_Media_Usage,AI_Use,Video_Games_Hours,Listening_Podcasts,Facebook_Usage_cat,TikTok_Use,X_Twitter_Usage,Social_Media_type, andInfluencer_Following.Health Condition Model (
health_condition_data) —Stressful_Events_Recent,Current_Depression,Anxiety_Severity_num,PTSD5_Score,Health_Depression_Severity_num, andStress_TotalScore.Health Behavior Model (
health_behavior_data) —Fast_Food_Consumption,Meditation_group,Physical_Activity_Guidelines,Cigarette_Smoking_num,Supplement_Consumption_Reason_num,Diet_Type, andSupplement_Consumption.Other Factors Model (
other_data) —Home_Ownership,Voter_Registration,Climate_Change_Belief, andMental_Health_of_Partner.
Variable Scaling and Scoring
Several psychological and health condition variables were standardized using validated scales:
PC-PTSD-5: a 5-item yes/no screen for post-traumatic stress.
GAD-7: a 7-item measure of anxiety severity (0–3 scale).
PHQ-9: a 9-item measure of depression severity (0–3 scale).
Life Events Checklist: summed to represent cumulative stress exposure.
Composite averages were computed for each domain, resulting in 4 continuous variables: PTSD5_Score, Anxiety_Severity_num, Health_Depression_Severity_num, and Stress_TotalScore.
Analysis Plan
Predictive modeling was conducted in R using the tidymodels package. Due to the non-normal outcome variable, two complimentary approaches were applied:
Regression Models: predicted the continuous outcome
Cancer_Avoidance_Meanwere fitted using Linear Regression (baseline), Random Forest, and Multivariate Adaptive Regression Splines (MARS). Model performance was assessed using root mean squared error (RMSE), mean absolute error (MAE), and the correlation between predicted and observed values.Classification Models: predicted the binary outcome
Cancer_Avoiders01(0 = non-avoider, 1 = avoider) were fitted using Logistic Regression (baseline) and Random Forest. Model performance was assessed using area under the curve (AUC) and accuracy with basic calibration applied when class imbalance occurred.
Some models used cross-validation, and Linear / Logistic Regression served as the baseline for comparison.
Results
All analyses were conducted in Posit Cloud, a collaborative online platform for writing R scripts and developing Quarto markdown documents.
Overview of Model Performance
Table 2 summarizes the predictive performance across all modeling approaches. Consistent with the study’s central finding, all models demonstrated statistically significant associations but no meaningful predictive power.
Table 2. Summary of Model Performance Across All Approaches
| Domain | Model Type | Key Metric | Value | Interpretation |
|---|---|---|---|---|
| Demographics | Regression (RF) | Correlation | 0.109 | No predictive relationship |
| Demographics | Regression (MARS) | Cross-val R² | 0.017 | Poor generalization |
| Demographics | Classification (RF) | ROC AUC | 0.449 | Below chance performance |
| Media Usage | Regression (RF) | Correlation | 0.021 | No predictive relationship |
| Media Usage | Classification (RF) | ROC AUC | 0.574 | Marginally above chance |
| Health Condition | Regression (RF) | Correlation | 0.103 | No predictive relationship |
| Health Condition | Regression (MARS) | Cross-val R² | -0.003 | Negative (no pattern) |
| Health Condition | Classification (RF) | ROC AUC | 0.446 | Below chance performance |
| Health Behavior | Regression (RF) | Correlation | 0.060 | No predictive relationship |
| Health Behavior | Classification (RF) | ROC AUC | 0.472 | Below chance performance |
| Other Factors | Regression (RF) | Correlation | 0.145 | Weak relationship |
| Other Factors | Classification (RF) | ROC AUC | 0.420 | Below chance performance |
| Full Model | Regression (RF) | Correlation | 0.300 | Weak relationship |
| Full Model | Classification (RF) | ROC AUC | 0.230 | Worse than individual models |
Note: RF = Random Forest; MARS = Multivariate Adaptive Regression Splines. Classification models showed high accuracy (94-96%) due to class imbalance but uniformly poor discrimination (AUC < 0.60). All R² values were < 0.03.
The following sections detail findings for each predictor domain, focusing on the strongest associations identified despite their limited predictive utility.
Overview of Findings
This analysis examined five predictor domains: demographic, health status, health behaviors, media usage, and other attitudinal variables. In all modeling approaches, predictors were statistically significant with cancer avoidance behavior, but their predictive power was limited. The following paragraphs detail the findings for each domain.
Demographic Predictors
Linear regression analysis showed a significant negative association between socioeconomic status (MacArthur numerical score) and cancer avoidance behavior (β = -0.025, p = 1.89e-10), but the model explained only 0.5% of the variance (R² = 0.005). Age analysis identified a threshold of 35 years, with older individuals exhibiting significantly higher cancer avoidance scores (β = 0.063, p = 6.45e-06).
In both age groups (under 35 and over 35), Democrats had lower cancer avoidance scores than Republicans: under 35 (β = -0.165, p = 8.97e-08) and over 35 (β = -0.225, p = 2.67e-15). MARS analysis confirmed that party affiliation was the most important factor (importance = 100). Although the classification model showed high accuracy (95.5%), its discrimination was poor (ROC AUC = 0.449), indicating statistically significant but not practically predictive effects.
Health Condition Predictors
Predictive relationships among health condition variables were weak. The total stress score showed no significant trend (β = 0.005, p = 0.073, R² = 0.0005). While the MARS model identified anxiety severity as the most important factor, cross-validation performance was negative (CVRSq = -0.003), suggesting no generalizable pattern. The classification model also failed to meaningfully differentiate cancer avoiders.
Health Behavior Predictors
Cigarette smoking demonstrated a significant positive relationship with cancer avoidance. Smokers had higher cancer avoidance scores (β = 0.186, p < 2e-16, R² = 0.011) and 1.87 times higher odds of being cancer avoiders (β = 0.627, p = 1.05e-06). Approximately 8–9% of smokers were cancer avoiders compared to 4–5% of non-smokers. Despite statistical significance, the overall predictive power remained limited.
Media Usage Predictors
Media usage variables showed minimal predictive value (overall correlation = 0.021). Influencer following was not significantly associated with cancer avoidance (β = -0.011, p = 0.613, R² = 0.00008). Facebook usage showed a statistically significant but weak effect (β = 0.236, p = 0.039), with nearly equal proportions of cancer avoiders among users (5%) and non-users (4%). These effects are statistically significant but not practically meaningful.
Other Predictors: Climate Change Beliefs
Climate change beliefs were statistically associated with cancer avoidance (R² = 0.024). Individuals who strongly believe in human-caused climate change had lower cancer avoidance scores (β = -0.271, p = 3.32e-13) and 0.29 times the odds of being cancer avoiders compared to climate deniers (β = -1.236, p = 1.87e-10). Climate deniers showed 12–13% cancer avoiders compared to 3–4% among strong believers. Despite statistical significance, model differentiation remained poor (ROC AUC = 0.42).
Comprehensive Model: Combining Major Predictors
Random forest models incorporating all main predictors showed modest improvement. The regression model achieved a correlation of 0.30 (RMSE = 0.62, MAE = 0.52). The classification model achieved 95% accuracy but failed to meaningfully differentiate cancer avoiders (ROC AUC = 0.23).
Across all analyses, predictors were statistically significant but not practically predictive (R² < 0.03, ROC AUC < 0.50), indicating that the measured variables have limited explanatory power for cancer avoidance behavior.
Data Preprocessing and Descriptive Statistics
Predictor Distributions
Predictors
Outcome Distribution and Item Correlations
The outcome variable,
Cancer_Avoidance_Mean, was computed as the average of 8 items measuring participants’ avoidance of cancer-related health information. Four of these itemsAvoid_Cancer3,Avoid_Cancer5,Avoid_Cancer7,Avoid_Cancer8were reverse-coded to ensure higher scores consistently reflected greater avoidance. Reverse scoring was calculated using the formula:- Reversed score=(Max+Min)−Original score
Correlation
Avoid_Cancer1 Avoid_Cancer2 Avoid_Cancer3 Avoid_Cancer4
Avoid_Cancer1 1.0000000 0.6630162 0.6192841 0.6072410
Avoid_Cancer2 0.6630162 1.0000000 0.6141968 0.5869138
Avoid_Cancer3 0.6192841 0.6141968 1.0000000 0.5556186
Avoid_Cancer4 0.6072410 0.5869138 0.5556186 1.0000000
Avoid_Cancer5 0.6338078 0.6208662 0.7444480 0.5385535
Avoid_Cancer6 0.6840374 0.5089422 0.4899384 0.5936301
Avoid_Cancer7 0.4878094 0.4908217 0.5812488 0.4152808
Avoid_Cancer8 0.5700417 0.5459774 0.7130865 0.5062153
Avoid_Cancer5 Avoid_Cancer6 Avoid_Cancer7 Avoid_Cancer8
Avoid_Cancer1 0.6338078 0.6840374 0.4878094 0.5700417
Avoid_Cancer2 0.6208662 0.5089422 0.4908217 0.5459774
Avoid_Cancer3 0.7444480 0.4899384 0.5812488 0.7130865
Avoid_Cancer4 0.5385535 0.5936301 0.4152808 0.5062153
Avoid_Cancer5 1.0000000 0.4917697 0.5572648 0.7388645
Avoid_Cancer6 0.4917697 1.0000000 0.3526875 0.4478447
Avoid_Cancer7 0.5572648 0.3526875 1.0000000 0.5192653
Avoid_Cancer8 0.7388645 0.4478447 0.5192653 1.0000000
- All the 8 of the cancer items are moderate to high correlate with each other, so we can add them up and get the average score.
Alpha
Reliability analysis
Call: psych::alpha(x = selectdata[, c("Avoid_Cancer1", "Avoid_Cancer2",
"Avoid_Cancer3", "Avoid_Cancer4", "Avoid_Cancer5", "Avoid_Cancer6",
"Avoid_Cancer7", "Avoid_Cancer8")])
raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
0.91 0.91 0.91 0.57 10 0.0013 1.7 0.65 0.56
95% confidence boundaries
lower alpha upper
Feldt 0.91 0.91 0.91
Duhachek 0.91 0.91 0.91
Reliability if an item is dropped:
raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
Avoid_Cancer1 0.89 0.90 0.89 0.55 8.7 0.0016 0.0098 0.55
Avoid_Cancer2 0.90 0.90 0.90 0.56 9.1 0.0015 0.0107 0.56
Avoid_Cancer3 0.89 0.90 0.89 0.55 8.6 0.0015 0.0084 0.55
Avoid_Cancer4 0.90 0.90 0.90 0.58 9.5 0.0014 0.0103 0.57
Avoid_Cancer5 0.89 0.90 0.89 0.55 8.6 0.0015 0.0079 0.56
Avoid_Cancer6 0.91 0.91 0.90 0.59 9.9 0.0013 0.0070 0.58
Avoid_Cancer7 0.91 0.91 0.91 0.59 10.2 0.0014 0.0070 0.59
Avoid_Cancer8 0.90 0.90 0.90 0.56 9.0 0.0015 0.0083 0.58
Item statistics
n raw.r std.r r.cor r.drop mean sd
Avoid_Cancer1 11155 0.85 0.84 0.82 0.78 1.8 0.90
Avoid_Cancer2 11155 0.79 0.80 0.76 0.73 1.6 0.77
Avoid_Cancer3 11155 0.83 0.84 0.83 0.78 1.6 0.75
Avoid_Cancer4 11155 0.78 0.76 0.71 0.69 1.9 0.93
Avoid_Cancer5 11155 0.83 0.84 0.83 0.78 1.6 0.76
Avoid_Cancer6 11155 0.75 0.72 0.68 0.65 2.3 1.02
Avoid_Cancer7 11155 0.67 0.70 0.63 0.59 1.4 0.59
Avoid_Cancer8 11155 0.79 0.80 0.77 0.72 1.7 0.83
Non missing response frequency for each item
0 1 2 3 4 5 miss
Avoid_Cancer1 0 0.45 0.32 0.18 0.05 0 0
Avoid_Cancer2 0 0.56 0.32 0.10 0.03 0 0
Avoid_Cancer3 0 0.54 0.35 0.09 0.02 0 0
Avoid_Cancer4 0 0.43 0.27 0.26 0.05 0 0
Avoid_Cancer5 0 0.50 0.38 0.10 0.03 0 0
Avoid_Cancer6 0 0.29 0.23 0.36 0.12 0 0
Avoid_Cancer7 0 0.69 0.27 0.03 0.01 0 0
Avoid_Cancer8 0 0.53 0.31 0.12 0.04 0 0
Based on the analysis, the alpha does not improve for the measure if any of the items are dropped. All final alphas will be equal or lower than the 0.89 raw alpha and 0.9 standardized alpha.
The internal consistency of the
Information Avoidance – Canceritems was high (std.alpha = 0.895), suggesting that the items measured a coherent construct. The average inter-item correlation was (average_r = 0.55), indicating moderately strong associations among the items without redundancy. Reliability estimates (G6 = 0.894) further supported this consistency.
Percentage of participants with Cancer_Avoidance_Mean >= 2.5
[1] 14.4061
Matrix
| Predictor | Correlation |
|---|---|
| Age | 0.03 |
| Republican | 0.10 |
| Democrat | -0.09 |
| Independent | 0.02 |
| Something_else | 0.01 |
| Prefer_not_to_say | NA |
| Gender4 | -0.02 |
| Education_Level | -0.06 |
| Income | -0.05 |
| MacArthur_Numeric | -0.08 |
| Political_Party_Group | -0.03 |
| Race2 | 0.07 |
| Anxiety_Total_Score | 0.09 |
| PTSD5_Score | 0.09 |
| Health_Total_Score | 0.09 |
| StressEvents_Count | 0.02 |
| Stress_TotalScore | 0.02 |
Descriptive tables for categorical variables
| Variable | Value | n | percent |
|---|---|---|---|
| AI_Use | Missing | 41 | 0.4% |
| AI_Use | No | 2262 | 20.3% |
| AI_Use | Yes | 8852 | 79.4% |
| AgeBand | 35+ | 6275 | 56.3% |
| AgeBand | Under 35 | 4880 | 43.7% |
| AgeGroup | 18–24 | 1318 | 11.8% |
| AgeGroup | 25–34 | 3562 | 31.9% |
| AgeGroup | 35–44 | 3089 | 27.7% |
| AgeGroup | 45–54 | 1738 | 15.6% |
| AgeGroup | 55–64 | 962 | 8.6% |
| AgeGroup | 65+ | 486 | 4.4% |
| Anxiety_Feeling_Afraid | Missing | 42 | 0.4% |
| Anxiety_Feeling_Afraid | More than half the days | 974 | 8.7% |
| Anxiety_Feeling_Afraid | Nearly every day | 678 | 6.1% |
| Anxiety_Feeling_Afraid | Not at all | 5839 | 52.3% |
REGRESSION MODELS
(continuous outcome Cancer_Avoidance_Mean)
Random Forest Analysis (tidymodel)
Demographic Model
| Metric | Value |
|---|---|
| RMSE | 0.643 |
| MAE | 0.529 |
| R-squared | 0.012 |
| Correlation | 0.107 |
RMSE (~0.63) and MAE (~0.53) are pretty close, meaning errors aren’t heavily dominated by outliers.
Correlation = 0.109 is extremely low. It basically means that predicted values do not track the actual values at all. So the model is not capturing the pattern in the test data.
If predictors have very weak relationships with the outcome, the model will predicts values near the mean of the training set.
The random forest model shows that
MacArthur_Numericis the most predict variable.
| Term | Estimate | Std Error | Statistic | p value |
|---|---|---|---|---|
| Intercept | 1.862 | 0.022 | 86.363 | 0e+00 |
| MacArthur Numeric | -0.026 | 0.004 | -6.451 | 1e-10 |
- The linear regression analysis shows a statistically significant negative relationship between MacArthur Scale scores and cancer avoidance mean (\(\beta\) = -0.025, p = 1.89e-10). For each one-unit increase in the MacArthur score (indicating higher subjective socioeconomic status), cancer avoidance mean scores decreased by 0.025 units.
The scatter plot displays a slight negative trend (indicated by the red regression line), but the data points are widely dispersed across all MacArthur score levels.
This visual pattern confirms the weak correlation, showing that while a statistical relationship exists, MacArthur score (subjective socioeconomic status) is not a strong predictor of cancer avoidance behaviors on its own.
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | 1.699 | 0.019 | 87.746 | 0.000 |
| AgeGroup25–34 | -0.003 | 0.023 | -0.153 | 0.879 |
| AgeGroup35–44 | 0.062 | 0.024 | 2.623 | 0.009 |
| AgeGroup45–54 | 0.078 | 0.027 | 2.909 | 0.004 |
| AgeGroup55–64 | 0.065 | 0.032 | 2.040 | 0.041 |
| AgeGroup65+ | -0.037 | 0.047 | -0.796 | 0.426 |
The initial analysis examined cancer avoidance scores across six age categories, revealing that middle-aged adults (35-64) showed significantly higher cancer avoidance behaviors compared to the youngest group (18-24). However, the model explains only 0.33% of the variance (\(R^2\)= 0.003), indicating weak predictive power despite statistical significance.
So we break the
AgeGroupintoAgeBandwith only 2 categories “above 35” and “under 35” to see if the result will be better.
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | 1.69667 | 0.01024 | 165.71770 | 0e+00 |
| AgeBand35+ | 0.06408 | 0.01411 | 4.54045 | 1e-05 |
Individuals aged 35 and older had cancer avoidance scores 0.063 units higher than those under 35 (p = 6.45e-06), a highly significant difference. The model remains weak in explanatory power (\(R^2\) = 0.0025), but the clear age threshold at 35 years provides a meaningful distinction for further investigation.
Now, we wanted to see if other predictors would influence the
AgeBandfor under 35 and over 35, so we created two multivariable linear regression models (Age_under35 and Age_over35) predicting Cancer Avoidance Mean.
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | 1.927 | 0.046 | 42.108 | 0.000 |
| Education_Level | -0.009 | 0.008 | -1.190 | 0.234 |
| Income | 0.003 | 0.006 | 0.484 | 0.628 |
| MacArthur_Numeric | -0.013 | 0.006 | -2.295 | 0.022 |
| Political_PartyDemocrat | -0.169 | 0.031 | -5.477 | 0.000 |
| Political_PartyIndependent | -0.119 | 0.033 | -3.542 | 0.000 |
| Political_PartySomething else | -0.154 | 0.040 | -3.805 | 0.000 |
| Political_PartyPrefer not to say | -0.150 | 0.050 | -2.990 | 0.003 |
For individuals under 35, political party affiliation emerged as the strongest predictor of cancer avoidance. Democrats showed 0.165 lower scores than Republicans (p = 8.97e-08), Independents showed 0.120 lower scores (p = 0.0003), and those selecting “Something else” showed 0.155 lower scores (p = 0.0001).
But the overall model explaining less than 1% of variance (\(R^2\) = 0.0097).
- The bar plots shows predicted cancer avoidance means across political affiliations, stratified by age. For those under 35, Republicans show the highest cancer avoidance behaviors (~1.80), while all other political groups cluster around 1.65, with error bars indicating moderate variability.
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | 2.160 | 0.044 | 48.839 | 0.000 |
| Education_Level | -0.011 | 0.008 | -1.349 | 0.178 |
| Income | -0.016 | 0.007 | -2.286 | 0.022 |
| MacArthur_Numeric | -0.029 | 0.007 | -4.499 | 0.000 |
| Political_PartyDemocrat | -0.225 | 0.028 | -7.935 | 0.000 |
| Political_PartyIndependent | -0.131 | 0.030 | -4.368 | 0.000 |
| Political_PartySomething else | -0.138 | 0.041 | -3.397 | 0.001 |
| Political_PartyPrefer not to say | -0.093 | 0.057 | -1.632 | 0.103 |
For individuals aged 35 and older, political party affiliation showed even stronger effects than in the younger cohort. Democrats had 0.225 lower cancer avoidance scores than Republicans (p = 2.67e-15), Independents had 0.131 lower scores (p = 1.28e-05), and “Something else” respondents had 0.138 lower scores (p = 0.0007).
Additionally, both income (\(\beta\) = -0.016, p = 0.022) and MacArthur score (\(\beta\) = -0.029, p = 7.00e-06) showed significant negative associations, with this model explaining 2.8% of the variance (\(R^2\) = 0.028), the highest among all models tested.
The bar plots shows predicted cancer avoidance means across political affiliations, stratified by age. For those 35 and older, Republicans maintain the highest scores (~1.93), while Democrats show notably lower scores (~1.68), and other groups fall in between.
There has been a consistent gap between Republicans and Democrats in the two age groups. Republicans exhibited higher levels of cancer avoidance scores, suggesting that political party is strongly associated with these avoidance behaviors regardless of age, although this effect is more pronounced in older adults.
Media Use Model
| Metric | Value |
|---|---|
| RMSE | 0.664 |
| MAE | 0.551 |
| R-squared | 0.000 |
| Correlation | 0.021 |
RMSE (~0.66) and MAE (~0.55) are pretty close, meaning errors aren’t heavily dominated by outliers.
Correlation = 0.021 is extremely low. It basically means that predicted values do not track the actual values at all. So the model is not capturing the pattern in the test data.
If predictors have very weak relationships with the outcome, the model will predicts values near the mean of the training set.
The random forest model shows that
Influencer_Followingis the most predictive variable.
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | 1.752 | 0.020 | 85.664 | 0.000 |
| Influencer_FollowingUnsure | -0.038 | 0.058 | -0.656 | 0.512 |
| Influencer_FollowingYes | -0.011 | 0.022 | -0.506 | 0.613 |
- The linear regression analysis shows a none meaningful relationship between Influencer_Following scores and cancer avoidance mean (p > 0.05).
- The boxplot examining cancer avoidance scores across influencer following categories (No, Unsure, Yes) reveals remarkably similar distributions across all three groups. This visual pattern strongly supports the conclusion that influencer following status has no substantial relationship with cancer avoidance score.
Health Condition Model
| Metric | Value |
|---|---|
| RMSE | 0.653 |
| MAE | 0.548 |
| R-squared | 0.011 |
| Correlation | 0.103 |
RMSE (~0.65) and MAE (~0.55) are pretty close, meaning errors aren’t heavily dominated by outliers.
Correlation = 0.103 is extremely low. It basically means that predicted values do not track the actual values at all. So the model is not capturing the pattern in the test data.
If predictors have very weak relationships with the outcome, the model will predicts values near the mean of the training set.
The random forest model shows that
Stress_TotalScoreis the most predict variable.
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | 1.727 | 0.014 | 121.214 | 0.000 |
| Stress_TotalScore | 0.005 | 0.003 | 1.794 | 0.073 |
- The linear regression analysis shows a none meaningful relationship between stress total scores and cancer avoidance mean (p > 0.05).
- The plot displays predicted cancer avoidance scores across the range of stress total scores (0-18), showing a slight positive trend indicated by the red regression line. The broad gray confidence band around the regression line reflects the considerable uncertainty in these predictions. While higher stress levels may be weakly associated with slightly higher cancer avoidance behaviors, but this relationship is not statistically significant.
Health Behavior Model
| Metric | Value |
|---|---|
| RMSE | 0.671 |
| MAE | 0.556 |
| R-squared | 0.004 |
| Correlation | 0.060 |
RMSE (~0.67) and MAE (~0.56) are pretty close, meaning errors aren’t heavily dominated by outliers.
Correlation = 0.06 is extremely low. It basically means that predicted values do not track the actual values at all. So the model is not capturing the pattern in the test data.
If predictors have very weak relationships with the outcome, the model will predicts values near the mean of the training set.
The random forest model shows that number of
Cigarette_Smokingis the most predict variable.
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | 1.708 | 0.008 | 201.753 | 0 |
| Cigarette_Smoking_num1 | 0.186 | 0.021 | 8.860 | 0 |
The linear regression analysis shows a statistically significant positive relationship between cigarette smoking and cancer avoidance behaviors (\(\beta\) = 0.186, p < 2e-16). For every one-unit increase in cigarette smoking status (from non-smoker to smoker), the predicted cancer avoidance score increases by 0.186 units, meaning individuals who smoke tend to have slightly higher cancer avoidance scores.
While this relationship is conventionally significant at the 0.05 level, the model explains only 1.1% of the variance (\(R^2\) = 0.011), indicating that smoking status alone provides minimal predictive power for cancer avoidance scores.
- Look at the box plot, smokers show higher cancer avoidance scores on average compared to non-smokers. This could mean that smokers are more aware of or concerned about cancer risks, possibly leading them to engage in other cancer-preventive behaviors. However, the low variance reminds us that while this pattern is statistically significant, smoking status explains only a small fraction of the variation in cancer avoidance scores.
Other Model
| Metric | Value |
|---|---|
| RMSE | 0.631 |
| MAE | 0.525 |
| R-squared | 0.021 |
| Correlation | 0.145 |
RMSE (~0.63) and MAE (~0.52) are pretty close, meaning errors aren’t heavily dominated by outliers.
Correlation = 0.145 is extremely low. It basically means that predicted values do not track the actual values at all. So the model is not capturing the pattern in the test data.
If predictors have very weak relationships with the outcome, the model will predicts values near the mean of the training set.
The random forest model shows that
Climate_Change_Belief(Strongly believe climate change is occurring and is primarily caused by human activities) is the most predict variable.
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | 1.922 | 0.036 | 53.841 | 0.000 |
| Climate_Change_BeliefSomewhat skeptical about the impact of human activities on climate change, believing that climate change is a natural cycle. | -0.027 | 0.046 | -0.583 | 0.560 |
| Climate_Change_BeliefUncertain about the causes and extent of climate change. | -0.064 | 0.048 | -1.337 | 0.181 |
| Climate_Change_BeliefNo opinion on the matter. | 0.069 | 0.066 | 1.046 | 0.295 |
| Climate_Change_BeliefSomewhat believe climate change is occurring and is influenced by human activities, but natural factors also play a significant role. | -0.124 | 0.038 | -3.221 | 0.001 |
| Climate_Change_BeliefStrongly believe climate change is occurring and is primarily caused by human activities. | -0.271 | 0.037 | -7.294 | 0.000 |
- The linear regression shows significant associations between climate change beliefs and cancer avoidance scores. Individuals who strongly believe in human-caused climate change showed significantly lower cancer avoidance scores (\(\beta\) = -0.271, p = 3.32e-13), meaning their scores were 0.271 units lower on average. The model explains 2.4% of the variance (\(R^2\) = 0.024), indicating a weak but statistically significant relationship.
- The boxplot displays cancer avoidance behaviors across six climate change belief categories, revealing a clear gradient pattern. Climate change deniers and no opinion show higher median cancer avoidance scores around 1.9 - 2, with relatively compact distributions. Which means Individuals who strongly believe in human-caused climate change showed significantly lower cancer avoidance scores. However, the low variance reminds us that while this pattern is statistically significant, undefined explains only a small fraction of the variation in cancer avoidance scores.
Multivariate Adaptive Regression Splines (MARS)
Demographic
| Term | Coefficient |
|---|---|
| (Intercept) | 1.748 |
| Political_PartyDemocrat | -0.113 |
| h(MacArthur_Numeric-9) | 0.275 |
| h(9-MacArthur_Numeric) | 0.019 |
| h(Age-56) | -0.014 |
| h(56-Age) | -0.003 |
| RaceWhite | 0.080 |
| h(Income-2) | -0.017 |
| Job_ClassificationProfessional | -0.093 |
| Job_ClassificationBlue Collar | 0.127 |
| Job_ClassificationUnemployed/Student/Parent | -0.074 |
| Variable | N Subsets | GCV | RSS |
|---|---|---|---|
| Political_PartyDemocrat | 6 | 1 | 10 |
| MacArthur_Numeric | 37 | 1 | 9 |
| RaceWhite | 33 | 1 | 8 |
| Job_ClassificationProfessional | 17 | 1 | 7 |
| Job_ClassificationUnemployed/Student/Parent | 18 | 1 | 6 |
| Job_ClassificationBlue Collar | 16 | 1 | 5 |
| Age | 21 | 1 | 4 |
| Income | 22 | 1 | 2 |
| Metric | Value |
|---|---|
| Selected Terms | 11.000 |
| R-squared | 0.028 |
| GRSq | 0.021 |
Political_PartyDemocratis the most influential predictor (baseline = 100).MacArthur_Numericis about 75% as important.RaceWhiteis about 63%.
| Term | Coefficient |
|---|---|
| (Intercept) | 1.519 |
| Political_PartyDemocrat | -0.124 |
| h(MacArthur_Numeric-9) | 0.917 |
| h(56-Age) | 0.009 |
| h(Age-26)*h(9-MacArthur_Numeric) | 0.001 |
| RaceWhite | 0.184 |
| Job_ClassificationProfessional*h(9-MacArthur_Numeric) | -0.024 |
| h(Income-2)*RaceWhite | -0.039 |
| EthnicityNo, not of Hispanic, Latino, or Spanish origin*h(MacArthur_Numeric-9) | -0.957 |
| Job_ClassificationBlue Collar*h(9-MacArthur_Numeric) | 0.038 |
| Metric | Value |
|---|---|
| Selected Terms | 17.0000 |
| R-squared | 0.0418 |
| GRSq | 0.0281 |
| Terms | Cancer_Avoidance_Mean | mean |
|---|---|---|
| fold7 | 0.0062 | 0.0062 |
| fold8 | 0.0258 | 0.0258 |
| fold9 | 0.0267 | 0.0267 |
| fold10 | -0.0009 | -0.0009 |
| mean | 0.0175 | 0.0175 |
Interactions appear as products (such as):
Ethnicity Prefer not to say * Political_Party Democrat = an interaction term between ethnicity and political party.
Selected model at 17 terms
GRSq/RSq converging around 0.03-0.04
Mean out-of-fold RSq near zero (poor predictive performance)
High cross-validation error (CVRSq = 0.017, sd = 0.019)
This suggests demographic variables don’t explain Cancer_Avoidance_Mean very well.
The model selection plot shows that both RSq (training fit) and GRSq (generalized R-square) values remain low. This indicates that demographic characteristics provide limited explanatory power for cancer avoidance mean, and adding nonlinear terms does not meaningfully improve predictive accuracy.
Demographics status explain very little of the variation in cancer avoidance. Where effects appear, they are narrow interactions, such as between political affiliation and ethnicity, or between Job_Classification Blue Collar and MacArthur_Numeric > 3.
Health condition
| Term | Coefficient |
|---|---|
| (Intercept) | 1.760 |
| h(Anxiety_Severity_num-3) | 0.144 |
| h(1-PTSD5_Score) | -0.088 |
| Variable | N Subsets | GCV | RSS |
|---|---|---|---|
| Anxiety_Severity_num | 4 | 1 | 2 |
| PTSD5_Score | 5 | 1 | 1 |
| Metric | Value |
|---|---|
| Selected Terms | 3.000 |
| R-squared | 0.009 |
| GRSq | 0.007 |
Anxiety_Severity_numis the most influential predictor (baseline = 100).PTSD5_Scoreis about 61%
[,1]
Cancer_Avoidance_Mean 0.1452411
Interactions appear as products (such as):
h(Anxiety_Severity_num-3) = 0.191: When anxiety severity score > 3, cancer avoidance increases by 0.191 units.
h(1-PTSD5_Score) = -0.088: When PTSD score < 1, cancer avoidance decreases by 0.088 units, meaning higher PTSD scores are associated with slightly higher avoidance.
Selected model at 5 terms
GRSq/RSq barely above zero (GRSq = 0.0073, RSq = 0.012)
Mean out-of-fold RSq is negative - confirms no predictive power
CVRSq = -0.003, MaxErr = 2.35, This suggests health condition variables don’t explain Cancer_Avoidance_Mean very well.
Predictive power is weak: training R-square ~ 0.065, The model explains about 6.5% of the variance in Cancer_Avoidance_Mean on the training data and is a weak fit.
The correlation between Cancer_Avoidance_Mean and health condition is also very week (0.1452411)
The model selection plot shows that both RSq (training fit) and GRSq (generalized R-square) initially increase, indicating that adding the first few hinge functions improves model performance. However, after approximately 4 terms, GRSq reaches its maximum and then slightly declines, suggesting that additional terms provide minimal benefit and may cause overfitting.
The selected model at 4 terms uses 4 predictors and achieves a modest GRSq (0.01), indicating that although weak nonlinear relationships exist, the overall predictive power of the model remains low.
CLASSIFICATION MODEL
(binary outcome Cancer_Avoiders01)
Random Forest
Demographic Model
| Metric | Value |
|---|---|
| accuracy | 0.955 |
| kap | 0.000 |
| Truth | 0 | 1 |
|---|---|---|
| 0 | 2306 | 108 |
| 1 | 0 | 0 |
| Metric | Value |
|---|---|
| ROC AUC | 0.449 |
High accuracy (0.955) simply reflects predicting majority class.
ROC AUC = 0.449 , which means no better than random
The random forest model shows that
MacArthur_Numericis the most predict variable.
| Estimate | Std. Error | z value | Pr(>|z|) | |
|---|---|---|---|---|
| (Intercept) | -2.600 | 0.162 | -16.100 | 0.000 |
| MacArthur_Numeric | -0.108 | 0.032 | -3.427 | 0.001 |
The logistic regression shows a statistically significant negative relationship between MacArthur scores and the probability of being a cancer avoider (\(\beta\) = -0.108, p = 0.0006). For each one-unit increase in the MacArthur score, the log-odds of being a cancer avoider decrease by 0.108, meaning individuals with higher subjective socioeconomic status are less likely to be classified as cancer avoiders.
However, the model shows minimal improvement with only 11.8 point reduction in deviance over the null model (null deviance = 2785.6, residual deviance = 2773.8).
Area under the curve: 0.5511
- ROC AUC = 0.55, which means slightly better than random
The plot displays the predicted probability of being a cancer avoider across MacArthur scores ranging from 1 to 10. The red line shows a clear downward trend. The gray confidence band widens slightly at the extremes but remains relatively narrow, indicating consistent uncertainty across the score range. This visualization confirms the negative relationship identified in the logistic regression, higher socioeconomic status is associated with lower probability of being a cancer avoider.
However, the poor ROC AUC value (0.449) indicates that although the relationship is statistically significant, the model has little practical use in identifying cancer avoiders.
Media Usage Model
| Metric | Value |
|---|---|
| accuracy | 0.947 |
| kap | 0.000 |
| Truth | 0 | 1 |
|---|---|---|
| 0 | 1939 | 109 |
| 1 | 0 | 0 |
| Metric | Value |
|---|---|
| ROC AUC | 0.574 |
High accuracy (0.947) simply reflects predicting majority class.
ROC AUC = 0.574 , which means slightly better than random
The random forest model shows that
Facebook_Usageis the most predict variable.
| Estimate | Std. Error | z value | Pr(>|z|) | |
|---|---|---|---|---|
| (Intercept) | -3.104 | 0.087 | -35.680 | 0.000 |
| Facebook_Usage_cat1 | 0.236 | 0.114 | 2.067 | 0.039 |
The logistic regression shows a statistically significant positive relationship between Facebook usage and the probability of being a cancer avoider (\(\beta\) = 0.2358, p = 0.0388). Facebook users have odds of being a cancer avoider that are 1.27 times higher (\(e^0.2358\) = 1.266) than non-users, representing a 26.6% increase in odds.
However, the model shows minimal improvement over the null model, with only a 4.3 point reduction in deviance (null deviance = 2654.9, residual deviance = 2650.6), indicating that Facebook usage explains none of the variation in cancer avoider.
Area under the curve: 0.5291
- ROC AUC = 0.53, which means slightly better than random
- The stacked bar chart displays the percentage distribution of cancer avoiders and non-avoiders across Facebook usage groups (0 = No, 1 = Yes). Both groups show similar distributions, with cancer avoiders (red) representing a very small percentage at the bottom of each bar. While the logistic regression identified a statistically significant difference (p = 0.039), the visual similarity between the two bars reinforces that this difference has minimal value for predicting cancer avoider status.
Health Condition Model
| Metric | Value |
|---|---|
| accuracy | 0.948 |
| kap | 0.000 |
| Truth | 0 | 1 |
|---|---|---|
| 0 | 1895 | 103 |
| 1 | 0 | 0 |
| Metric | Value |
|---|---|
| ROC AUC | 0.446 |
High accuracy (0.948) simply reflects predicting majority class.
ROC AUC = 0.446, which means no better than random
The random forest model shows that
Stress_TotalScoreis the most predict variable.
| Estimate | Std. Error | z value | Pr(>|z|) | |
|---|---|---|---|---|
| (Intercept) | -3.009 | 0.099 | -30.260 | 0.000 |
| Stress_TotalScore | 0.015 | 0.019 | 0.761 | 0.447 |
- The logistic regression shows a none significant relationship between stress total score and the probability of being a cancer avoider (\(\beta\) = 0.01475, p = 0.447).
Area under the curve: 0.5127
- ROC AUC = 0.51, which means slightly better than random
- The plot displays the predicted probability of being a cancer avoider across stress total scores ranging from 1 to 18. The red line shows a very slight upward trend. The gray confidence band widens at higher stress scores, indicating increasing uncertainty in predictions for individuals with very high stress levels. Which demonstrates that stress total score provides essentially no predictive information for identifying cancer avoiders, consistent with the lack of statistical significance in the model.
Health Behavior Model
| Metric | Value |
|---|---|
| accuracy | 0.946 |
| kap | 0.000 |
| Truth | 0 | 1 |
|---|---|---|
| 0 | 1975 | 113 |
| 1 | 0 | 0 |
| Metric | Value |
|---|---|
| ROC AUC | 0.472 |
High accuracy (0.946) simply reflects predicting majority class.
ROC AUC = 0.472, which means no better than random
The random forest model shows that
Cigarette_Smokingis the most predict variable.
| Estimate | Std. Error | z value | Pr(>|z|) | |
|---|---|---|---|---|
| (Intercept) | -3.1004153 | 0.06452494 | -48.049874 | 0.00e+00 |
| Cigarette_Smoking_num1 | 0.6269373 | 0.12839376 | 4.882927 | 1.05e-06 |
The logistic regression shows a statistically significant positive relationship between smokers and the probability of being a cancer avoider (\(\beta\) = 0.62694, p = 1.05e-06). Smokers have odds of being a cancer avoider that are 1.87 times higher (\(e^0.62694\) = 1.872) compared to non-smokers, representing an 87.2% increase in odds
However, the model shows modest improvement over the null model, with a 21.7 point reduction in deviance (null deviance = 2709.8, residual deviance = 2688.1), indicating that Facebook usage explains modest of the variation in cancer avoider.
Area under the curve: 0.5509
- ROC AUC = 0.55, which means slightly better than random
The stacked bar chart displays the percentage distribution of cancer avoiders and non-avoiders across smoking status (0 = Non-smoker, 1 = Smoker). Smokers display a larger red segment at approximately 8-9% cancer avoiders. This visual pattern confirms the logistic regression finding, smokers are roughly twice as likely to be cancer avoiders compared to non-smokers.
However, it’s important to note that despite this significant association, cancer avoiders remain a minority in both groups, with over 90% of both smokers and non-smokers classified as non-avoiders.
Other Model
| Metric | Value |
|---|---|
| accuracy | 0.959 |
| kap | 0.000 |
| Truth | 0 | 1 |
|---|---|---|
| 0 | 2106 | 91 |
| 1 | 0 | 0 |
| Metric | Value |
|---|---|
| ROC AUC | 0.42 |
High accuracy (0.96) simply reflects predicting majority class.
ROC AUC = 0.42, which means no better than random
The random forest model shows that
Climate_Change_Belief(other) is the most predict variable.
| Estimate | Std. Error | z value | Pr(>|z|) | |
|---|---|---|---|---|
| (Intercept) | -2.02387 | 0.17492 | -11.57000 | 0.00000 |
| Climate_Change_BeliefSomewhat skeptical about the impact of human activities on climate change, believing that climate change is a natural cycle. | -0.48224 | 0.24799 | -1.94455 | 0.05183 |
| Climate_Change_BeliefUncertain about the causes and extent of climate change. | -0.68418 | 0.27068 | -2.52767 | 0.01148 |
| Climate_Change_BeliefNo opinion on the matter. | -0.19030 | 0.34048 | -0.55893 | 0.57621 |
| Climate_Change_BeliefSomewhat believe climate change is occurring and is influenced by human activities, but natural factors also play a significant role. | -1.06978 | 0.20643 | -5.18226 | 0.00000 |
| Climate_Change_BeliefStrongly believe climate change is occurring and is primarily caused by human activities. | -1.23578 | 0.19395 | -6.37182 | 0.00000 |
The logistic regression shows significant negative associations between accepting human caused climate change and the probability of being a cancer avoider. Individuals who strongly believe climate change is primarily caused by human activities have 71% lower odds (1 - \(e^-1.236\) = 0.71) of being cancer avoiders (\(\beta\) = -1.236, p = 1.87e-10) compared to climate dennier.
The model shows modest improvement over the null model with a 50.1 point deviance reduction (null deviance = 2781.4, residual deviance = 2731.3)
Area under the curve: 0.5879
- ROC AUC = 0.58, which means slightly better than random
- The stacked bar chart displays the percentage distribution of cancer avoiders across six climate change belief categories. This visual pattern demonstrates that individuals who deny about human-caused climate change are 3-4 times more likely to be cancer avoiders compared to those who strongly accept about human-caused climate change.
Comprehensive Model with All Predictors
| Metric | Value |
|---|---|
| RMSE | 0.620 |
| MAE | 0.516 |
| R-squared | 0.088 |
| Correlation | 0.296 |
RMSE (~0.62) and MAE (~0.52) are pretty close, meaning errors aren’t heavily dominated by outliers.
Correlation = 0.3 is low. But it imporve a little bit compare to all the regression models. It basically means that predicted values do not track the actual values at all. So the model is not capturing the pattern in the test data.
If predictors have very weak relationships with the outcome, the model will predicts values near the mean of the training set.
The random forest model shows that
Stress_TotalScoreis the most predict variable.
| Metric | Estimate |
|---|---|
| accuracy | 0.952 |
| kap | 0.000 |
High accuracy (0.95) simply reflects predicting majority class.
ROC AUC = 0.23, which means no better than random, and is worse than all the classification models.
The random forest model shows that
Stress_TotalScoreis the most predict variable.
Discussion
This analysis examined predictors of cancer avoidance scores across demographic, media-related, health, behavioral, and other domains using regression, classification, and MARS models. Across all approaches, the findings point to the same conclusion is that many predictors reached significance, but none meaningfully predicted cancer avoidance.
Regression models explained less than 3% of variance (\(R^2\) < 0.03), and the full model reached only a modest correlation of 0.30. Classification models showed high accuracy due to class imbalance, but poor distinction, with ROC AUC values between 0.23 and 0.49, at below chance. The full classification model performed worst (AUC = 0.23), reinforcing that adding strong predictors does not improve performance.
MARS models found some nonlinear patterns, but the negative cross-validated \(R^2\) shows they likely don’t hold up and reflect overfitting. Overall, the small predictive value suggests important factors are missing. The low number of cancer avoiders (4–5%) also makes prediction harder. The unexpected links with smoking and climate beliefs may come from unmeasured factors or differences in how people understand cancer avoidance.
Limitations
Cross-sectional design: Because the data were collected at one point in time, we cannot tell what causes what. Other unmeasured factors may also affect the results.
Self-reported outcome: People may interpret “cancer avoidance” differently, which means the score may not fully reflect their real behaviors.
Small effect sizes: Many predictors were statistically significant but had tiny effects, likely because the sample size was large.
Inconsistent samples across models: Missing data led to different subsets being used, limiting direct comparisons.
Random forest issues: The variable importance results were not consistent with simpler models and may exaggerate weak patterns.
Limited generalizability: Results may not generalize to other groups or contexts.
Future Directions
Future studies should: (1) use longitudinal data to better understand cause-and-effect; (2) explore factors that might explain or change the relationships found; (3) oversample cancer avoiders to address class imbalance; (4) include qualitative work to understand unexpected patterns; and (5) validate these findings in independent samples.
Acknowledgments
We thank the following people and organizations for their guidance, support, and resources in this project:
- Dr. Shane McCarty (Binghamton University) – Principal Investigator and mentor
- Dr. Heather Orom (University at Buffalo) – Principal Investigator and mentor
- Dr. Kargin Vladislav (Binghamton University) – Principal Investigator and mentor
- Cloud Research – Owner of the Health Avoiders dataset and provider of access to de-identified survey data
Appendix A: Detailed Statistical Output
Linear Regression
MacArthur Scale vs cancer-avoidance score
Show the code
MacArthur_cancer_linear <- lm(Cancer_Avoidance_Mean ~ MacArthur_Numeric, data = demo_data)
summary(MacArthur_cancer_linear)
Call:
lm(formula = Cancer_Avoidance_Mean ~ MacArthur_Numeric, data = demo_data)
Residuals:
Min 1Q Median 3Q Max
-0.83605 -0.58208 -0.08208 0.44371 2.34450
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.861837 0.021558 86.363 < 2e-16 ***
MacArthur_Numeric -0.025792 0.003998 -6.451 1.18e-10 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.6312 on 8043 degrees of freedom
Multiple R-squared: 0.005147, Adjusted R-squared: 0.005023
F-statistic: 41.61 on 1 and 8043 DF, p-value: 1.178e-10
Age vs cancer-avoidance score
Show the code
agegroup_cancer_linear <- lm(Cancer_Avoidance_Mean ~ AgeGroup, data = demo_data)
summary(age_cancer_linear)
Call:
lm(formula = Cancer_Avoidance_Mean ~ AgeBand, data = demo_data)
Residuals:
Min 1Q Median 3Q Max
-0.76075 -0.57167 -0.07167 0.42833 2.30333
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.69667 0.01024 165.72 < 2e-16 ***
AgeBand35+ 0.06408 0.01411 4.54 5.7e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.632 on 8043 degrees of freedom
Multiple R-squared: 0.002557, Adjusted R-squared: 0.002433
F-statistic: 20.62 on 1 and 8043 DF, p-value: 5.695e-06
Age vs cancer-avoidance score
Show the code
ageband_cancer_linear <- lm(Cancer_Avoidance_Mean ~ AgeBand, data = demo_data)
summary(age_cancer_linear)
Call:
lm(formula = Cancer_Avoidance_Mean ~ AgeBand, data = demo_data)
Residuals:
Min 1Q Median 3Q Max
-0.76075 -0.57167 -0.07167 0.42833 2.30333
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.69667 0.01024 165.72 < 2e-16 ***
AgeBand35+ 0.06408 0.01411 4.54 5.7e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.632 on 8043 degrees of freedom
Multiple R-squared: 0.002557, Adjusted R-squared: 0.002433
F-statistic: 20.62 on 1 and 8043 DF, p-value: 5.695e-06
Age over 35 vs cancer-avoidance score
Show the code
model_over35 <- lm(Cancer_Avoidance_Mean ~ Education_Level + Income +
MacArthur_Numeric + Political_Party,
data = dplyr::filter(demo_data, AgeBand == "35+"))
summary(model_over35)
Call:
lm(formula = Cancer_Avoidance_Mean ~ Education_Level + Income +
MacArthur_Numeric + Political_Party, data = dplyr::filter(demo_data,
AgeBand == "35+"))
Residuals:
Min 1Q Median 3Q Max
-1.09262 -0.59425 -0.05984 0.45032 2.45083
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.160298 0.044233 48.839 < 2e-16 ***
Education_Level -0.011298 0.008378 -1.349 0.177554
Income -0.015754 0.006892 -2.286 0.022306 *
MacArthur_Numeric -0.029331 0.006519 -4.499 7.00e-06 ***
Political_PartyDemocrat -0.225468 0.028414 -7.935 2.67e-15 ***
Political_PartyIndependent -0.131147 0.030024 -4.368 1.28e-05 ***
Political_PartySomething else -0.138479 0.040765 -3.397 0.000688 ***
Political_PartyPrefer not to say -0.093357 0.057189 -1.632 0.102659
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.6599 on 4226 degrees of freedom
Multiple R-squared: 0.02841, Adjusted R-squared: 0.0268
F-statistic: 17.65 on 7 and 4226 DF, p-value: < 2.2e-16
Influencer Following vs cancer-avoidance score
Show the code
influencer_cancer_linear <- lm(Cancer_Avoidance_Mean ~ Influencer_Following, data = media_data)
summary(influencer_cancer_linear)
Call:
lm(formula = Cancer_Avoidance_Mean ~ Influencer_Following, data = media_data)
Residuals:
Min 1Q Median 3Q Max
-0.7519 -0.6157 -0.1157 0.3843 2.2593
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.75190 0.02045 85.664 <2e-16 ***
Influencer_FollowingUnsure -0.03774 0.05751 -0.656 0.512
Influencer_FollowingYes -0.01121 0.02215 -0.506 0.613
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.6428 on 6821 degrees of freedom
Multiple R-squared: 7.698e-05, Adjusted R-squared: -0.0002162
F-statistic: 0.2626 on 2 and 6821 DF, p-value: 0.7691
Stress Score vs cancer-avoidance score
Show the code
stress_cancer_linear <- lm(Cancer_Avoidance_Mean ~ Stress_TotalScore, data = health_condition_data)
summary(stress_cancer_linear)
Call:
lm(formula = Cancer_Avoidance_Mean ~ Stress_TotalScore, data = health_condition_data)
Residuals:
Min 1Q Median 3Q Max
-0.8190 -0.6173 -0.1122 0.4629 2.2679
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.726935 0.014247 121.214 <2e-16 ***
Stress_TotalScore 0.005116 0.002852 1.794 0.0729 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.6563 on 6658 degrees of freedom
Multiple R-squared: 0.000483, Adjusted R-squared: 0.0003329
F-statistic: 3.217 on 1 and 6658 DF, p-value: 0.0729
Smoking vs cancer-avoidance score
Show the code
smoking_cancer_linear <- lm(Cancer_Avoidance_Mean ~ Cigarette_Smoking_num, data = health_behavior_data)
summary(smoking_cancer_linear)
Call:
lm(formula = Cancer_Avoidance_Mean ~ Cigarette_Smoking_num, data = health_behavior_data)
Residuals:
Min 1Q Median 3Q Max
-0.89432 -0.58335 -0.08335 0.41665 2.29165
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.708348 0.008468 201.75 <2e-16 ***
Cigarette_Smoking_num1 0.185977 0.020992 8.86 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.6463 on 6955 degrees of freedom
Multiple R-squared: 0.01116, Adjusted R-squared: 0.01102
F-statistic: 78.49 on 1 and 6955 DF, p-value: < 2.2e-16
Climate Change Belief vs cancer-avoidance score
Show the code
voter_cancer_linear <- lm(Cancer_Avoidance_Mean ~ Climate_Change_Belief, data = other_data)
summary(voter_cancer_linear)
Call:
lm(formula = Cancer_Avoidance_Mean ~ Climate_Change_Belief, data = other_data)
Residuals:
Min 1Q Median 3Q Max
-0.9943 -0.5506 -0.0506 0.4494 2.3484
Coefficients:
Estimate
(Intercept) 1.92232
Climate_Change_BeliefSomewhat skeptical about the impact of human activities on climate change, believing that climate change is a natural cycle. -0.02146
Climate_Change_BeliefUncertain about the causes and extent of climate change. -0.06294
Climate_Change_BeliefNo opinion on the matter. 0.07200
Climate_Change_BeliefSomewhat believe climate change is occurring and is influenced by human activities, but natural factors also play a significant role. -0.12172
Climate_Change_BeliefStrongly believe climate change is occurring and is primarily caused by human activities. -0.27071
Std. Error
(Intercept) 0.03574
Climate_Change_BeliefSomewhat skeptical about the impact of human activities on climate change, believing that climate change is a natural cycle. 0.04637
Climate_Change_BeliefUncertain about the causes and extent of climate change. 0.04785
Climate_Change_BeliefNo opinion on the matter. 0.06592
Climate_Change_BeliefSomewhat believe climate change is occurring and is influenced by human activities, but natural factors also play a significant role. 0.03846
Climate_Change_BeliefStrongly believe climate change is occurring and is primarily caused by human activities. 0.03713
t value
(Intercept) 53.786
Climate_Change_BeliefSomewhat skeptical about the impact of human activities on climate change, believing that climate change is a natural cycle. -0.463
Climate_Change_BeliefUncertain about the causes and extent of climate change. -1.315
Climate_Change_BeliefNo opinion on the matter. 1.092
Climate_Change_BeliefSomewhat believe climate change is occurring and is influenced by human activities, but natural factors also play a significant role. -3.165
Climate_Change_BeliefStrongly believe climate change is occurring and is primarily caused by human activities. -7.291
Pr(>|t|)
(Intercept) < 2e-16
Climate_Change_BeliefSomewhat skeptical about the impact of human activities on climate change, believing that climate change is a natural cycle. 0.64356
Climate_Change_BeliefUncertain about the causes and extent of climate change. 0.18841
Climate_Change_BeliefNo opinion on the matter. 0.27474
Climate_Change_BeliefSomewhat believe climate change is occurring and is influenced by human activities, but natural factors also play a significant role. 0.00156
Climate_Change_BeliefStrongly believe climate change is occurring and is primarily caused by human activities. 3.4e-13
(Intercept) ***
Climate_Change_BeliefSomewhat skeptical about the impact of human activities on climate change, believing that climate change is a natural cycle.
Climate_Change_BeliefUncertain about the causes and extent of climate change.
Climate_Change_BeliefNo opinion on the matter.
Climate_Change_BeliefSomewhat believe climate change is occurring and is influenced by human activities, but natural factors also play a significant role. **
Climate_Change_BeliefStrongly believe climate change is occurring and is primarily caused by human activities. ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.6363 on 7315 degrees of freedom
Multiple R-squared: 0.02481, Adjusted R-squared: 0.02415
F-statistic: 37.23 on 5 and 7315 DF, p-value: < 2.2e-16
Mars
Demographic model
Call: earth(formula=Cancer_Avoidance_Mean~Ethnicity+Political_Party+Ge...),
data=train)
coefficients
(Intercept) 1.74763211
Political_PartyDemocrat -0.11313097
Job_ClassificationBlue Collar 0.12744504
Job_ClassificationProfessional -0.09341490
Job_ClassificationUnemployed/Student/Parent -0.07383926
RaceWhite 0.08027776
h(56-Age) -0.00278826
h(Age-56) -0.01447206
h(Income-2) -0.01704592
h(9-MacArthur_Numeric) 0.01881144
h(MacArthur_Numeric-9) 0.27468651
Selected 11 of 14 terms, and 8 of 37 predictors
Termination condition: RSq changed by less than 0.001 at 14 terms
Importance: Political_PartyDemocrat, MacArthur_Numeric, RaceWhite, ...
Number of terms at each degree of interaction: 1 10 (additive model)
GCV 0.3877556 RSS 2167.197 GRSq 0.02086902 RSq 0.02781319
nsubsets gcv rss
Political_PartyDemocrat 10 100.0 100.0
MacArthur_Numeric 9 75.8 81.0
RaceWhite 8 63.7 71.0
Job_ClassificationProfessional 7 52.7 61.9
Job_ClassificationUnemployed/Student/Parent 6 46.6 55.8
Job_ClassificationBlue Collar 5 37.7 48.1
Age 4 32.5 42.3
Income 2 19.7 28.1
Cross validation demographic
Show the code
library(earth)
mars_model <- earth(
Cancer_Avoidance_Mean ~ Ethnicity + Political_Party + Gender4 + Job_Classification +
Education_Level + Age + Income + Race + MacArthur_Numeric,
data = train,
degree = 2, # allow up to 2-way interactions
nfold = 10, # 10-fold CV
keepxy = TRUE
)
summary(mars_model) Call: earth(formula=Cancer_Avoidance_Mean~Ethnicity+Political_Party+Ge...),
data=train, keepxy=TRUE, degree=2, nfold=10)
coefficients
(Intercept) 1.51853878
Political_PartyDemocrat -0.12354920
RaceWhite 0.18407526
h(56-Age) 0.00899561
h(MacArthur_Numeric-9) 0.91738017
Political_PartyDemocrat * Job_ClassificationIT 0.12512063
Job_ClassificationFreelance/Gig * RaceWhite 0.12053036
EthnicityNo, not of Hispanic, Latino, or Spanish origin * h(MacArthur_Numeric-9) -0.95687380
Political_PartyDemocrat * h(2-Education_Level) 0.37967090
h(3-Gender4) * RaceWhite 0.03914996
Job_ClassificationBlue Collar * h(9-MacArthur_Numeric) 0.03785557
Job_ClassificationProfessional * h(9-MacArthur_Numeric) -0.02389413
h(56-Age) * RaceWhite -0.00475175
h(Age-56) * RaceWhite -0.02318944
h(Income-2) * RaceWhite -0.03875043
h(56-Age) * h(5-Income) -0.00128766
h(Age-26) * h(9-MacArthur_Numeric) 0.00144843
Selected 17 of 30 terms, and 12 of 37 predictors
Termination condition: RSq changed by less than 0.001 at 30 terms
Importance: Political_PartyDemocrat, Age, MacArthur_Numeric, RaceWhite, ...
Number of terms at each degree of interaction: 1 4 12
GCV 0.3849115 RSS 2135.989 GRSq 0.02805093 RSq 0.04181287 CVRSq 0.007724538
Note: the cross-validation sd's below are standard deviations across folds
Cross validation: nterms 20.20 sd 3.19 nvars 12.70 sd 1.77
CVRSq sd MaxErr sd
0.008 0.019 2.47 0.106
Show the code
summary(mars_model) %>% .$coefficients %>% head(10) Cancer_Avoidance_Mean
(Intercept) 1.518538780
Political_PartyDemocrat -0.123549198
h(MacArthur_Numeric-9) 0.917380167
h(56-Age) 0.008995609
h(Age-26)*h(9-MacArthur_Numeric) 0.001448431
RaceWhite 0.184075261
Job_ClassificationProfessional*h(9-MacArthur_Numeric) -0.023894133
h(Income-2)*RaceWhite -0.038750430
EthnicityNo, not of Hispanic, Latino, or Spanish origin*h(MacArthur_Numeric-9) -0.956873797
Job_ClassificationBlue Collar*h(9-MacArthur_Numeric) 0.037855571
Health condition Model
Show the code
# need to drop NA to get accuracy
health_condition_data <- selectdata %>%
drop_na(
Cancer_Avoidance_Mean, Stressful_Events_Recent, Current_Depression, Anxiety_Severity_num, PTSD5_Score,
Health_Depression_Severity_num, Stress_TotalScore
)
# split into training and testing sets
set.seed(123)
train_idx <- sample(seq_len(nrow(health_condition_data)), size = 0.7 * nrow(health_condition_data))
train <- health_condition_data[train_idx, ]
test <- health_condition_data[-train_idx, ]
# Fit MARS model
mars_model_health_condition <- earth(
Cancer_Avoidance_Mean ~ Stressful_Events_Recent + Current_Depression + Anxiety_Severity_num +
PTSD5_Score + Health_Depression_Severity_num + Stress_TotalScore,
data = train
)
# Predict on test set
pred <- predict(mars_model_health_condition, newdata = test)
summary(mars_model_health_condition)Call: earth(formula=Cancer_Avoidance_Mean~Stressful_Events_Recent+Curr...),
data=train)
coefficients
(Intercept) 1.76031430
h(Anxiety_Severity_num-3) 0.14374629
h(1-PTSD5_Score) -0.08783213
Selected 3 of 5 terms, and 2 of 7 predictors
Termination condition: RSq changed by less than 0.001 at 5 terms
Importance: Anxiety_Severity_num, PTSD5_Score, ...
Number of terms at each degree of interaction: 1 2 (additive model)
GCV 0.4286923 RSS 1994.279 GRSq 0.007299509 RSq 0.00900262
Show the code
# Variable importance
evimp(mars_model_health_condition) nsubsets gcv rss
Anxiety_Severity_num 2 100.0 100.0
PTSD5_Score 1 61.6 63.4
Cross validation on Health Condition
Show the code
library(earth)
mars_model_health_condition <- earth(
Cancer_Avoidance_Mean ~ Stressful_Events_Recent + Current_Depression + Anxiety_Severity_num +
PTSD5_Score + Health_Depression_Severity_num + Stress_TotalScore,
data = train,
degree = 2, # allow up to 2-way interactions
nk = 100,
nfold = 10, # 10-fold CV
keepxy = TRUE
)
summary(mars_model_health_condition)Call: earth(formula=Cancer_Avoidance_Mean~Stressful_Events_Recent+Curr...),
data=train, keepxy=TRUE, degree=2, nfold=10, nk=100)
coefficients
(Intercept) 1.76029852
h(Anxiety_Severity_num-3) 0.19063475
h(1-PTSD5_Score) -0.08778390
h(Anxiety_Severity_num-3) * h(Stress_TotalScore-10) 0.15364202
h(Anxiety_Severity_num-3) * h(Stress_TotalScore-5) -0.06007696
Selected 5 of 10 terms, and 3 of 7 predictors
Termination condition: RSq changed by less than 0.001 at 10 terms
Importance: Anxiety_Severity_num, PTSD5_Score, Stress_TotalScore, ...
Number of terms at each degree of interaction: 1 2 2
GCV 0.4286946 RSS 1989.154 GRSq 0.007294166 RSq 0.01154922 CVRSq -0.003463359
Note: the cross-validation sd's below are standard deviations across folds
Cross validation: nterms 6.00 sd 1.63 nvars 3.10 sd 0.57
CVRSq sd MaxErr sd
-0.003 0.012 2.35 0.0772
Show the code
pred <- predict(mars_model_health_condition, newdata = test)
rmse <- sqrt(mean((pred - test$Cancer_Avoidance_Mean)^2))
cor(pred, test$Cancer_Avoidance_Mean) [,1]
Cancer_Avoidance_Mean 0.1452411
Logistic Regression
MacArthur Scale vs cancer-avoidance score
Show the code
MacArthur_cancer_logistic <- glm(Cancer_Avoiders01 ~ MacArthur_Numeric, data = demo_data, family = binomial)
summary(MacArthur_cancer_logistic)
Call:
glm(formula = Cancer_Avoiders01 ~ MacArthur_Numeric, family = binomial,
data = demo_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.60016 0.16150 -16.100 < 2e-16 ***
MacArthur_Numeric -0.10846 0.03165 -3.427 0.000611 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 2785.6 on 8044 degrees of freedom
Residual deviance: 2773.8 on 8043 degrees of freedom
AIC: 2777.8
Number of Fisher Scoring iterations: 6
Facebook usage vs cancer-avoidance score
Show the code
facebook_cancer_logistic <- glm(Cancer_Avoiders01 ~ Facebook_Usage_cat, data = media_data, family = binomial)
summary(facebook_cancer_logistic)
Call:
glm(formula = Cancer_Avoiders01 ~ Facebook_Usage_cat, family = binomial,
data = media_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.1041 0.0870 -35.680 <2e-16 ***
Facebook_Usage_cat1 0.2358 0.1141 2.067 0.0388 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 2654.9 on 6823 degrees of freedom
Residual deviance: 2650.6 on 6822 degrees of freedom
AIC: 2654.6
Number of Fisher Scoring iterations: 5
Stress score vs cancer-avoidance score
Show the code
stress_cancer_logistic <- glm(Cancer_Avoiders01 ~ Stress_TotalScore, data = health_condition_data, family = binomial)
summary(stress_cancer_logistic)
Call:
glm(formula = Cancer_Avoiders01 ~ Stress_TotalScore, family = binomial,
data = health_condition_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.00919 0.09944 -30.260 <2e-16 ***
Stress_TotalScore 0.01475 0.01937 0.761 0.447
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 2638.3 on 6659 degrees of freedom
Residual deviance: 2637.8 on 6658 degrees of freedom
AIC: 2641.8
Number of Fisher Scoring iterations: 5
Smoking vs cancer-avoidance score
Show the code
smoking_cancer_logistic <- glm(Cancer_Avoiders01 ~ Cigarette_Smoking_num, data = health_behavior_data, family = binomial)
summary(smoking_cancer_logistic)
Call:
glm(formula = Cancer_Avoiders01 ~ Cigarette_Smoking_num, family = binomial,
data = health_behavior_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.10042 0.06452 -48.050 < 2e-16 ***
Cigarette_Smoking_num1 0.62694 0.12839 4.883 1.05e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 2709.8 on 6956 degrees of freedom
Residual deviance: 2688.1 on 6955 degrees of freedom
AIC: 2692.1
Number of Fisher Scoring iterations: 6
Climate Change vs cancer-avoidance score
Show the code
climate_cancer_logistic <- glm(Cancer_Avoiders01 ~ Climate_Change_Belief, data = other_data, family = binomial)
summary(climate_cancer_logistic)
Call:
glm(formula = Cancer_Avoiders01 ~ Climate_Change_Belief, family = binomial,
data = other_data)
Coefficients:
Estimate
(Intercept) -2.0239
Climate_Change_BeliefSomewhat skeptical about the impact of human activities on climate change, believing that climate change is a natural cycle. -0.4822
Climate_Change_BeliefUncertain about the causes and extent of climate change. -0.6842
Climate_Change_BeliefNo opinion on the matter. -0.1903
Climate_Change_BeliefSomewhat believe climate change is occurring and is influenced by human activities, but natural factors also play a significant role. -1.0698
Climate_Change_BeliefStrongly believe climate change is occurring and is primarily caused by human activities. -1.2358
Std. Error
(Intercept) 0.1749
Climate_Change_BeliefSomewhat skeptical about the impact of human activities on climate change, believing that climate change is a natural cycle. 0.2480
Climate_Change_BeliefUncertain about the causes and extent of climate change. 0.2707
Climate_Change_BeliefNo opinion on the matter. 0.3405
Climate_Change_BeliefSomewhat believe climate change is occurring and is influenced by human activities, but natural factors also play a significant role. 0.2064
Climate_Change_BeliefStrongly believe climate change is occurring and is primarily caused by human activities. 0.1939
z value
(Intercept) -11.570
Climate_Change_BeliefSomewhat skeptical about the impact of human activities on climate change, believing that climate change is a natural cycle. -1.945
Climate_Change_BeliefUncertain about the causes and extent of climate change. -2.528
Climate_Change_BeliefNo opinion on the matter. -0.559
Climate_Change_BeliefSomewhat believe climate change is occurring and is influenced by human activities, but natural factors also play a significant role. -5.182
Climate_Change_BeliefStrongly believe climate change is occurring and is primarily caused by human activities. -6.372
Pr(>|z|)
(Intercept) < 2e-16
Climate_Change_BeliefSomewhat skeptical about the impact of human activities on climate change, believing that climate change is a natural cycle. 0.0518
Climate_Change_BeliefUncertain about the causes and extent of climate change. 0.0115
Climate_Change_BeliefNo opinion on the matter. 0.5762
Climate_Change_BeliefSomewhat believe climate change is occurring and is influenced by human activities, but natural factors also play a significant role. 2.19e-07
Climate_Change_BeliefStrongly believe climate change is occurring and is primarily caused by human activities. 1.87e-10
(Intercept) ***
Climate_Change_BeliefSomewhat skeptical about the impact of human activities on climate change, believing that climate change is a natural cycle. .
Climate_Change_BeliefUncertain about the causes and extent of climate change. *
Climate_Change_BeliefNo opinion on the matter.
Climate_Change_BeliefSomewhat believe climate change is occurring and is influenced by human activities, but natural factors also play a significant role. ***
Climate_Change_BeliefStrongly believe climate change is occurring and is primarily caused by human activities. ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 2781.4 on 7320 degrees of freedom
Residual deviance: 2731.3 on 7315 degrees of freedom
AIC: 2743.3
Number of Fisher Scoring iterations: 6



































