Prompts matching the #feature-engineering tag
Create advanced features for a churn prediction model. Techniques: 1. Temporal features (days since last purchase, purchase frequency). 2. Aggregations (total spend, average order value). 3. Categorical encoding (one-hot, target encoding). 4. Interaction features (tenure × monthly charges). 5. Feature selection using mutual information and correlation analysis. Document feature importance and business rationale for each engineered feature.
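As a rough illustration of techniques 1 to 4 in the prompt above, here is a minimal pandas sketch. The tables, column names (`order_date`, `amount`, `tenure_months`, `monthly_charges`, `plan`), the snapshot date, and the `build_features` helper are all hypothetical and chosen only for the example. Target encoding and mutual-information selection are left out because they need a labelled training set.

```python
import pandas as pd

# Hypothetical transaction log; every column name here is an assumption.
orders = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 3],
    "order_date": pd.to_datetime([
        "2024-01-05", "2024-02-10", "2024-03-01",
        "2024-01-20", "2024-01-25", "2023-11-30",
    ]),
    "amount": [40.0, 60.0, 50.0, 20.0, 30.0, 100.0],
})

# Hypothetical account table, one row per customer.
accounts = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "tenure_months": [12, 3, 24],
    "monthly_charges": [30.0, 55.0, 20.0],
    "plan": ["basic", "premium", "basic"],
})

# Features may only use data observed up to the snapshot date,
# otherwise information from the churn window leaks into the inputs.
snapshot = pd.Timestamp("2024-03-31")


def build_features(orders, accounts, snapshot):
    past = orders[orders["order_date"] <= snapshot]

    # Aggregations per customer: spend totals and order dates.
    agg = past.groupby("customer_id").agg(
        first_order=("order_date", "min"),
        last_order=("order_date", "max"),
        n_orders=("amount", "count"),
        total_spend=("amount", "sum"),
        avg_order_value=("amount", "mean"),
    )

    # Temporal features: recency and purchase frequency.
    agg["days_since_last_purchase"] = (snapshot - agg["last_order"]).dt.days
    active_days = (snapshot - agg["first_order"]).dt.days.clip(lower=1)
    agg["orders_per_30d"] = agg["n_orders"] / active_days * 30

    agg = agg.drop(columns=["first_order", "last_order"]).reset_index()
    feats = accounts.merge(agg, on="customer_id", how="left")

    # Interaction feature: tenure x monthly charges.
    feats["tenure_x_charges"] = feats["tenure_months"] * feats["monthly_charges"]

    # Categorical encoding: one-hot for the low-cardinality plan column.
    return pd.get_dummies(feats, columns=["plan"], prefix="plan")


features = build_features(orders, accounts, snapshot)
print(features)
```

Fixing a snapshot date and filtering on it before aggregating is one simple way to keep the engineered features free of information from the period in which churn is measured.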
Master feature engineering and data preprocessing techniques to improve machine learning model performance.
Data quality assessment: 1. Missing data analysis: determine the missingness mechanism (missing completely at random, MCAR; missing at random, MAR) and identify missingness patterns. 2. Outlier detection: IQR method (flag values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR), Z-score (|z| > 3, i.e. more than 3 standard deviations from the mean), isolation forest. 3. Data distribution: normality tests, skewness detection, transformation requirements.
Feature transformation: 1. Numerical features: standardization (mean=0, std=1), min-max scaling to [0,1], robust scaling when outliers are present. 2. Categorical features: one-hot encoding (cardinality <10), ordinal (label) encoding for ordered categories, target encoding. 3. Text features: TF-IDF vectorization, word embeddings, n-gram features (1- to 3-grams).
Advanced feature engineering: 1. Polynomial features: interaction terms and feature combinations, degree 2-3 at most. 2. Temporal features: time-based features (hour, day, month), lag features, rolling statistics. 3. Domain-specific: geographical features (distance, coordinates), financial ratios, business metrics.
Feature selection: 1. Statistical methods: chi-square test, correlation analysis (drop one feature from each pair with absolute correlation above 0.8). 2. Model-based: feature importance from tree models, L1 regularization (Lasso). 3. Wrapper methods: recursive feature elimination, forward/backward selection.
Dimensionality reduction: 1. PCA (principal component analysis): linear transformation, keep enough components to retain 95% of the variance. 2. t-SNE: non-linear visualization for high-dimensional data exploration, with perplexity tuning.
Validation: perform feature selection inside cross-validation, prevent target leakage, and use temporal splits for time-series data.
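The validation advice above (feature selection inside cross-validation, no target leakage) can be sketched with a scikit-learn `Pipeline`. This is a minimal sketch on synthetic data, not a recommended configuration: the column names, the label-generating formula, `k=4`, and the choice of logistic regression are all assumptions made for illustration.

```python
from functools import partial

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic data; all columns and the label formula are illustrative.
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "tenure": rng.integers(1, 60, n),
    "monthly_charges": rng.normal(50, 15, n),
    "noise": rng.normal(0, 1, n),  # deliberately uninformative
    "plan": rng.choice(["basic", "plus", "premium"], n),
})
logit = 0.05 * (X["monthly_charges"] - 50) - 0.04 * (X["tenure"] - 30)
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Standardize numeric columns and one-hot encode the categorical one.
preprocess = ColumnTransformer(
    [
        ("num", StandardScaler(), ["tenure", "monthly_charges", "noise"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["plan"]),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

# Scaling, encoding, and selection are pipeline steps, so each CV fold
# fits them on its own training split only. Fitting them on the full
# dataset first would leak validation data into the features.
model = Pipeline([
    ("preprocess", preprocess),
    ("select", SelectKBest(partial(mutual_info_classif, random_state=0), k=4)),
    ("clf", LogisticRegression(max_iter=1000)),
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print("ROC AUC per fold:", scores.round(3))
```

For time-ordered data, `TimeSeriesSplit` from `sklearn.model_selection` could replace `StratifiedKFold` so that each fold validates only on observations later than its training data.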