Prompts matching the #data-preprocessing tag
Master feature engineering and data preprocessing techniques to improve machine learning model performance.

Data quality assessment:
1. Missing data analysis: determine the missingness mechanism (missing completely at random (MCAR) or missing at random (MAR)) and identify missingness patterns.
2. Outlier detection: IQR method (flag values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR), Z-score (|z| > 3, i.e. more than 3 standard deviations from the mean), isolation forest.
3. Data distribution: normality tests, skewness detection, transformation requirements.

Feature transformation:
1. Numerical features: standardization (mean=0, std=1), min-max scaling to [0,1], robust scaling when outliers are present.
2. Categorical features: one-hot encoding (cardinality <10), ordinal (label) encoding for ordered categories, target encoding.
3. Text features: TF-IDF vectorization, word embeddings, n-gram features (1-3 grams).

Advanced feature engineering:
1. Polynomial features: interaction terms, feature combinations, degree 2-3 maximum.
2. Temporal features: time-based features (hour, day, month), lag features, rolling statistics.
3. Domain-specific: geographical features (distance, coordinates), financial ratios, business metrics.

Feature selection:
1. Statistical methods: chi-square test, correlation analysis (drop one feature from each pair with |correlation| > 0.8).
2. Model-based: feature importance from tree models, L1 regularization (Lasso).
3. Wrapper methods: recursive feature elimination, forward/backward selection.

Dimensionality reduction:
1. PCA (principal component analysis): linear transformation, keep enough components to retain 95% of the variance.
2. t-SNE: non-linear visualization, perplexity tuning, high-dimensional data exploration.

Validation: cross-validation for feature selection (with selection performed inside each fold), target leakage prevention, temporal data splitting for time series.
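A few of the techniques named in this prompt can be sketched in plain Python. The example below is a minimal illustration rather than a reference implementation: it computes the IQR fences for outlier detection, then fits standardization and min-max scaling on training data only and reuses those parameters on held-out data, which is one simple way to avoid leakage. The function names (`iqr_fences`, `fit_standardizer`, and so on), the toy data, and the choice of the "inclusive" quartile method are assumptions made for illustration; other quartile conventions give slightly different fences.

```python
from statistics import mean, pstdev, quantiles


def iqr_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    # Assumption: "inclusive" quartiles; other conventions shift the fences.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def iqr_outliers(values, k=1.5):
    """Return the values that fall outside the IQR fences."""
    lower, upper = iqr_fences(values, k)
    return [v for v in values if v < lower or v > upper]


def fit_standardizer(train):
    """Learn mean and population std from the training data only."""
    mu = mean(train)
    sigma = pstdev(train)
    return mu, sigma or 1.0  # guard against constant features


def standardize(values, mu, sigma):
    """Apply previously fitted parameters: (x - mean) / std."""
    return [(v - mu) / sigma for v in values]


def fit_min_max(train):
    """Learn the minimum and range from the training data only."""
    lo, hi = min(train), max(train)
    return lo, (hi - lo) or 1.0  # guard against constant features


def min_max_scale(values, lo, span):
    """Apply previously fitted parameters: (x - min) / (max - min)."""
    return [(v - lo) / span for v in values]


# Hypothetical toy data: one obvious outlier in the training split.
train = [10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 95.0]
test = [12.0, 14.0]

outliers = iqr_outliers(train)            # [95.0]
clean_train = [v for v in train if v not in outliers]

# Fit on the cleaned training split, then transform both splits.
mu, sigma = fit_standardizer(clean_train)
train_std = standardize(clean_train, mu, sigma)
test_std = standardize(test, mu, sigma)

lo, span = fit_min_max(clean_train)
train_mm = min_max_scale(clean_train, lo, span)
test_mm = min_max_scale(test, lo, span)   # may fall outside [0, 1]
```

Because the scaling parameters come from the training split alone, held-out values can land outside [0, 1] after min-max scaling (here the test value 14.0 does). That is expected behavior, not a bug. In practice a library pipeline would usually handle the fit/transform separation, but the principle is the same.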