Searching the best prompts from our community
Prompts matching the #data-science tag
Build a production churn prediction system. Pipeline: 1. Perform exploratory data analysis and visualization. 2. Engineer features (RFM, engagement scores, usage patterns). 3. Handle class imbalance with SMOTE or class weights. 4. Train multiple models (XGBoost, Random Forest, Neural Network). 5. Implement cross-validation and hyperparameter tuning. 6. Compute SHAP values for model interpretability. 7. Build a prediction API with FastAPI. 8. Set up monitoring for model drift. Include feature importance analysis and business impact metrics.
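A minimal sketch of steps 3–5 and the feature importance analysis, using scikit-learn on synthetic customer data (the feature names and label-generating logic here are illustrative assumptions; class weights stand in for SMOTE, and the SHAP, FastAPI, and drift-monitoring steps are omitted):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
n = 400

# Hypothetical customer features (step 2 would add RFM, engagement, etc.)
X = pd.DataFrame({
    "tenure_months": rng.integers(1, 72, n),
    "monthly_charges": rng.uniform(20, 120, n),
    "support_tickets": rng.poisson(1.0, n),
})

# Synthetic churn label: short tenure + high charges => more likely to churn
logit = 0.04 * X["monthly_charges"] - 0.05 * X["tenure_months"]
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

# Step 3: class_weight="balanced" handles imbalance without resampling
model = RandomForestClassifier(
    n_estimators=100, class_weight="balanced", random_state=0
)

# Step 5: 5-fold cross-validation on ROC AUC
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")

# Feature importance analysis (impurity-based; sums to 1)
model.fit(X, y)
importance = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)
```

In practice, SMOTE (from `imbalanced-learn`) would be applied only inside each training fold, and permutation importance or SHAP would give a less biased view than impurity-based importances.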
Optimize a Pandas data-processing pipeline. Techniques: 1. Vectorize operations (avoid loops). 2. Use appropriate data types (int8, category). 3. Process large datasets with chunking. 4. Parallelize processing with Dask or Swifter. 5. Use efficient file formats (Parquet/Feather). 6. Profile memory usage. 7. Optimize indexes for merging. 8. Cache intermediate results. Include benchmark comparisons.
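Techniques 1, 2, and 6 can be demonstrated in a few lines; the column names and sizes below are illustrative assumptions, and `memory_usage(deep=True)` provides the profiling:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 100_000
df = pd.DataFrame({
    "city": rng.choice(["NYC", "LA", "SF"], n),   # low-cardinality strings
    "clicks": rng.integers(0, 100, n),            # small ints in int64
})

# Technique 6: profile memory before downcasting
before = df.memory_usage(deep=True).sum()

# Technique 2: category for repeated strings, int8 for small integers
df["city"] = df["city"].astype("category")
df["clicks"] = df["clicks"].astype("int8")
after = df.memory_usage(deep=True).sum()

# Technique 1: vectorized conditional instead of a Python loop or .apply
df["ctr_bucket"] = np.where(df["clicks"] > 50, "high", "low")
```

The same before/after measurement pattern is the basis for the requested benchmark comparisons (swap `memory_usage` for `time.perf_counter` to benchmark runtime).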
Build an interactive real-time analytics dashboard. Tech stack: 1. Use Plotly Dash for the web framework. 2. Implement WebSocket connections for live data streaming. 3. Create responsive charts (time series, heatmaps, scatter plots). 4. Add filtering and date range selectors. 5. Implement data aggregation with Pandas for performance. 6. Use Redis for caching frequently accessed metrics. 7. Add export functionality (CSV, PDF reports). 8. Implement role-based access control. Include dark mode toggle and mobile responsiveness.
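The Pandas aggregation and caching layers (steps 5 and 6) can be sketched independently of the Dash/WebSocket front end. This assumes a synthetic minute-level metrics stream, and uses an in-process `lru_cache` as a stand-in for Redis:

```python
import pandas as pd
from functools import lru_cache

# Synthetic one-day stream of minute-level request counts
idx = pd.date_range("2024-01-01", periods=1440, freq="min")
raw = pd.DataFrame({"requests": range(1440)}, index=idx)

# Step 5: downsample to hourly totals before charting, so the
# dashboard renders 24 points instead of 1440
hourly = raw.resample("h").sum()

# Step 6: cache frequently requested metrics (Redis stand-in)
@lru_cache(maxsize=128)
def cached_metric(hour_key: str) -> int:
    """Return the total requests for one hour, cached by key."""
    return int(hourly.loc[hour_key, "requests"])
```

In the real stack, the Dash callback would call `cached_metric` (backed by `redis.get`/`redis.setex` instead of `lru_cache`) and feed the result into a Plotly figure.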
Create advanced features for a churn prediction model. Techniques: 1. Temporal features (days since last purchase, purchase frequency). 2. Aggregations (total spend, average order value). 3. Categorical encoding (one-hot, target encoding). 4. Interaction features (tenure × monthly charges). 5. Feature selection using mutual information and correlation analysis. Document feature importance and business rationale for each engineered feature.
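Techniques 1–3 might look like the following on a toy orders table (the table contents and the `snapshot` date are illustrative assumptions):

```python
import pandas as pd

orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2],
    "order_date": pd.to_datetime([
        "2024-01-01", "2024-02-01", "2024-01-15", "2024-01-20", "2024-03-01",
    ]),
    "amount": [50.0, 70.0, 20.0, 30.0, 25.0],
    "plan": ["basic", "premium", "basic", "basic", "basic"],
})
snapshot = pd.Timestamp("2024-03-15")  # point-in-time cutoff for features

# Technique 2: per-customer aggregations
feats = orders.groupby("customer_id").agg(
    total_spend=("amount", "sum"),
    avg_order_value=("amount", "mean"),
    purchase_count=("order_date", "count"),
    last_purchase=("order_date", "max"),
)

# Technique 1: temporal feature relative to the snapshot date
feats["days_since_last_purchase"] = (snapshot - feats["last_purchase"]).dt.days

# Technique 3: one-hot encoding of the most recent plan
last_plan = orders.sort_values("order_date").groupby("customer_id")["plan"].last()
feats = feats.join(pd.get_dummies(last_plan, prefix="plan"))
```

Computing every feature relative to a fixed `snapshot` date is what keeps the features leakage-free when the label is "churned within N days after the snapshot".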
Leverage big data for research insights using appropriate methods. Data characteristics: 1. Volume: large datasets requiring distributed computing. 2. Velocity: real-time or near real-time data streams. 3. Variety: structured and unstructured data from multiple sources. 4. Veracity: data quality and reliability concerns. Analytics approaches: 1. Machine learning: supervised (prediction) vs. unsupervised (pattern discovery). 2. Natural language processing: sentiment analysis, topic modeling, named entity recognition. 3. Network analysis: social networks, collaboration patterns, information flow. 4. Time series analysis: trend detection, forecasting, anomaly detection. Tools and platforms: 1. R/Python for analysis, Spark for distributed computing. 2. Cloud platforms: AWS, Google Cloud, Azure for scalable processing. 3. Visualization: Tableau, D3.js for interactive dashboards. Validation: 1. Cross-validation for machine learning models. 2. Triangulation with traditional data sources. 3. Replication across independent datasets. Ethical considerations: consent for secondary use, privacy protection, algorithmic bias.
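Of the analytics approaches above, time-series anomaly detection is the easiest to sketch without a distributed stack. A rolling z-score on a synthetic series (the series, window size, and threshold are illustrative assumptions; on real volume the same logic would run as a Spark window function):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
ts = pd.Series(rng.normal(10, 1, 200))  # synthetic metric, mean 10, sd 1
ts.iloc[150] = 25.0                     # injected anomaly

# Rolling baseline from the *previous* 30 points (shift(1) excludes the
# current point, so an outlier does not inflate its own baseline)
roll_mean = ts.shift(1).rolling(30, min_periods=10).mean()
roll_std = ts.shift(1).rolling(30, min_periods=10).std()

# Flag points more than 4 estimated standard deviations from baseline
z = (ts - roll_mean) / roll_std
anomalies = ts[z.abs() > 4]
```

For validation (as the prompt suggests), flagged points would be triangulated against a traditional data source before being treated as real events rather than sensor noise.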