Pylar allows you to work with Local Dataframes—data that’s already loaded in your Pylar session. This enables you to query and join multiple dataframes using SQL, creating unified views without needing to connect to external databases.
Think of dataframes as temporary tables that exist in your current Pylar session. They’re perfect for combining data from multiple sources or working with intermediate results.
SELECT user_id, event_type, COUNT(*) as event_countFROM table0 WHERE event_type IN ('login', 'logout', 'purchase') AND timestamp >= '2024-01-01'GROUP BY user_id, event_typeORDER BY event_count DESC;
/* Joining on a common ID field between two tables */SELECT t1.event_id, t1.event_type, t1.timestamp, t2.user_id, t2.user_name, t2.user_emailFROM table1 t1JOIN table2 t2 ON t1.event_id = t2.event_id;
This query:
Joins table1 and table2 dataframes
Matches rows where event_id values are equal
Returns selected columns from both dataframes
Uses table aliases (t1, t2) for clarity
Always use table aliases when joining dataframes. It makes your queries more readable and helps avoid column name conflicts.
SELECT -- Event data e.event_id, e.event_type, e.timestamp, e.event_data, -- User profile data u.user_id, u.user_name, u.user_email, u.signup_date, u.plan_type, -- Calculated fields DATEDIFF(day, u.signup_date, e.timestamp) AS days_since_signupFROM table0 e -- events dataframeJOIN table1 u -- users dataframe ON e.user_id = u.user_idWHERE e.event_type IN ('purchase', 'subscription_upgrade') AND e.timestamp >= '2024-01-01'ORDER BY e.timestamp DESC;
SELECT -- Product catalog p.product_id, p.product_name, p.category, p.price, p.launch_date, -- Sales data s.total_units_sold, s.total_revenue, s.avg_order_quantity, s.last_sale_date, -- Review data r.avg_rating, r.review_count, r.positive_review_pct, -- Performance metrics s.total_revenue / NULLIF(DATEDIFF(day, p.launch_date, CURRENT_DATE()), 0) AS daily_revenue_rate, s.total_units_sold * r.avg_rating AS popularity_scoreFROM table0 p -- products dataframeLEFT JOIN ( SELECT product_id, SUM(quantity) AS total_units_sold, SUM(amount) AS total_revenue, AVG(quantity) AS avg_order_quantity, MAX(order_date) AS last_sale_date FROM table1 -- sales dataframe GROUP BY product_id) s ON p.product_id = s.product_idLEFT JOIN ( SELECT product_id, AVG(rating) AS avg_rating, COUNT(*) AS review_count, SUM(CASE WHEN rating >= 4 THEN 1 ELSE 0 END) * 100.0 / COUNT(*) AS positive_review_pct FROM table2 -- reviews dataframe GROUP BY product_id) r ON p.product_id = r.product_idWHERE p.is_active = trueORDER BY popularity_score DESC NULLS LAST;