Solving SQL Interview Query for Data Analyst asked by a Product based company

Solving SQL Interview Query for Data Analyst asked by a Product based company

techTFQ

1 год назад

496,970 Просмотров

Ссылки и html тэги не поддерживаются


Комментарии:

Praveen VK
Praveen VK - 30.10.2023 21:53

First of all bad code. If u put 1 instead of null, that is enough to get count. Why going for bad solutions.

Ответить
jaguara01
jaguara01 - 24.10.2023 13:55

this works for me:
SELECT customer_id,
customer_name,
sum(billing_amount) AS total_billing_amt,
count(billing_id) as billing_cnt,
3-count(distinct year) as missing_billing_cnt,
sum(billing_amount)/(count(billing_id)+3-count(distinct year)) as avg

FROM (select *
from df
where year >= 2019 and year <= 2021)
group by customer_id,customer_name

Ответить
jese
jese - 21.10.2023 22:57

I would walk out of the interview, even if I knew the answer.

Ответить
fenix
fenix - 21.10.2023 21:25

Thank to your lessons I was able to solve this query with joins in MySQL

with cte as (select distinct b1.customer_id,b1.customer_name, year from billing b1
cross join (select 2019 year union select 2020 union select 2021) k)

select cte.customer_id, cte.customer_name,concat(round(avg(coalesce(b.billed_amount,0)),2),'$') av_billing_amount
from cte left join billing b
on b.customer_id=cte.customer_id and year(b.billing_creation_date)=cte.year
group by cte.customer_id,cte.customer_name;

Ответить
T S
T S - 18.10.2023 21:36

here is what the sql should look like - careless and fully parametrised (same code size for any interval in years).

Sometimes when we are on an interview we are tested on can we spot inconsistencies too.



-- fully parametrised-- to not duplicate constants in the code
declare @BY as int = 2019;
declare @EY as int = 2021;

with cte as (
select customer_id, customer_name, datediff(YYYY,billing_creation_date,'') YO
, sum( convert(float,billed_amount)) BA_AVG_YEAR
from billing B
where year(B.billing_creation_date) between @BY and @EY
group by customer_id, customer_name, datediff(YYYY,billing_creation_date,'')
)
--select * from cte;

select customer_id, customer_name, round(sum( BA_AVG_YEAR),2)/3 as Average_AnnualAmt
from CTE group by customer_id, customer_name

/*
customer_id customer_name Average_AnnualAmt
----------- ------------- ----------------------
1 A 75
2 B 200
3 C 183.33
*/

Ответить
T S
T S - 18.10.2023 21:35

Is it only me or no one else is noticing that the calculations in the task do not match? the amount for B is 600/3 = 200 not 150.

the first row must be divided by 3 not 4 - there are only 3 years in the interval not 4 .
C is the only right answer.

Ответить
KirkHammered
KirkHammered - 13.10.2023 21:59

WITH t1 as (
SELECT customer_id || customer_name as customer_hash
, SUM(billing_amount) as total_billing_amount
, COUNT(*) as billing_session_cnt
FROM billing
WHERE EXTRACT(YEAR FROM billing_creating_date) BETWEEN 2019 AND 2021
GROUP BY 1
),
t2 as (
SELECT customer_id || customer_name as customer_hash
, COUNT(DISTINCT(EXTRACT(YEAR FROM billing_creating_date))) as unique_year_cnt
FROM billing
WHERE EXTRACT(YEAR FROM billing_creating_date) BETWEEN 2019 AND 2021
GROUP BY 1
)
SELECT t1.customer_hash - - can use a regex function to split this back into customer_id and customer_name
, ROUND(
total_billing_amount::numeric /
(billing_session_cnt + (3 - unique_year_cnt))
, 2) AS avg_billing_amount_$
FROM t1 JOIN t2 ON t1.customer_hash = t2.customer_hash

*To make the query more modular we can use EXTRACT(YEAR FROM CURRENT_DATE) or use a variable.

Ответить
Miguel Escalante Milke
Miguel Escalante Milke - 05.10.2023 02:21

Nice!! I’ve never seen someone explain SQL problems for interviews and gotta admit I love the format and the way you explain it. Insta-subscribed🎉.

I’ve been practicing in codewars but I always feel like they’re either too simple tasks asking for a JOIN or something I just have never thought before.
This really helped me improving my problem solving skills and the way I tackle SQL problems

Ответить
Rodrigo Cardoso
Rodrigo Cardoso - 03.10.2023 18:33

WITH
TAB AS
(SELECT 1 AS customer_id, 'A' AS customer_name, 'id1' AS billing_id,PARSE_DATE('%d-%m-%Y','10-10-2020') AS billing_creation_date, 100 AS billed_amount
UNION ALL SELECT 1, 'A', 'id2',PARSE_DATE('%d-%m-%Y','11-11-2020'), 150
UNION ALL SELECT 1, 'A', 'id3',PARSE_DATE('%d-%m-%Y','12-11-2021'), 100
UNION ALL SELECT 2, 'B', 'id4',PARSE_DATE('%d-%m-%Y','10-11-2019'), 150
UNION ALL SELECT 2, 'B', 'id5',PARSE_DATE('%d-%m-%Y','11-11-2020'), 200
UNION ALL SELECT 2, 'B', 'id6',PARSE_DATE('%d-%m-%Y','12-11-2021'), 250
UNION ALL SELECT 3, 'C', 'id7',PARSE_DATE('%d-%m-%Y','01-01-2018'), 100
UNION ALL SELECT 3, 'C', 'id8',PARSE_DATE('%d-%m-%Y','05-01-2019'), 250
UNION ALL SELECT 3, 'C', 'id9',PARSE_DATE('%d-%m-%Y','06-01-2021'), 300),

ANOS AS (SELECT * FROM UNNEST([2019,2020,2021]) AS ANO),
CUSTOMERS AS (SELECT DISTINCT TAB.CUSTOMER_ID, TAB.customer_name FROM TAB),
ANO_CUSTOMERS AS (SELECT * FROM CUSTOMERS CROSS JOIN ANOS),
ANO_TAB AS (SELECT *, EXTRACT(YEAR FROM TAB.billing_creation_date) AS ANO from TAB)

SELECT
ANO_CUSTOMERS.CUSTOMER_ID,
ANO_CUSTOMERS.customer_name,
ROUND(SUM(COALESCE(ANO_TAB.billed_amount,0))/COUNT(ANO_CUSTOMERS.customer_id),2) AS AVG_BILLING
FROM
ANO_CUSTOMERS
LEFT JOIN
ANO_TAB
ON
ANO_CUSTOMERS.ANO = ANO_TAB.ANO AND
ANO_CUSTOMERS.CUSTOMER_ID = ANO_TAB.customer_id
GROUP BY
ANO_CUSTOMERS.customer_id, ANO_CUSTOMERS.customer_name

Ответить
This is karthik
This is karthik - 03.10.2023 10:08

Don't know why are you complicating this code.
1) avg() billing amount
2) use WHERE class BETWEEN
3) GROUP BY ID
That's all i think

Ответить
Bipin Murari
Bipin Murari - 03.10.2023 09:28

Hi @techTFQ.
In the above video,I have a doubt.
Here the date range is only from 2019 to 2021.
Suppose we have a data range from 2009 to 2020.
I would like to know if this solution works if we have the above range?
Please let me know.

Ответить
Invented
Invented - 29.09.2023 08:21

pls give me remote job…i know sql server….no payment…first you give some project related to sql server…then if you satisfied then you pay…

Ответить
Stay Raw!
Stay Raw! - 28.09.2023 14:40

Unnecessary complex approach in all your videos

Ответить
Vibe of the ART 🎨
Vibe of the ART 🎨 - 27.09.2023 12:39

SELECT
CONCAT(customer_name, ' ',
IFNULL(FORMAT(SUM(CASE WHEN YEAR(billing_creation_date) BETWEEN 2019 AND 2021 THEN billed_amount ELSE 0 END) /
SUM(CASE WHEN YEAR(billing_creation_date) BETWEEN 2019 AND 2021 THEN 1 ELSE 0 END), 2), '0.00'), '$') AS avg_billing_amount
FROM billing
GROUP BY customer_name
ORDER BY customer_name;

Ответить
Vibe of the ART 🎨
Vibe of the ART 🎨 - 27.09.2023 12:12

SELECT customer_id,customer_name,
COALESCE(AVG(CASE WHEN EXTRACT(YEAR FROM billing_creation_date) BETWEEN 2019 AND 2021 THEN billed_amount ELSE 0 END), 0) AS avg_billing_amount
FROM billing
GROUP BY customer_id, customer_name
ORDER BY customer_id; we can do it like this in simple ways Why CTE option you choose ?????

Ответить
Lallie Hayes
Lallie Hayes - 25.09.2023 16:05

If this were a business problem for a real company, it seems like this solution leaves out some assumptions: What if customers weren't in the database for the entire period? What if there were outlier billing amounts? Could SQL IF be used to tighten up some of this variability?

Ответить
d
d - 24.09.2023 00:18

really bad video showing the wrong output and errors....

Ответить
Sascha Götz
Sascha Götz - 22.09.2023 16:55

Honestly, this is a terrible solution. This is my approach: Also 20 lines of sql code (excluding generation of data in the first CTE), but easily expandalbe to infintive number of years.

--mssql syntax
with data as
(
select 1 as customer_id,'A' as customer_name,'id1' as billing_id,convert(date,'2020-10-10') as billing_creation_date,100 as billed_amount union all
select 1 ,'A','id2',convert(date,'2020-11-11'),150 union all
select 1 ,'A','id3',convert(date,'2021-11-12'),100 union all
select 2 ,'B','id4',convert(date,'2019-11-10'),150 union all
select 2 ,'B','id5',convert(date,'2020-11-11'),200 union all
select 2 ,'B','id6',convert(date,'2021-11-12'),250 union all
select 3 ,'C','id7',convert(date,'2018-01-01'),100 union all
select 3 ,'C','id8',convert(date,'2019-01-05'),250 union all
select 3 ,'C','id9',convert(date,'2021-01-06') ,300
),
allyears as (select distinct year(billing_creation_date) as [year] from data where year(billing_creation_date) between 2019 and 2021),
allcustomers as (select distinct customer_ID,customer_name FROM data),
allyearsandcustomers as (
select c.customer_id
,c.customer_name
,y.year
from allcustomers c
full outer join allyears y on 1=1
),
allyearsandcustomerswithdata as
(
select allyearsandcustomers.customer_id
,allyearsandcustomers.customer_name
,convert(decimal(11,2),avg(isnull(d.billed_amount,0))) as billed_amount
from allyearsandcustomers
left outer join data d on d.customer_id = allyearsandcustomers.customer_id and year(d.billing_creation_date) = allyearsandcustomers.year
group by allyearsandcustomers.customer_id
,allyearsandcustomers.customer_name
)
select *
from allyearsandcustomerswithdata

Ответить
ACE King
ACE King - 21.09.2023 06:00

Hi Taufiq bro, can I have your mail id.

Ответить