데이터 분석/SQL

QCC 5회차

경 민 2025. 5. 2. 14:26

1번

  • 제출한 쿼리 (오답)
with notdis as (
  select count(1) cnt
from calls
where 
  category = 'n/a'
  or category is null
  )

select 
  round(((select cnt from notdis) / count(1)) *100, 1) as uncategorised_call_pct
from calls
where date_format(call_date, '%Y%m%d') <= '20240415'

 

→ 틀린 부분 수정) with 문에서도 call_date가 2024년 4월 15일 이전이라는 조건을 넣어줘야 한다.

with notdis as (
  select count(1) cnt
from calls
where 
 (category = 'n/a'
  or category is null)
 and date_format(call_date, '%Y%m%d') <= '20240415'
  )

select 
  round(((select cnt from notdis) / count(1)) *100, 1) as uncategorised_call_pct
from calls
where date_format(call_date, '%Y%m%d') <= '20240415'

⬇︎

  • 다른 방법으로 풀어본 쿼리
select
  round(sum(case when category = 'n/a' or category is null then 1 else 0 end) 
  / count(1) * 100, 1) uncategorised_call_pct
from calls
where date_format(call_date, '%Y%m%d') <= '20240415'

2번

  • 제출한 쿼리 (정답)
with order_cnt as (
  select age_bucket, count(1) cnt_order
  from app_events ae
  join user_profiles up on ae.user_id = up.user_id
  where event_type = 'order'
  and date_format(signup_date, '%Y%m%d') >= '20230101'
  group by up.age_bucket
),
  view_cnt as (
  select age_bucket, count(1) cnt_view
  from app_events ae
  join user_profiles up on ae.user_id = up.user_id
  where event_type = 'view'
  and date_format(signup_date, '%Y%m%d') >= '20230101'
  group by up.age_bucket
)

select
  oc.age_bucket,
  round((oc.cnt_order / (oc.cnt_order + vc.cnt_view))*100 , 2) conversion_rate
from order_cnt oc
join view_cnt vc on oc.age_bucket = vc.age_bucket
order by 1

 

맞긴 맞았지만 비효율적이다.

코드카타 풀면서 조건 충족하면 1, 아니면 0 으로 반환하는 case when 구문 공부 많이 했는데

문제 풀면서 생각해내지 못했다.

1번에서 이 방법으로 다시 푼 것처럼 여기에도 적용해보았다.

이번엔 if문으로 !

⬇︎

select 
  up.age_bucket,
  round((sum(if(ae.event_type = 'order',1,0)) / count(1)) * 100
  , 2) conversion_rate
from user_profiles up
join app_events ae on up.user_id = ae.user_id
where 
  up.signup_date >= '2023-01-01'
  and ae.event_type != 'review'
group by 1
order by 1

 

 ^__^

초 깔 끔


3번

  • 제출한 쿼리 (오답)
-- 우수고객
with great_customer as (
  select user_id
  from 
    (select 
        user_id,
        count(1) cnt
    from user_orders
    group by user_id) a
    where cnt >= 10),
  final_df as (
  select user_id, order_datetime
  from
 (select
    u.user_id, u.order_datetime,
    row_number() over(partition by u.user_id order by u.order_datetime) rn
  from user_orders u
  join great_customer g on u.user_id = g.user_id) b
  where rn in (1,10)
  )
  
select *
from 
(select 
  user_id,
  datediff(order_datetime_10,order_datetime) days_to_power_user
from
(select
  user_id,
  order_datetime,
  lead(order_datetime,1) over(partition by user_id) order_datetime_10
from final_df) c) d
where days_to_power_user is not null

 

문제가 최종적으로 요구하는 게 무엇인지를 놓쳤다.

내가 구한 건 

우수 고객의 첫 번째 구매일자와 10번째 구매일자 간의 차이이다.

나 진짜 왜 이렇게 덜렁거릴까 

이런 실수를 몇 번이나 하는거야...

order by days_to_power_user limit 1

 

마지막에 이것만 추가하면 된다..

⬇︎

  • 더 깔끔하게, 효율적으로 수정
-- 우수고객
with numbering as (
  select
      *,
      row_number() over(partition by u.user_id order by u.order_datetime) rn,
      lead(order_datetime,9) over(partition by user_id) order_datetime_10
  from user_orders u
  )

select 
  user_id,
  datediff(order_datetime_10, order_datetime) days_to_power_user
from numbering
where 
  rn = 1
  and order_datetime_10 is not null -- 우수 고객 필터링
order by 2 limit 1 -- 가장 기간 짧은 고객 필터링

⬇︎

  • 더더 깔끔하게
select
  user_id,
  datediff(tenth_orderdate, order_datetime) days_to_power_user
from 
    (select
      *,
      lead(order_datetime,9) over(partition by user_id order by order_datetime) tenth_orderdate
    from user_orders) a
group by user_id
having count(1) >= 10 -- 우수 고객 필터링
order by 2 limit 1 -- 가장 기간 짧은 고객 필터링

'데이터 분석 > SQL' 카테고리의 다른 글

QCC 6회차  (0) 2025.05.26
QCC 4회차  (0) 2025.04.11
mysql 정규표현식, substring_index  (0) 2025.04.06
QCC 3회차  (0) 2025.04.04
[day24] window 함수 - 비율 함수  (0) 2025.03.11