Panoply ig data challenge submission by ka

#1

Hi Panoply Community and Judges,

My name is Ka and thank you very much for setting up this challenge.

For the purpose of this challenge, I set up the ETL using Panoply (as data warehouse) and Mode (for exploration and visualizations) to explore the @shinestythreads Instagram data and get insights.

Below are my findings:

  • Finding engagement for each hashtag used:
    – average likes, comments per hashtag
    with t AS
    (SELECT
    value as hashtag,
    likes_count as likes,
    comments_count as comments
    FROM public.shinestyinstagram_instagram_media m
    left JOIN public.shinestyinstagram_instagram_media_tags mt
    ON m.id = mt.instagram_media_id
    )

select
hashtag,
AVG(likes) as avg_likes,
AVG(comments) as avg_comments
from
t
where hashtag is not null
group by 1

This visual tells you the average likes for each hashtag, #housetonstrong and #theperfectcrime have the most average likes performance.

This visual tells you the average comments with respects to each hashtag used, #buttstuff and #macrobrews have the best performance.


This visual combine the above twos.

  • Find out performance over time:
    – find out likes and comments over time by week
    SELECT
    DATE_TRUNC(‘week’, created_time)::DATE as week,
    SUM(comments_count) AS total_comments,
    AVG(comments_count) AS avg_comments,
    SUM(likes_count) AS total_likes,
    AVG(likes_count) AS avg_likes,
    count(distinct id) as nums_of_post
    FROM
    public.shinestyinstagram_instagram_media
    GROUP BY
    1
    ORDER BY 1

  • Finding performance by day of week using when a user comment:
    – find out performance by day of week
    select TO_CHAR(created_time, ‘DY’) as day,
    COUNT(distinct media_id) AS nums_of_post_got_commented,
    COUNT(distinct from_username) AS nums_of_commenter,
    ROUND((nums_of_commenter/ cast(nums_of_post_got_commented as FLOAT)), 0) as average_commenter_per_post
    from public.shinestyinstagram_instagram_comments
    group by 1
    order by 1

Below visual shows that Thursday and Friday are the time when the users like to comment the most.

  • Finding performance by hour of day using when a user made a comment:
    – find out performance by hour of day
    select TO_CHAR(created_time, ‘HH24’) as hour,
    COUNT(distinct media_id) AS nums_of_post_got_commented,
    COUNT(distinct from_username) AS nums_of_commenter,
    ROUND((nums_of_commenter/ cast(nums_of_post_got_commented as FLOAT)), 0) as average_commenter_per_post
    from public.shinestyinstagram_instagram_comments
    group by 1
    order by 1

Below visual shows that the users are most active during 11pm to 2am of a day.

  • Finding performance by hour of day using when the post is created:
    – find out avg comments and likes by hour
    SELECT
    TO_CHAR(created_time, ‘HH24’) as hour,
    SUM(comments_count) AS total_comments,
    AVG(comments_count) AS avg_comments,
    SUM(likes_count) AS total_likes,
    AVG(likes_count) AS avg_likes,
    count(distinct id) as nums_of_post
    FROM
    public.shinestyinstagram_instagram_media
    GROUP BY
    1
    ORDER BY 1

The purpose of me showing this query is that it is not appropriate to use this insight to make a recommendation to post a IG post during the 7am to 4pm of a day. Because this query is using when a post is created to calculate the numbers of likes and comments. In contrast, the last visual used when a user has commented to calculate the performance, which I think would be more accurate. So I would like to suggest @shinestythreads to post during 11pm to 2am of a day to get more comment engagement. Unfortunately the API didn’t provide the same info for likes, which I would like to use to calculate for like engagement.

  • Finding performance by types of media:
    – performance by types of media
    SELECT
    type,
    SUM(likes_count) as total_likes,
    AVG(likes_count) as avg_likes,
    SUM(comments_count) as total_comments,
    AVG(comments_count) as avg_comments,
    COUNT(distinct id) as nums_of_post
    FROM
    public.shinestyinstagram_instagram_media
    GROUP BY
    1

There are three types of medias that IG offers right now: image, video and carousel. Below visuals show the video media has the most average comments and the carousel media has the most average likes. And the video media shows itself is the best media to get engagement.

  • Finding performance for different media by filter:
    – performance by types, filter of media
    SELECT
    filter,
    type,
    SUM(likes_count) as total_likes,
    AVG(likes_count) as avg_likes,
    SUM(comments_count) as total_comments,
    AVG(comments_count) as avg_comments,
    COUNT(distinct id) as nums_of_post
    FROM
    public.shinestyinstagram_instagram_media
    GROUP BY
    1, 2

In this visual, we can see the filter named Crema has the best performance beside the Normal filter for image media.


In this visual, we can see the filter named Ashby has the best performance beside the Normal filter for video media.

For carousel media, we can do the same to get insight but I didn’t do it in here because @shinestythreads only has normal filter for carousel media.

  • Find out performance by location tag used:
    – find out performance by location tag
    SELECT
    location,
    SUM(likes_count) as total_likes,
    AVG(likes_count) as avg_likes,
    SUM(comments_count) as total_comments,
    AVG(comments_count) as avg_comments
    FROM
    (SELECT
    name as location,
    m.likes_count,
    m.comments_count
    FROM
    public.shinestyinstagram_instagram_media_location l
    LEFT JOIN public.shinestyinstagram_instagram_media m
    ON l.instagram_media_id = m.id
    ) as t
    GROUP BY
    1

This visual shows us that among all the posts that has a location tag, Augusts National Golf Club has the most average likes and comments engagement.

  • Find out performance difference for posts that has location tag versus there is no tag:
    – find out performance for location tag vs no location tag

WITH t AS
(SELECT
m.id,
m.likes_count,
m.comments_count,
l.name as location
FROM
public.shinestyinstagram_instagram_media m
LEFT JOIN
public.shinestyinstagram_instagram_media_location l
ON
m.id = l.instagram_media_id
),

w as
(SELECT
*,
(CASE WHEN location IS NULL THEN 0 ELSE 1 END) AS have_location
FROM t
)

SELECT
have_location,
SUM(likes_count) as total_likes,
AVG(likes_count) as avg_likes,
SUM(comments_count) as total_comments,
AVG(comments_count) as avg_comments
FROM
w
GROUP BY
1

Posts with location tag has more average likes but slightly less average comments compare to posts that has no location tag.
36%20PM

  • Find out top active commenters:

– find out top active commenters
SELECT
*
FROM
(SELECT
from_username as username,
COUNT(media_id) as nums_of_comments,
RANK() OVER(ORDER BY nums_of_comments DESC)
FROM
public.shinestyinstagram_instagram_comments
GROUP BY
1
ORDER BY
2 DESC
) as t
WHERE
rank >1 and rank <=15

This visual shows us who are the top active commenters (users), not including the user @shinestythreads.
24%20PM

  • Cohort Analysis:
    – cohort analysis
    with t AS
    (select
    media_id,
    from_username as username,
    DATE_TRUNC(‘week’, created_time) as week
    from
    public.shinestyinstagram_instagram_comments
    ),

w AS
(select
username,
min(week) as first_time_commenting
from
t
GROUP by
1
)

SELECT
x.cohort::DATE AS week,
MAX(x.week_number) OVER (PARTITION BY x.cohort) AS total_nums_of_week,
x.week_number,
MAX(x.nums_of_commenter) OVER (PARTITION BY x.cohort) AS nums_of_new_commenter,
x.nums_of_commenter,
x.nums_of_commenter/MAX(x.nums_of_commenter) OVER (PARTITION BY x.cohort)::FLOAT AS retention_rate
FROM
(SELECT
w.first_time_commenting as cohort,
FLOOR(EXTRACT(‘day’ FROM t.week - w.first_time_commenting)/7) AS week_number,
COUNT(DISTINCT t.username) AS nums_of_commenter
FROM
t t
LEFT JOIN
w w
ON
t.username = w.username
GROUP BY
1,2) as x
ORDER BY 1,2,3

This query shows us the trend of commenters separated by cohort date. It tells you the retention rate for your active commenters. We could do the same for Likes if we have the info, and I suspect doing cohort analysis using Likes info would be much better for the retention rate because I think it is easier to get a like than a comment from users.
38%20PM

Above are all the analyzes I have done for this challenge. I hope you may find they are useful. If you have any question, please feel free to comment and reach out. Thanks!

Ka
https://www.linkedin.com/in/kahousio (feel free to connect me!)

1 Like