Pytubedata

A simple YouTube Data API wrapper written in Python

I had wanted to contribute to open-source projects for a long time. But between finding the right project and understanding its whole codebase well enough to finally write some code, I wasn't getting anywhere. So I started my own.

Was that the only reason? No.

The Beginning

I took many courses on Data Science, and all of them preached about novel algorithms that can tease insights out of data. But not a single one addressed the fundamental question: how do you get that data?

Those of you who are currently learning these algorithms must have felt trapped at some point by the same old, boring, clean datasets. I did, enough to put machine learning on hold and first get myself some original, dirty data.

First, some research

So I started exploring different sources of original data and finally settled on YouTube. Why? It's an ocean of data fed by millions of rivers that ensure it never runs dry. When you want data, always look for an API first; it's the cleanest way to get it.

With some prior experience using Python's requests library and navigating Google's API documentation, I was ready to become an open-source contributor.

Know the terms:

  • Endpoint - An endpoint in an API call tells the server what data we want. For example, in YouTube's API, the channels endpoint returns data about a channel.

  • requests - Requests is a Python library used to make HTTP requests.
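To make the endpoint idea concrete: the YouTube Data API v3 serves its endpoints (`channels`, `videos`, `search`, and so on) under the base URL `https://www.googleapis.com/youtube/v3`. The helper below is a hypothetical sketch, not Pytubedata's code, that simply joins the two.

```python
# Base URL of the YouTube Data API v3; endpoint names are appended to it.
BASE_URL = "https://www.googleapis.com/youtube/v3"


def endpoint_url(endpoint: str) -> str:
    """Return the full URL for an endpoint name such as 'channels'.

    endpoint_url is a hypothetical helper used for illustration only.
    """
    # Strip stray slashes so "channels" and "/channels/" give the same URL.
    return f"{BASE_URL}/{endpoint.strip('/')}"


print(endpoint_url("channels"))  # https://www.googleapis.com/youtube/v3/channels
print(endpoint_url("videos"))    # https://www.googleapis.com/youtube/v3/videos
```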

I can't wait to write some code

First things first, what is an API?

An API, or Application Programming Interface, is an intermediary that allows two applications to talk to each other. In our case, the client (our program) will be talking to the server (YouTube), asking it for data.

How is the communication taking place?

It happens over HTTP. Remember the requests library? That's exactly what it does: make HTTP requests. HTTP is a request-response protocol, in which the client (our program) sends a request to the server (YouTube) and the server sends back a response.
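To see roughly what "sending a request" means, here is a simplified sketch that builds the text of a minimal HTTP/1.1 GET request. It is only an illustration: requests builds and sends a similar message for you, with its own default headers, and over HTTPS the text is encrypted in transit. The server answers with a status line such as `HTTP/1.1 200 OK`, some headers, and a body; for the YouTube Data API that body is JSON. The function name `raw_get_request` is hypothetical.

```python
from urllib.parse import urlencode, urlsplit


def raw_get_request(url: str, params: dict[str, str]) -> str:
    """Build the text of a minimal HTTP/1.1 GET request (illustration only)."""
    parts = urlsplit(url)
    # Query parameters travel in the request line, after a '?'.
    target = f"{parts.path}?{urlencode(params)}" if params else parts.path
    return (
        f"GET {target} HTTP/1.1\r\n"    # request line: method, path, version
        f"Host: {parts.netloc}\r\n"     # which server we are talking to
        "Accept: application/json\r\n"  # we would like JSON back
        "\r\n"                          # blank line ends the headers; GET has no body
    )


print(raw_get_request("https://www.googleapis.com/youtube/v3/channels", {"part": "snippet"}))
```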

How to make HTTP requests in Python?

There are different types of HTTP requests (GET, POST, PUT, DELETE, and so on). To fetch data we will be making GET requests. To make a GET request, you first need a couple of things:

  • URL: where to send the request. In an API call, the base URL is followed by an endpoint.

  • Parameters: Additional information provided with the request to filter the results.

Both of these are listed in the API's documentation. To make a GET request in Python:

import requests

# _url is the base URL plus the endpoint, _params the query parameters (both from the API docs)
requests.request(method="GET", url=_url, params=_params)
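Here is the same call filled in for YouTube's `channels` endpoint. `part`, `id`, and `key` are parameters of that endpoint in Google's documentation, and `snippet` and `statistics` are valid `part` values; `CHANNEL_ID` and `YOUR_API_KEY` are placeholders, and the helper names `channel_params` and `fetch_channel` are hypothetical, not Pytubedata's API. The `timeout` and `raise_for_status()` additions are common practice rather than something the original snippet requires.

```python
from urllib.parse import urlencode

CHANNELS_URL = "https://www.googleapis.com/youtube/v3/channels"


def channel_params(channel_id: str, api_key: str) -> dict[str, str]:
    """Query parameters for the channels endpoint."""
    return {
        "part": "snippet,statistics",  # which groups of channel properties to return
        "id": channel_id,              # the channel to look up
        "key": api_key,                # your API key from the Google Cloud console
    }


def fetch_channel(channel_id: str, api_key: str) -> dict:
    """Send the GET request and return the decoded JSON response."""
    # Imported here so the URL preview below works even without requests installed.
    import requests

    response = requests.request(
        method="GET",
        url=CHANNELS_URL,
        params=channel_params(channel_id, api_key),
        timeout=10,  # don't hang forever if the server is slow
    )
    response.raise_for_status()  # raise an error for 4xx/5xx responses
    return response.json()


# Preview the URL the request will hit (no network call needed):
print(f"{CHANNELS_URL}?{urlencode(channel_params('CHANNEL_ID', 'YOUR_API_KEY'))}")
# https://www.googleapis.com/youtube/v3/channels?part=snippet%2Cstatistics&id=CHANNEL_ID&key=YOUR_API_KEY
```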

A Software Engineer and a Programmer

The import and the single requests.request(...) call above are the crux of Pytubedata, or for that matter of any application that makes API calls. Still, the code of those applications is hard to understand; that's exactly why I wrote this whole damn thing myself.

But those two lines alone don't make an application. What makes an application is well-structured, modular code that's easy to scale, maintain, and reuse. This is what sets a Software Engineer apart from a Programmer: a Programmer's code can get the task done, but a Software Engineer's code will get the job done.

Ensuring all three (scalability, maintainability, reusability) comes at some cost to how easy the code is for beginners to understand. By beginner, I mean someone new to the source code.
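As a rough illustration of that trade-off, here is a hypothetical sketch (not Pytubedata's actual design) of a small wrapper class. The base URL, API key, and HTTP call live in one place, each endpoint becomes a short method, and the function that performs the request is injected, which is an assumption made here so the class can be tested offline. It scales, maintains, and reuses better than scattered `requests` calls, but a newcomer has to follow one extra layer of indirection.

```python
from typing import Any, Callable

# A "fetcher" takes a URL and query parameters and returns decoded JSON.
Fetcher = Callable[[str, dict[str, str]], dict[str, Any]]


def requests_fetcher(url: str, params: dict[str, str]) -> dict[str, Any]:
    """Default fetcher backed by the requests library."""
    import requests  # third-party; only needed when a real request is made

    response = requests.get(url, params=params, timeout=10)
    response.raise_for_status()
    return response.json()


class YouTubeClient:
    """Hypothetical wrapper: one place for the base URL, API key and HTTP call."""

    BASE_URL = "https://www.googleapis.com/youtube/v3"

    def __init__(self, api_key: str, fetch: Fetcher = requests_fetcher) -> None:
        self.api_key = api_key
        self._fetch = fetch

    def _get(self, endpoint: str, **params: str) -> dict[str, Any]:
        # Every endpoint shares this code path, so the key is added in one place.
        params["key"] = self.api_key
        return self._fetch(f"{self.BASE_URL}/{endpoint}", params)

    # Each endpoint becomes a small, reusable method.
    def channel(self, channel_id: str, part: str = "snippet") -> dict[str, Any]:
        return self._get("channels", part=part, id=channel_id)

    def video(self, video_id: str, part: str = "snippet") -> dict[str, Any]:
        return self._get("videos", part=part, id=video_id)
```

Swapping `fetch` for a fake function lets you unit-test the wrapper without touching the network, and supporting a new endpoint means adding one short method instead of another copy of the request code.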

How is Pytubedata different?

My primary goal with Pytubedata was to build a module that is not only intuitive for users but also easy for new developers to understand, all while keeping it as scalable, maintainable, and reusable as I could. And I think I did a pretty good job at that. How?

My code is what many would call highly documented: I explicitly annotate the data type of each variable. This helps me keep track of the flow of data, makes IntelliSense work like a charm, and gives me a sense of more control over my variables (which isn't quite the case with Python, since it doesn't enforce type hints at runtime).
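A small sketch of that annotation style, written for this post rather than taken from Pytubedata: `build_search_params` is a hypothetical helper for the `search` endpoint (`part`, `q`, and `maxResults` are parameters of that endpoint). The last lines show the caveat in parentheses above: Python runs code that violates its annotations, and only an editor or a type checker such as mypy would flag it.

```python
def build_search_params(query: str, max_results: int = 5) -> dict[str, str]:
    """Query parameters for a search request, with every type spelled out."""
    params: dict[str, str] = {
        "part": "snippet",
        "q": query,
        "maxResults": str(max_results),  # query strings are text, so convert
    }
    return params


ok: dict[str, str] = build_search_params("python tutorial", 10)

# Python does not check annotations at runtime: passing a str where an int
# is declared still runs. A type checker like mypy would report this call.
not_checked = build_search_params("python tutorial", "10")
print(ok == not_checked)  # True
```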

That said, I realized something over the course of building Pytubedata: some things just cannot be understood without prior knowledge. Think of them as protocols; you are not supposed to understand protocols so much as know them. And as an application grows, the need for those protocols becomes obvious. I have used some of them myself, and I will be explaining them in a series of blog posts.

Head to the Newsletter section in the top bar and subscribe so you don't miss those posts.

See you in the next one!