
2024


My Journey to Atlan: An Interview Experience

This covers the stages after the OA and the screening round.

Take Home Assignment

Atlan's hiring process is unique: it evaluates candidates on technical expertise and an understanding of systems, not just DSA like FAANG. I was given a week to design and build a highly scalable logistics platform for goods transportation. The platform would allow users to book transportation services, connecting them with a fleet of drivers and offering real-time availability, pricing, and vehicle tracking.

About a week after submission, I received a mail: "We loved your solution and are moving forward with your application."

GitMatch

Discover the Best Open-Source Projects with GitMatch

Are you a developer seeking to dive into the world of open-source projects but overwhelmed by the countless options available? Look no further! GitMatch is here to help you discover the best projects tailored to your interests and skills.

Why Use GitMatch?

Open-source projects offer a fantastic opportunity to learn new skills, contribute to the community, and showcase your talents to potential employers. However, finding projects that align with your interests and expertise can be challenging. GitMatch simplifies this process by providing personalized project suggestions based on your preferences.

Bpetokenizer

A Byte Pair Encoding (BPE) tokenizer that algorithmically follows the GPT tokenizer (tiktoken) and lets you train your own tokenizer. It can handle special tokens, uses a customizable regex pattern for tokenization (the GPT-4 regex pattern is included), and supports saving and loading tokenizers in JSON format. The bpetokenizer also supports pretrained tokenizers.

Overview

The Byte Pair Encoding (BPE) algorithm is a simple yet powerful method for building a vocabulary of subword units for a given text corpus. This tokenizer can be used to train your LLM's tokenizer on text corpora in various languages.

This algorithm was first introduced in the paper Neural Machine Translation of Rare Words with Subword Units and was later used in the GPT-2 tokenizer (Language Models are Unsupervised Multitask Learners).

Every LLM (LLaMA, Gemini, Mistral, ...) uses its own tokenizer trained on its own text dataset.
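To make the merge loop concrete, here is a minimal sketch of BPE training in plain Python. It only illustrates the algorithm and is not the bpetokenizer package's actual API; the sample text and number of merges are arbitrary.

```python
# Minimal BPE training sketch: repeatedly merge the most frequent pair of
# adjacent token ids until the desired number of merges is reached.
from collections import Counter

def get_pair_counts(ids):
    """Count occurrences of each adjacent pair of token ids."""
    return Counter(zip(ids, ids[1:]))

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

text = "aaabdaaabac"                 # toy corpus
ids = list(text.encode("utf-8"))     # start from raw bytes, like the GPT tokenizers
merges = {}                          # (pair) -> new token id

for step in range(3):                # 3 merges, chosen arbitrarily
    counts = get_pair_counts(ids)
    if not counts:
        break
    pair = counts.most_common(1)[0][0]   # most frequent adjacent pair
    new_id = 256 + step                  # new ids start after the 256 byte values
    ids = merge(ids, pair, new_id)
    merges[pair] = new_id

print(merges, ids)
```

Encoding new text then means replaying these learned merges in order; decoding maps each id back to its byte sequence.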

How to Make the Best Out of Open-Source LLMs

Ever found yourself wondering if you could tap into the magic of top-notch language models without breaking the bank? Well, good news – open-source LLMs like Gemma, Mistral, Phi2, and a bunch of others are here to save the day! And guess what? They won't cost you a single penny. If you're scratching your head about how to get them up and running on your own machine and use them for all sorts of cool stuff, you're in the right place. Let's dive in!
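As a first taste of "running them on your own machine", here's a minimal sketch using the Hugging Face transformers library. It assumes transformers and torch are installed and that the weights fit on your hardware; the model id microsoft/phi-2 is just one example, and any open model on the Hub works the same way.

```python
# Minimal sketch: run an open-source LLM locally with Hugging Face transformers.
# Assumes `pip install transformers torch`; the model id is just one example.
from transformers import pipeline

generator = pipeline("text-generation", model="microsoft/phi-2")

result = generator("Explain byte pair encoding in one sentence.", max_new_tokens=60)
print(result[0]["generated_text"])
```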

Why Bother with Open-Source LLMs?

Okay, so GPT-4 is like the rockstar of AI models, but let's be real – it's got a pretty steep price tag attached. And for us students, coughing up a bunch of cash for a project just isn't in the cards. But fear not, my friend, because here's where open-source LLMs come strutting in to save the day. They're like the friendly neighborhood superheroes of the AI world, and here's why they're totally awesome:

SSH - Secure Shell

Introduction

Secure Shell (SSH) is a cryptographic network protocol that allows secure communication between two systems over an unsecured network. It provides a secure channel for data transmission, remote login, and other network services. SSH operates on the client-server model, where the SSH server runs on the remote system and the SSH client is used to access it from a local system.

AAAhhh, it's just the typical definition again. Let's understand the concept of SSH to get this definition into our heads..

Getting Cozy with SSH

Ever wondered how we ensure our data stays safe while it travels from your machine to another over the internet? It passes through many lanes to reach the other machine, and there's a catch: anyone on the network can access your data before it reaches the receiving machine and can manipulate it.

Alright, let's picture this..
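Before we paint that picture, here's a rough sketch of what the client side of that client-server model looks like in Python, using the third-party paramiko library. The hostname and username below are placeholders, not real values.

```python
# Minimal SSH client sketch with paramiko (pip install paramiko).
# "example.com" and "user" are placeholders; authentication falls back to
# your default keys in ~/.ssh or a running SSH agent.
import paramiko

client = paramiko.SSHClient()
# Demo only: auto-accept unknown host keys. Verify host keys in real use.
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect("example.com", username="user")

# Everything sent over this channel is encrypted end to end.
stdin, stdout, stderr = client.exec_command("uname -a")
print(stdout.read().decode())

client.close()
```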

RAG

What interested me in doing this...

I was really interested in AI, a tool with enormous potential to enhance everyone's life and solve real-world problems that currently require significant human resources. By handing that work over to AI, we can address many of these issues.

I became interested in building projects around the APIs of models such as GPTs and OSS models. When I was a beginner, it seemed really cool to generate new content from foundational models. However, when I asked real-time questions or something outside the model's training data, I often received default answers like, "I'm only trained on data up to 2022."

This made me question: what are the ways I can train the foundational model on real-time data and make it available for users? This curiosity led me to delve into fine-tuning, which involves training the foundational model on your private data.
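Alongside fine-tuning, retrieval-augmented generation (RAG) tackles the same staleness problem by fetching relevant context at query time and adding it to the prompt. Here's a toy sketch of the retrieval step using scikit-learn TF-IDF; a real setup would use embeddings and a vector store, and the documents and question below are made up.

```python
# Toy RAG retrieval sketch: find the most relevant snippet for a question
# and build a context-stuffed prompt from it. Requires scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "The bpetokenizer project implements byte pair encoding in Python.",
    "SSH provides a secure channel between a client and a remote server.",
    "Docker packages an application with its dependencies into a container.",
]

question = "What does the bpetokenizer project do?"

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)
query_vector = vectorizer.transform([question])

# Pick the document most similar to the question.
scores = cosine_similarity(query_vector, doc_vectors)[0]
best_doc = documents[scores.argmax()]

prompt = f"Answer using only this context:\n{best_doc}\n\nQuestion: {question}"
print(prompt)  # feed this prompt to any LLM of your choice
```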

Let's talk about LLMs

My experience with LLMs

I've explored a variety of Large Language Models (LLMs), ranging from commercial models like GPT-4, Gemini Pro, and GPT-3.5-turbo to open-source alternatives like Mistral-7B. Additionally, I've explored smaller language models, like Microsoft's Phi-2, an open-source model with just 2.7B parameters.

Among these, GPT-4 stands out for its efficiency and response quality. As far as I know, no LLM has matched the quality of GPT-4's responses, but its knowledge is limited to 2022.

I recommend using Gemini Pro for more up-to-date information and faster responses compared to GPT-4.

What are LLMs

Large Language Models (LLMs) are AI models trained on large text datasets, which include articles, books, and other texts.

Docker 🐳

In this blog, we'll delve into why Docker is a crucial tool for modern software development and how to effectively utilize it. We'll cover Docker from the basics to advanced usage scenarios, exploring its benefits and practical applications.

Problem Statement

Consider a common scenario: a developer is working on a Django application on their local machine. To run the application successfully, certain requirements and dependencies need to be installed, such as:

Django 4.0
Python 3.10

dev2: it's not working?

dev1: but it works on my computer


The application runs smoothly on the developer's local environment. However, as time passes, the developer pushes the application code into a remote repository. Months or years later, another developer wants to clone the repository and use the application. Here lies the main problem:

Problem