Deep Space Astrophotography  [draft]

Bortle Scale The Bortle Scale is a way to measure the quality of the night sky. It ranges from 1 to 9, with 1 being the darkest sky and 9 being the most light-polluted. To plan my dark-sky trips, I look at a light pollution map to find a rental property with a low number on the Bortle Scale (Bortle 4 and under is ideal). Technical Terms Aperture: The diameter of the lens or mirror that collects light. The larger the aperture, the more light the telescope can gather, and the fainter the objects it can see. For astrophotography, a larger aperture is generally better. ...

January 24, 2025 · 5 min

Distributed Training

Recommended Readings: https://huggingface.co/docs/transformers/v4.15.0/en/parallelism Modified version of https://colossalai.org/docs/concepts/paradigms_of_parallelism/ Introduction With the development of deep learning, there is an increasing demand for parallel training: models and datasets are getting larger and larger, and training time becomes a nightmare if we stick to single-GPU training. In this section, we provide a brief overview of existing methods for parallelizing training. Data Parallel Data parallelism is the most common form of parallelism due to its simplicity. In data-parallel training, the dataset is split into several shards, and each shard is allocated to a device. This is equivalent to parallelizing the training process along the batch dimension. Each device holds a full copy of the model replica and trains on its allocated dataset shard. After back-propagation, the gradients are all-reduced so that the model parameters on different devices stay synchronized. ...
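The data-parallel scheme above can be sketched in plain Python: the dataset, shards, and scalar model below are hypothetical stand-ins, and the all-reduce is simulated by averaging per-shard gradients on one machine.

```python
def local_gradient(w, shard):
    # Gradient of the mean squared error 0.5*(w*x - y)^2 over one shard.
    return sum((w * x - y) * x for x, y in shard) / len(shard)

# Full dataset (y = 2x), split along the batch dimension: one shard per "device".
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
shards = [data[:2], data[2:]]

w = 0.0  # every device starts from the same model replica
for _ in range(100):
    grads = [local_gradient(w, s) for s in shards]  # each device computes locally
    g = sum(grads) / len(grads)  # "all-reduce": average gradients across devices
    w -= 0.1 * g                 # identical update keeps all replicas in sync
```

Because every replica applies the same averaged gradient, the parameters stay bit-identical across devices without ever exchanging model weights.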

July 28, 2024 · 6 min

System Design  [draft]

Let's design a system

July 24, 2024 · 2 min

Quantization

Quantization methods for deep learning models

July 17, 2024 · 5 min

CUDA  [draft]

GPUs go brrrr… Resources Practice exercises and lectures by NVIDIA and OLCF https://www.olcf.ornl.gov/cuda-training-series/ (https://www.youtube.com/playlist?app=desktop&list=PL6RdenZrxrw-zNX7uuGppWETdxt_JxdMj) https://github.com/olcf/cuda-training-series/tree/master/exercises CUDA Mode Discord https://github.com/cuda-mode/lectures NVIDIA CUDA C Programming Guide https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capability Book Programming Massively Parallel Processors: A Hands-on Approach by David B. Kirk and Wen-mei W. Hwu, 4th Edition CUDA Toolkit documentation https://docs.nvidia.com/cuda/index.html Streaming Multiprocessor (SM) An SM is similar in hardware to a CPU core. The threads of a thread block execute concurrently on one multiprocessor, and as thread blocks terminate, new blocks are launched on the vacated multiprocessors. ...

February 27, 2024 · 11 min

Linear Algebra  [draft]

Linear and Affine Functions If $f$ is a linear function, $$f(\alpha x + \beta y) = \alpha f(x) + \beta f(y)$$ The above is equivalent to saying: $$f(ax) = af(x)$$ $$f(x+y) = f(x) + f(y)$$ where $\alpha, \beta \in \mathbb{R}$ and $x, y \in \mathbb{R}^n$. If $f$ is an affine function, there is an additional constraint that $\alpha + \beta = 1$. To prove that any function $f$ satisfying the above properties is linear, we will show that $f$ can be represented as $f(x) = mx$, where $m$ is a constant, and that it satisfies the conditions for a linear function over the real numbers. ...
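For the scalar case $f : \mathbb{R} \to \mathbb{R}$, one step of that representation argument (a sketch, using only the homogeneity property quoted above) is:

```latex
f(x) = f(x \cdot 1) = x \, f(1) = m x, \qquad \text{where } m := f(1).
```

The constant $m$ is just the value of $f$ at $1$; additivity then confirms $f(x+y) = m(x+y) = f(x) + f(y)$, so $f(x) = mx$ satisfies both conditions.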

December 3, 2023 · 11 min

Allegories for machine learning  [draft]

Allegories in literature and life

November 25, 2023 · 3 min

F*ck DSA :)  [draft]

Common techniques to solve DSA problems

November 25, 2023 · 6 min

Random concepts for ML  [draft]

Evaluating the model Recall (R) $R = \frac{TP}{TP + FN}$ i.e. the number of true positives “recalled” out of all actual positives. Precision (P) $P = \frac{TP}{TP + FP}$ i.e. how precise you were in identifying positive cases. The Covid RT-PCR test has high precision and low recall: if the test result is positive, there is a high chance that it is correct, but if it is negative, nothing can be said with certainty. ...
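These two formulas can be checked with a tiny computation; the labels below are made-up, chosen to mimic the RT-PCR example (flagged positives are all correct, but half the true positives are missed).

```python
def precision_recall(y_true, y_pred):
    # Count true positives, false positives, and false negatives.
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    # P = TP / (TP + FP), R = TP / (TP + FN)
    return tp / (tp + fp), tp / (tp + fn)

y_true = [1, 1, 1, 1, 0, 0]  # four actual positives
y_pred = [1, 1, 0, 0, 0, 0]  # flags only two of them, no false alarms
p, r = precision_recall(y_true, y_pred)
print(p, r)  # 1.0 0.5 — perfectly precise, but recalls only half
```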

October 5, 2023 · 3 min

Reinforcement Learning Tips and Tricks

Reinforcement Learning can do wonders, but selecting the best algorithm and then tuning its hyperparameters is not an easy task. This blog post aims to make the process easier by listing a few general suggestions and the hyperparameters to tune for each algorithm.

November 29, 2022 · 4 min