

High-Performance 2D/3D Visualization of Massive Datasets
Data Science isn't just about finding ready-made applications online and trying to adapt them to your needs, of course. What makes the difference is performing exploratory data analysis, applying statistical tests, and extracting insights. Data visualization is an integral part of this exploration. There are many libraries available for data visualization, but what if your dataset contains millions of samples? Ever had graphics start loading slowly, and navigating within them
Oktay Sahinoglu
Oct 26 · 2 min read


Document Chunking for RAG systems and, by extension, Generative AI applications
The maximum sequence length of Language Models, while not a hard-coded limitation, is a parameter that significantly impacts their performance. Consequently, document chunking has become one of the critical operations directly affecting the performance of RAG systems and, by extension, Generative AI applications. The chunking approaches we commonly see in practical applications—those that simply split at maximum sequence length—often produce suboptimal results. Why? Because t
Oktay Sahinoglu
Oct 23 · 2 min read


Lightning-Fast Python with Numba (Smart Sampling with Exclusion)
A tool that can draw as many non-repeating samples from a large set as we want, while excluding arbitrarily chosen samples, is needed in almost every area of data science. For example, it can be applied to a numerical dataset, but it is also very useful in natural language processing (such as selecting paragraphs from a corpus). Well, NumPy already has a random choice tool, so what kind of contribution are we talking about here? If you need to repeat this selecti
Oktay Sahinoglu
Jul 13 · 2 min read


LSTM Modelling for Text & Time Series Data (Part 2)
(Text) Recurrent Neural Networks (RNN) are among the best options for sequential data, such as text or time series. LSTM (Long Short-Term...
Oktay Sahinoglu
Oct 8, 2020 · 7 min read
