

Document Chunking for RAG systems and, by extension, Generative AI applications
The maximum sequence length of Language Models, while not a hard-coded limitation, is a parameter that significantly impacts their performance. Consequently, document chunking has become one of the critical operations directly affecting the performance of RAG systems and, by extension, Generative AI applications. The chunking approaches we commonly see in practical applications—those that simply split at maximum sequence length—often produce suboptimal results. Why? Because t
Oktay Sahinoglu
11 hours ago · 2 min read


Lightning Fast Smart Sampling: Pick Unique Items from Large Sets at High Speed with Exclusion Support in Python
A tool that can draw as many non-repeating samples from a large set as we want, with the ability to exclude arbitrarily chosen items, is needed in almost every area of data science. It can be used on a numerical dataset, for example, but it is also very useful in natural language processing (such as selecting paragraphs from a corpus). NumPy already has a random choice tool, so what kind of contribution are we talking about here? If you need to repeat this selecti
Oktay Sahinoglu
Jul 13 · 2 min read


LSTM Modelling for Text & Time Series Data (Part 2)
(Text) Recurrent Neural Networks (RNN) are among the best options for sequential data, such as text or time series. LSTM (Long Short-Term...
Oktay Sahinoglu
Oct 8, 2020 · 7 min read


LSTM Modelling for Text & Time Series Data (Part 1)
(Time Series) Recurrent Neural Networks (RNN) are among the best options for sequential data, such as text or time series. LSTM (Long...
Oktay Sahinoglu
Oct 3, 2020 · 6 min read
