*** BuzzWords. Buzz, buzz, buzz.
*** The data domain. Data. Scope: AIOps, MLOps, DataOps.
1. AIOps. Concerns how text is ingested, stored, and processed efficiently (high-performance NLP).
- Involves LLMs, Semantic Cache, and VectorDB.
2. Business and user experience hinge on three metrics: Hit Ratio, Latency, and Recall (a minimal metrics sketch follows this list).
- Hit Ratio。This metric quantifies the cache's ability to fulfill content requests successfully, compared to the total number of requests it receives. A higher hit ratio indicates a more effective cache.
- Latency。This metric measures the time it takes for a query to be processed and the corresponding data to be retrieved from the cache. Lower latency signifies a more efficient and responsive caching system.
- Recall。This metric represents the proportion of queries served by the cache out of the total number of queries that should have been served by the cache. Higher recall percentages indicate that the cache is effectively serving the appropriate content.
3. There are three caching approaches; sample code below.
- In-Memory Cache often drives up cost, since every cached entry consumes RAM.
- DB Cache does exact-match lookups, so even a small data-format mismatch causes a cache miss.
- Semantic Cache stores prompts and responses and evaluates hits based on semantic similarity.
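A rough sketch of how the three metrics fall out of simple counters; the class and field names here are illustrative, not from any library:
```python
from dataclasses import dataclass

@dataclass
class CacheStats:
    """Hypothetical counters for evaluating a cache; names are illustrative."""
    hits: int = 0               # requests answered from the cache
    misses: int = 0             # requests that fell through to the backend
    missed_cacheable: int = 0   # misses the cache *should* have served
    total_latency_ms: float = 0.0

    @property
    def hit_ratio(self) -> float:
        """Successful cache responses over all requests received."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

    @property
    def recall(self) -> float:
        """Queries served by the cache over all queries it should have served."""
        should_serve = self.hits + self.missed_cacheable
        return self.hits / should_serve if should_serve else 0.0

    @property
    def avg_latency_ms(self) -> float:
        total = self.hits + self.misses
        return self.total_latency_ms / total if total else 0.0

stats = CacheStats(hits=80, misses=20, missed_cacheable=10, total_latency_ms=4200.0)
print(stats.hit_ratio, stats.recall, stats.avg_latency_ms)  # 0.8, ~0.889, 42.0
```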
*** Backend development. Development. Scope: DevOps, GitOps, CIOps.
- Earlier: Jenkins, GitHub, Azure DevOps, Docker.
- More recently: CircleCI, GitLab, Kubernetes (Helm ft. yq ft. Kustomize).
*** Network and endpoints. NetOps. Scope: NGFW, SDWAN, SSE, Gateway, Endpoint, DNS, MTTR.
*** Infrastructure. InfraOps. Scope: all of the above.
*** References.
[2] Redis.
- Semantic Cache: https://redis.io/docs/latest/integrate/redisvl/user-guide/semantic-caching
- Vector Search: https://redis.io/solutions/vector-search/
- Vector Database: https://redis.io/docs/get-started/vector-database/
[3] OpenAI.
[4] LLM. LangChain Caching. Sample Code. [4.1] github.com/langchain-ai/langchain/blob/v0.0.219/langchain/cache.py
*** Data. AIOps. MLOps. DataOps.
AIOps stands for Artificial Intelligence for IT Operations. It involves the application of AI and machine learning techniques to enhance IT operations, including monitoring, event correlation, and automation, to improve efficiency and performance.
- Large Language Models, Redis Semantic Cache, and Vector Databases are interconnected in the realm of natural language processing applications, where efficient data storage, retrieval, and processing are essential for achieving high-performance results.
- Redis Semantic Cache can optimize the performance of both LLMs and Vector Databases by caching frequently accessed data or intermediate results, while Vector Databases provide a scalable and efficient storage solution for vector data used by LLMs in NLP tasks. A minimal sketch of this cache-aside pattern follows this list.
- Algorithm Visualization refers to the graphical representation of algorithms to aid in understanding their behavior and performance. It helps developers and data scientists analyze and optimize algorithms.
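As a concrete illustration of that interplay, here is a minimal, dependency-free sketch of the semantic cache-aside pattern. `embed`, `cached_llm_call`, and the 0.95 threshold are all hypothetical stand-ins for a real embedding model and a tuned threshold:
```python
import math

def embed(text: str) -> list[float]:
    """Toy letter-frequency embedding; a real system would use a model such as OpenAIEmbeddings."""
    counts = [float(text.lower().count(c)) for c in "abcdefghijklmnopqrstuvwxyz"]
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]

def cosine(u: list[float], v: list[float]) -> float:
    # Vectors from embed() are already L2-normalized, so the dot product is the cosine.
    return sum(a * b for a, b in zip(u, v))

semantic_cache: list[tuple[list[float], str]] = []  # (embedding, cached response)

def cached_llm_call(prompt: str, llm_fn, threshold: float = 0.95) -> str:
    """Cache-aside: serve semantically similar prompts from the cache, else call the LLM once."""
    query_vec = embed(prompt)
    for vec, response in semantic_cache:
        if cosine(query_vec, vec) >= threshold:
            return response                    # semantic hit: no LLM latency or cost
    response = llm_fn(prompt)                  # miss: pay for one LLM call
    semantic_cache.append((query_vec, response))
    return response
```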
*** Redis Semantic Cache
Reduce cost & increase app throughput.
Real-Time Results Using Vector Search in Redis Enterprise
Vector Databases, Embeddings, Indexing, Distance Metrics, and Large Language Models
- Vector databases are specialized systems that efficiently store and retrieve dense numerical vectors, designed for tasks like image recognition and recommendation systems, using techniques like hierarchical navigable small world (HNSW) and product quantization.
The vector embedding is stored as a JSON array (shown as "description_embeddings": [ , , , ...]).
- Vector Embeddings are numerical representations of unstructured data like audio or images, capturing semantic similarity by mapping objects to points in a vector space where similar objects have close vectors.
- Search (where results are ranked by relevance to a query string)
- Clustering (where text strings are grouped by similarity)
- Recommendations (where items with related text strings are recommended)
- Anomaly detection (where outliers with little relatedness are identified)
- Diversity measurement (where similarity distributions are analyzed)
- Classification (where text strings are classified by their most similar label)
- Vector indexing is a method of organizing and retrieving data based on vector representations, replacing traditional tabular or document formats with vectors in a multi-dimensional space.
Redis Enterprise manages vectors in an index data structure to enable intelligent similarity search that balances search speed and search quality. Choose from two popular techniques, FLAT (a brute-force approach) and HNSW (a faster, approximate approach), based on your data and use cases.
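A hedged sketch of creating such an index with redis-py, assuming a Redis Stack instance on localhost:6379; the index name, key prefix, and 4-dimensional toy vectors are illustrative:
```python
import numpy as np
import redis
from redis.commands.search.field import TagField, VectorField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.query import Query

r = redis.Redis(host="localhost", port=6379)

# HNSW index over 4-dim float32 vectors; swap "HNSW" for "FLAT" to get the brute-force variant.
schema = (
    TagField("tag"),
    VectorField("embedding", "HNSW",
                {"TYPE": "FLOAT32", "DIM": 4, "DISTANCE_METRIC": "COSINE"}),
)
r.ft("idx:demo").create_index(
    schema, definition=IndexDefinition(prefix=["doc:"], index_type=IndexType.HASH)
)

# Store one document whose vector field is packed as raw float32 bytes.
r.hset("doc:1", mapping={
    "tag": "greeting",
    "embedding": np.array([0.1, 0.2, 0.3, 0.4], dtype=np.float32).tobytes(),
})

# KNN query: the 2 nearest neighbours of the query vector, ranked by distance.
q = (Query("*=>[KNN 2 @embedding $vec AS score]")
     .sort_by("score").return_fields("tag", "score").dialect(2))
res = r.ft("idx:demo").search(
    q, query_params={"vec": np.array([0.1, 0.2, 0.3, 0.4], dtype=np.float32).tobytes()}
)
print(res.docs)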
- Distance metrics are mathematical functions determining the similarity or dissimilarity between two vectors, crucial for tasks like classification and clustering, with Redis using three measures for enhanced performance.
Redis Enterprise uses a distance metric to measure the similarity between two vectors. Choose from three popular metrics – Euclidean, Inner Product, and Cosine Similarity – used to calculate how “close” or “far apart” two vectors are.
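A quick worked example of the three metrics on toy NumPy vectors (values chosen only for illustration):
```python
import numpy as np

u = np.array([1.0, 2.0, 2.0])
v = np.array([2.0, 1.0, 2.0])

euclidean = float(np.linalg.norm(u - v))          # "how far apart": sqrt(2) ≈ 1.414
inner_product = float(np.dot(u, v))               # unnormalized alignment: 8.0
cosine = float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))  # 8/9 ≈ 0.889

print(euclidean, inner_product, cosine)
```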
- Large language models (LLMs) are powerful deep-learning models designed for language processing, using large-scale transformer architectures to comprehend and generate text, showcasing impressive capabilities in various applications.
---
title: Redis Semantic Cache
author: Celia
---
# Cache
## Semantic Cache in Redis 7.2
### Pre-install LangChain and OpenAI
```shell
!pip install langchain openai --quiet --upgrade
```
### ChatOpenAI instance
```python
import langchain
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI()
```
### 1. InMemoryCache
```python
from langchain.cache import InMemoryCache
langchain.llm_cache = InMemoryCache()
```
```python
%%time
# Ask a question and measure how long it takes for the LLM to respond.
# (%%time must be the first line of the cell for the magic to take effect.)
llm.predict("What is OpenAI?")
# Output:
CPU times: user 25 ms, sys: 6.4 ms, total: 31.4 ms
Wall time: 4.54 s
```
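Repeating the exact same call should now be answered from the in-memory cache; the repeat call below is my addition, and timings will vary:
```python
%%time
# Same prompt again: answered from InMemoryCache, no API round-trip.
llm.predict("What is OpenAI?")
# Expect a wall time in the millisecond range rather than seconds.
```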
#### How does InMemoryCache store data?
```python
class InMemoryCache(BaseCache):
    """Cache that stores things in memory."""

    def __init__(self) -> None:
        """Initialize with empty cache."""
        self._cache: Dict[Tuple[str, str], RETURN_VAL_TYPE] = {}
```
"""Cache that stores things in memory."""
def __init__(self) -> None:
"""Initialize with empty cache."""
self._cache: Dict[Tuple[str, str], RETURN_VAL_TYPE] = {}
```
```python
# First element of the tuple
list(langchain.llm_cache._cache.keys())[0][0]
# Output 1:
'[{"lc": 1, "type": "constructor", "id": ["langchain", "schema", "HumanMessage"], "kwargs": {"content": "What is OpenAI?"}}]'
```
```python
# Second element of the tuple
list(langchain.llm_cache._cache.keys())[0][1]
# Output 2:
'{"lc": 1, "type": "constructor", "id": ["langchain", "chat_models", "openai", "ChatOpenAI"], "kwargs": {"openai_api_key": {"lc": 1, "type": "secret", "id": ["OPENAI_API_KEY"]}}}---[(\'stop\', None)]'
```
### 2. FullLLMCache
```python
!rm -f .cache.db
from langchain.cache import SQLiteCache
langchain.llm_cache = SQLiteCache(database_path=".cache.db")
```
```python
%%time
# Ask the same question twice and measure the performance difference.
llm.predict("What is OpenAI?")
# Output 1 (first ask, nothing cached yet):
CPU times: user 39.3 ms, sys: 9.16 ms, total: 48.5 ms
Wall time: 4.84 s
```
```python
%%time
llm.predict("What is OpenAI?")
# Output 2 (second ask, served from the SQLite cache):
CPU times: user 4.25 ms, sys: 980 µs, total: 5.23 ms
Wall time: 4.97 ms
```
```python
%%time
# Add some extra whitespace to the prompt and ask again (spaces added for illustration).
llm.predict("What is  OpenAI? ")
# Output 3: the extra spaces change the exact-match key, causing a cache miss,
# so the LLM is called again at full latency.
```
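One possible mitigation (my own sketch, not from the original post) is to normalize whitespace before the prompt reaches the exact-match cache:
```python
def normalize(prompt: str) -> str:
    """Collapse runs of whitespace so trivially different prompts share one cache key."""
    return " ".join(prompt.split())

llm.predict(normalize("What is  OpenAI? "))  # normalizes to "What is OpenAI?" and hits the cache
```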
#### How does FullLLMCache store data?
```python
class FullLLMCache(Base):
    """SQLite table for full LLM Cache (all generations)."""

    __tablename__ = "full_llm_cache"
    prompt = Column(String, primary_key=True)
    llm = Column(String, primary_key=True)
    idx = Column(Integer, primary_key=True)
    response = Column(String)


class SQLAlchemyCache(BaseCache):
    """Cache that uses SQAlchemy as a backend."""

    def __init__(self, engine: Engine, cache_schema: Type[FullLLMCache] = FullLLMCache):
        """Initialize by creating all tables."""
        self.engine = engine
        self.cache_schema = cache_schema
        self.cache_schema.metadata.create_all(self.engine)
```
```python
from sqlalchemy import create_engine

# Re-open the SQLite cache file that SQLiteCache wrote above.
engine = create_engine("sqlite:///.cache.db")
with engine.connect() as connection:
    rs = connection.exec_driver_sql('select * from full_llm_cache')
    print(rs.keys())
    for row in rs:
        print(row)
# Output:
RMKeyView(['prompt', 'llm', 'idx', 'response'])
('[{"lc": 1, "type": "constructor", "id": ["langchain", "schema", "HumanMessage"], "kwargs": {"content": "What is OpenAI?"}}]', '{"lc": 1, "type": "constructor", "id": ["langchain", "chat_models", "openai", "ChatOpenAI"], "kwargs": {"openai_api_key": {"lc": 1, "type": "secret", "id": ["OPENAI_API_KEY"]}}}---[(\'stop\', None)]', 0, '{"lc": 1, "type": "constructor", "id": ["langchain", "schema", "ChatGeneration"], "kwargs": {"message": {"lc": 1, "type": "constructor", "id": ["lang ... (588 characters truncated) ... AI models and systems, such as the language model GPT-3, to showcase the capabilities and potential applications of AI.", "additional_kwargs": {}}}}}')
('[{"lc": 1, "type": "constructor", "id": ["langchain", "schema", "HumanMessage"], "kwargs": {"content": "What is OpenAI?"}}]', '{"lc": 1, "type": "constructor", "id": ["langchain", "chat_models", "openai", "ChatOpenAI"], "kwargs": {"openai_api_key": {"lc": 1, "type": "secret", "id": ["OPENAI_API_KEY"]}}}---[(\'stop\', None)]', 0, '{"lc": 1, "type": "constructor", "id": ["langchain", "schema", "ChatGeneration"], "kwargs": {"message": {"lc": 1, "type": "constructor", "id": ["lang ... (594 characters truncated) ... maintains various open-source AI tools and frameworks to facilitate the development and deployment of AI applications.", "additional_kwargs": {}}}}}')
```
### 3. SemanticCache
```python
!pip install langchain openai --quiet --upgrade
import os
os.environ['OPENAI_API_KEY'] = 'your openai api key'
# https://platform.openai.com/api-keys
```
```python
!curl -fsSL https://packages.redis.io/redis-stack/redis-stack-server-6.2.6-v7.focal.x86_64.tar.gz -o redis-stack-server.tar.gz
!tar -xvf redis-stack-server.tar.gz
!pip install redis
!./redis-stack-server-6.2.6-v7/bin/redis-stack-server --daemonize yes
```
```python
import langchain
from langchain.llms import OpenAI
# To make the caching really obvious, let's use a slower model.
llm = OpenAI(model_name="text-davinci-002", n=2, best_of=2)
```
```python
# Initialize the Redis semantic cache with default score threshold 0.2
from langchain.embeddings import OpenAIEmbeddings
from langchain.cache import RedisSemanticCache
langchain.llm_cache = RedisSemanticCache(
    redis_url="redis://localhost:6379",
    embedding=OpenAIEmbeddings(),
    score_threshold=0.2,
)
```
```python
%%time
llm("Please translate 'this is Monday' into Chinese")
# Output 1:
CPU times: user 74.4 ms, sys: 7.11 ms, total: 81.5 ms
Wall time: 2.19 s
'\n\n这是周一'
```
```python
%%time
llm("Please translate 'this is Tuesday' into Chinese")
# Output 2:
# Note: this query differs from the previous one by a single word, yet it hit the same cache entry.
CPU times: user 6.35 ms, sys: 0 ns, total: 6.35 ms
Wall time: 211 ms
'\n\n这是周一'
```
```python
%%time
llm("Tell me a joke")
# Output 3:
CPU times: user 34.2 ms, sys: 2.85 ms, total: 37 ms
Wall time: 3.88 s
'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'
```
```python
%%time
llm("Tell me 2 jokes")
# Output 4:
CPU times: user 7.27 ms, sys: 0 ns, total: 7.27 ms
Wall time: 247 ms
'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'
```
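Output 4 is arguably a false hit: "Tell me 2 jokes" was served the cached answer to "Tell me a joke". Lowering score_threshold (a vector distance, so smaller means stricter) should turn such near-misses back into cache misses. A hedged sketch reusing the constructor shown earlier; 0.05 is an illustrative value, not a recommendation:
```python
# Stricter semantic matching: only near-identical prompts count as hits.
langchain.llm_cache = RedisSemanticCache(
    redis_url="redis://localhost:6379",
    embedding=OpenAIEmbeddings(),
    score_threshold=0.05,
)
```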
#### How does SemanticCache store data?
```python
# Redis semantic cache
# Find the keys in the cache
langchain.llm_cache._cache_dict
# Output: {'cache:bf6f6d9ebdf492e28cb8bf4878a4b951': <langchain.vectorstores.redis.Redis at 0x7fed7bd13310>}
```
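To see what actually landed in Redis, you can scan for the vector store's keys. The `doc:cache` prefix below is my assumption about how LangChain's Redis vector store names its hashes, so verify against your own instance:
```python
import redis

r = redis.Redis(host="localhost", port=6379)
# Assumption: LangChain's Redis vector store writes hashes under a "doc:<index_name>" prefix.
for key in r.scan_iter("doc:cache*"):
    print(key)
```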
*** Development
- DevOps is a culture, set of practices, and collaboration between development and operations teams aimed at automating and improving the process of software development, testing, and deployment.
- GitOps is a methodology for managing infrastructure and applications using Git version control. Changes to the infrastructure or applications are made through pull requests and automatically applied through continuous integration/continuous deployment pipelines.
- CIOps, or Continuous Integration Operations, is the practice of integrating operations processes into the continuous integration and continuous deployment (CI/CD) pipeline. It aims to automate and streamline the deployment and management of infrastructure and applications.
- Earlier: Jenkins, GitHub, Azure DevOps, Docker.
- More recently: CircleCI, GitLab, Kubernetes (Helm ft. yq ft. Kustomize).
*** Security
- SecOps, or Security Operations, is the practice of integrating security into the DevOps process. It involves continuous monitoring, detection, and response to security threats and vulnerabilities.
- SOC stands for Security Operations Center, a centralized unit responsible for monitoring and analyzing an organization's security posture, detecting and responding to security incidents, and implementing security measures.
- CISO stands for Chief Information Security Officer, a senior executive responsible for overseeing an organization's information security strategy and ensuring compliance with security policies and regulations.
- SASE stands for Secure Access Service Edge, a cloud-based architecture that combines network security functions with wide-area networking capabilities to support the dynamic secure access needs of modern enterprises.
- SIEM stands for Security Information and Event Management, a technology that provides real-time analysis of security alerts generated by network hardware and applications to identify and respond to security threats.
- SOAR stands for Security Orchestration, Automation, and Response, a technology stack that integrates security tools and automates incident response processes to improve the efficiency and effectiveness of security operations.
- EDR stands for Endpoint Detection and Response, a security technology that continuously monitors and analyzes endpoint activities to detect and respond to advanced threats and security incidents.
- XDR stands for Extended Detection and Response, a security platform that correlates and analyzes data from multiple security products across different security layers to provide comprehensive threat detection and response capabilities.
*** Network
- NetOps, or Network Operations, is the practice of managing and maintaining an organization's network infrastructure to ensure its availability, performance, and security.
- NGFW stands for Next-Generation Firewall, a network security device that combines traditional firewall capabilities with advanced features such as intrusion prevention, application control, and deep packet inspection.
- SDWAN stands for Software-Defined Wide Area Network, a technology that enables the centralized management and dynamic allocation of network resources to optimize the performance and security of wide area networks.
- SSE here stands for Security Service Edge, the security half of SASE: cloud-delivered protections such as secure web gateway, CASB, and ZTNA (not to be confused with Server-Sent Events in web development).
- A gateway is a network node that connects two different networks, facilitating communication between them.
- An endpoint refers to a computing device connected to a network, such as a desktop computer, laptop, smartphone, or IoT device.
- DNS stands for Domain Name System, a decentralized naming system for computers, services, or any resource connected to the Internet or a private network, translating domain names into IP addresses.
- MTTR stands for Mean Time to Repair, a metric used to measure the average time it takes to repair a system or component after a failure or incident.
*** Infrastructure
- InfraOps, or Infrastructure Operations, refers to the management and maintenance of an organization's IT infrastructure, including hardware, software, networks, and data centers.
Rock'n'Roll Widow.
*** Ten years to grow a 🌲 tree; a hundred years to cultivate a person.
- #DRE | #1. to-the-basic. First principles.
- #DRE | #2. at-a-glance. The overview.
- #DRE | #3. Ops ...Oops. Seize the initiative.
- #DRE | #4. GO. Versatile on every front.
- #DRE | #5. DiDaDi. Accelerate.
- #DRE | #6. Z + Editor ➡️ ZED. 1+1 greater than 2.
- #DRE | #7. OSS. Open source.