apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: itlanson-ingress
  namespace: default
spec:
  rules:
  - host: itlanson.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:                  ## the backend service that should handle matching requests
          service:
            name: my-nginx-svc    ## name of the Service in the Kubernetes cluster
            port:
              number: 80          ## port number of the Service
Ingress rules take effect in the nginx configuration of every machine where the IngressController is installed.
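A quick way to check the rule from outside the cluster (a minimal sketch; <node-ip> and the controller's HTTP NodePort are assumptions you must substitute):
## send a request with the matching Host header to any node running the IngressController
curl -H "Host: itlanson.com" http://<node-ip>:<http-node-port>/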
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: itlanson-ingress
  namespace: default
spec:
  defaultBackend:        ## default backend for all unmatched requests
    service:
      name: php-apache
      port:
        number: 80
  rules:
  - host: itlanson.com
    http:
      paths:
      - path: /abc
        pathType: Prefix
        backend:
          service:
            name: my-nginx-svc
            port:
              number: 80
Effect:
All requests to itlanson.com that do not start with /abc go to the defaultBackend.
All requests for hosts other than itlanson.com also go to the defaultBackend.
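For example (a sketch; <node-ip> stands for any node running the IngressController):
curl -H "Host: itlanson.com" http://<node-ip>/abc     ## matches the rule, served by my-nginx-svc
curl -H "Host: itlanson.com" http://<node-ip>/other   ## no matching path, served by php-apache
curl -H "Host: other.com" http://<node-ip>/           ## no matching host, served by php-apache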
Global nginx configuration
kubectl edit cm ingress-nginx-controller -n ingress-nginx
Edit the ConfigMap and add entries under data: in the form "option: value". The full list of options is documented at
https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/configmap/
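A minimal sketch of what the edited ConfigMap could look like (keep-alive and max-worker-connections are real options from the page above; the values here are illustrative assumptions):
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
  keep-alive: "75"                  ## keep-alive timeout in seconds
  max-worker-connections: "65535"   ## maximum connections per nginx worker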
Rewrite - NGINX Ingress Controller
The rewrite feature is often used in scenarios where front end and back end are deployed separately.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:   ## set the annotations
    # https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/
    nginx.ingress.kubernetes.io/rewrite-target: /$2   ### which capture group to keep
  name: rewrite-ingress-02
  namespace: default
spec:
  rules:         ## define the rules
  - host: itlanson.com
    http:
      paths:
      - backend:
          service:
            name: php-apache
            port:
              number: 80
        path: /api(/|$)(.*)
        pathType: Prefix
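With rewrite-target: /$2 and path /api(/|$)(.*), the /api prefix is stripped before the request is forwarded. For example (a sketch; <node-ip> is an assumption):
curl -H "Host: itlanson.com" http://<node-ip>/api/index.php   ## forwarded to php-apache as /index.php
curl -H "Host: itlanson.com" http://<node-ip>/api             ## forwarded as /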
TLS/HTTPS - NGINX Ingress Controller
Generate a certificate (you can also apply for a free certificate from QingCloud and use it):
$ openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout ${KEY_FILE:tls.key} -out ${CERT_FILE:tls.cert} -subj "/CN=${HOST:itlanson.com}/O=${HOST:itlanson.com}"
kubectl create secret tls ${CERT_NAME:itlanson-tls} --key ${KEY_FILE:tls.key} --cert ${CERT_FILE:tls.cert}
## Example commands:
openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout tls.key -out tls.cert -subj "/CN=it666.com/O=it666.com"
kubectl create secret tls it666-tls --key tls.key --cert tls.cert
apiVersion: v1
data:
  tls.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURJekNDQWd1Z0F3SUJBZ0lKQVB6YXVMQ1ZjdlVKTUEwR0NTcUdTSWIzRFFFQkN3VUFNQ2d4RWpBUUJnTlYK...   ## base64-encoded certificate (shortened here)
  tls.key: LS0tLS1CRUdJTiBQUklWQVRFIEtFWS0tLS0tCk1JSUV2QUlCQURBTkJna3Foa2lHOXcwQkFRRUZBQVNDQktZd2dnU2lBZ0VBQW9JQkFRQ25DWGtMNjdlYzNjYW4K...   ## base64-encoded private key (shortened here)
kind: Secret
metadata:
  creationTimestamp: "2022-06-10T12:06:22Z"
  name: it666-tls
  namespace: default
  resourceVersion: "2264722"
  uid: 16f8a4b6-1600-4ded-8458-b0480ce075ba
type: kubernetes.io/tls
Configure the domain to use the certificate
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: itlanson-ingress
  namespace: default
spec:
  tls:
  - hosts:
    - itlanson.com
    secretName: itlanson-tls
  rules:
  - host: itlanson.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-nginx-svc
            port:
              number: 80
Once the certificate is configured, visiting the domain redirects to https by default.
Annotations - NGINX Ingress Controller
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-222333
  namespace: default
  annotations:   ## annotations
    nginx.ingress.kubernetes.io/limit-rps: "1"   ### rate-limit configuration (requests per second)
spec:
  defaultBackend:   ## backend for any path not covered by the rules
    service:
      name: php-apache
      port:
        number: 80
  rules:
  - host: it666.com
    http:
      paths:
      - path: /bbbbb
        pathType: Prefix
        backend:
          service:
            name: cluster-service-222
            port:
              number: 80
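A simple way to observe the limit (a sketch; <node-ip> is an assumption, and ingress-nginx answers over-limit requests with HTTP 503 by default):
for i in $(seq 1 10); do
  curl -s -o /dev/null -w "%{http_code}\n" -H "Host: it666.com" http://<node-ip>/bbbbb
done
## expect 200 for roughly one request per second and 503 once the limit (plus burst) is exceeded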
Previously, canary releases could be done with a Kubernetes Service plus Deployments. The principle is as follows:
Drawbacks:
Now canary (gray) releases can be done with Ingress. The principle is as follows:
## Deploy two service versions with the file below. v1 returns the default nginx page, v2 returns 11111
apiVersion: v1
kind: Service
metadata:
  name: v1-service
  namespace: default
spec:
  selector:
    app: v1-pod
  type: ClusterIP
  ports:
  - name: http
    port: 80
    targetPort: 80
    protocol: TCP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: v1-deploy
  namespace: default
  labels:
    app: v1-deploy
spec:
  selector:
    matchLabels:
      app: v1-pod
  replicas: 1
  template:
    metadata:
      labels:
        app: v1-pod
    spec:
      containers:
      - name: nginx
        image: nginx
---
apiVersion: v1
kind: Service
metadata:
  name: canary-v2-service
  namespace: default
spec:
  selector:
    app: canary-v2-pod
  type: ClusterIP
  ports:
  - name: http
    port: 80
    targetPort: 80
    protocol: TCP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: canary-v2-deploy
  namespace: default
  labels:
    app: canary-v2-deploy
spec:
  selector:
    matchLabels:
      app: canary-v2-pod
  replicas: 1
  template:
    metadata:
      labels:
        app: canary-v2-pod
    spec:
      containers:
      - name: nginx
        image: registry.cn-hangzhou.aliyuncs.com/lanson_k8s_images/nginx-test:env-msg
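The actual traffic split is then expressed with two Ingress objects: a primary one for v1-service and a canary one for canary-v2-service. A minimal sketch, assuming a 20% weight split (nginx.ingress.kubernetes.io/canary and canary-weight are the documented ingress-nginx canary annotations; the host and weight are illustrative):
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: canary-ingress-v1
  namespace: default
spec:
  rules:
  - host: itlanson.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: v1-service
            port:
              number: 80
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: canary-ingress-v2
  namespace: default
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"       ## mark this Ingress as the canary
    nginx.ingress.kubernetes.io/canary-weight: "20"  ## route roughly 20% of requests to v2
spec:
  rules:
  - host: itlanson.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: canary-v2-service
            port:
              number: 80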
https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/#session-affinity
On the first visit, ingress-nginx returns a Cookie to the browser; the browser then sends this Cookie with every request, ensuring that requests keep reaching the same Pod as before.
## Deploy a Deployment with three Pods and its Service
apiVersion: v1
kind: Service
metadata:
  name: session-affinity
  namespace: default
spec:
  selector:
    app: session-affinity
  type: ClusterIP
  ports:
  - name: session-affinity
    port: 80
    targetPort: 80
    protocol: TCP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: session-affinity
  namespace: default
  labels:
    app: session-affinity
spec:
  selector:
    matchLabels:
      app: session-affinity
  replicas: 3
  template:
    metadata:
      labels:
        app: session-affinity
    spec:
      containers:
      - name: session-affinity
        image: nginx
Write an Ingress with session affinity
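A minimal sketch of such an Ingress (nginx.ingress.kubernetes.io/affinity and session-cookie-name are the documented ingress-nginx cookie-affinity annotations; the cookie name "route" and the host are illustrative assumptions):
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: session-affinity-ingress
  namespace: default
  annotations:
    nginx.ingress.kubernetes.io/affinity: "cookie"             ## enable cookie-based stickiness
    nginx.ingress.kubernetes.io/session-cookie-name: "route"   ## name of the sticky cookie
spec:
  rules:
  - host: itlanson.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: session-affinity
            port:
              number: 80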
Introduction: pipdeptree is the tool used for maintaining Python packages in TencentOS distribution packaging; it provides the ability to analyze and manage the dependency chains of Python packages. For the owner of distribution packages, digging deep into and mastering this area is a prerequisite for keeping those packages stable and highly available. Starting as a beginner who had never touched the tool, it took about a month of effort to complete the refactoring work and to receive an invitation from the highly-starred project `pipdeptree` to become a project maintainer; what it required was a great deal of spare time and patience, working through all kinds of manuals and source code.
As the AI ecosystem keeps growing, with Python as its foundational language, the number of related packages and code repositories naturally keeps increasing, so solving Python's dependency-hell problem is an optimization that pays off. The benefit comes mainly from two aspects:
The capabilities of pipdeptree match our requirements, and the package is already integrated into TencentOS Server 4; it can be installed with dnf install python3-pipdeptree.
We also use pipdeptree when building Python runtime, PyTorch and other images, to produce smaller and lighter images (feel free to contact us if you need them).
pipdeptree is simple to use; these two commands cover most cases:
pipdeptree -p ABC
and
pipdeptree -j
As for how it works, it relies on the official pkg_resources library; the core consists of the following two parts:
1. Environment.iter_installed_distributions() to enumerate the installed distributions
from pip._internal.metadata import pkg_resources

dists = pkg_resources.Environment.from_paths(None).iter_installed_distributions(
    local_only=local_only,
    skip=(),
    user_only=user_only,
)
2. DistInfoDistribution.requires() to obtain each package's dependencies
from pip._vendor.pkg_resources import DistInfoDistribution

def requires(self) -> list[Requirement]:
    return self._obj.requires()  # type: ignore[no-untyped-call,no-any-return]
The final output is a dependency tree of every package installed in the environment, for example:
{
  "package": {
    "key": "adal",
    "package_name": "adal",
    "installed_version": "1.2.7"
  },
  "dependencies": [
    {
      "key": "cryptography",
      "package_name": "cryptography",
      "installed_version": "41.0.4",
      "required_version": ">=1.1.0"
    },
    {
      "key": "pyjwt",
      "package_name": "PyJWT",
      "installed_version": "2.6.0",
      "required_version": ">=1.0.0,<3"
    },
    {
      "key": "python-dateutil",
      "package_name": "python-dateutil",
      "installed_version": "2.8.2",
      "required_version": ">=2.1.0,<3"
    },
    {
      "key": "requests",
      "package_name": "requests",
      "installed_version": "2.28.2",
      "required_version": ">=2.0.0,<3"
    }
  ]
},
At the same time, when we package Python software for the distribution we also use rpm's dependency mechanism, i.e. BuildRequires and Requires. Compared with the Python package's own dependency list, distro packaging can easily pull in extra dependencies and make the package's dependency chain redundant. Combined with the problem mentioned in the background ("Python developers cannot always list a package's required dependencies completely and accurately"), it can also happen that after code changes a formerly required dependency is no longer needed.
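For illustration, a hypothetical fragment of an RPM spec file showing where such dependencies are declared (the package and dependency names are made up):
# excerpt from a hypothetical python3-foo.spec
BuildRequires:  python3-devel
BuildRequires:  python3-setuptools
Requires:       python3-requests
Requires:       python3-six    # possibly redundant if upstream no longer imports six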
In summary, two optimization points come to mind so far:
Results of the attempt
Step 1 demo
import json
import subprocess
import re

FILTERED_DEPENDENCIES = ['python3']

def extract_package_name(dep):
    match = re.match(r'.*python3(?:\.\d+)?dist\(([^)]+)\).*', dep)
    if match:
        return match.group(1)
    return dep.split(' ')[0]

def get_package_dependencies(package_name):
    package_name = get_rpmname(package_name)
    print(package_name)
    try:
        command = f'rpm-dep -i {package_name} -q'
        subprocess.run(command.split())
        parse_cmd = f"jq -r '.next[] | .pkg_name' dep_tree__{package_name}__install.json | sort | uniq"
        output = subprocess.getoutput(parse_cmd)
        dependencies = output.strip().split('\n')
        return dependencies
    except subprocess.CalledProcessError:
        return []

def get_rpmname(py_name):
    if not py_name.startswith("python-"):
        # try python3dist(ABC)
        command = f"dnf repoquery --whatprovides 'python3dist({py_name})' --latest-limit 1 --queryformat '%{{NAME}}' -q"
        output = subprocess.getoutput(command)
        # try python-ABC
        if output == "":
            # try lower case
            command = f"dnf repoquery --whatprovides 'python3dist({py_name.lower()})' --latest-limit 1 --queryformat '%{{NAME}}' -q"
            output = subprocess.getoutput(command)
        # last chance
        if output == "":
            py_name = f"python3-{py_name}"
            info_command = f'dnf info {py_name}'
            info_result = subprocess.run(info_command.split(), stderr=subprocess.DEVNULL, stdout=subprocess.PIPE, text=True)
            if info_result.returncode != 0:
                py_name = "ERROR"
        else:
            py_name = output
    else:
        py_name = py_name[7:]
        info_command = f'dnf info python3-{py_name}'
        info_result = subprocess.run(info_command.split(), stderr=subprocess.DEVNULL, stdout=subprocess.PIPE, text=True)
        if info_result.returncode != 0:
            py_name = "ERROR"
        else:
            py_name = f"python3-{py_name}"
    return py_name

def check_dependencies(package_data):
    package_name = package_data['package']['key']
    local_dependencies = [get_rpmname(dep['key']) for dep in package_data['dependencies']]
    repo_dependencies = get_package_dependencies(package_name)
    missing_dependencies = list(set(repo_dependencies) - set(local_dependencies))
    extra_dependencies = list(set(local_dependencies) - set(repo_dependencies))
    #print(local_dependencies)
    #print(repo_dependencies)
    # filter out the dependencies listed in FILTERED_DEPENDENCIES
    missing_dependencies = [dep for dep in missing_dependencies if dep not in FILTERED_DEPENDENCIES]
    extra_dependencies = [dep for dep in extra_dependencies if dep not in FILTERED_DEPENDENCIES]
    print(missing_dependencies)
    print(extra_dependencies)
    return {
        'package_name': get_rpmname(package_name),
        'missing_dependencies': missing_dependencies,
        'extra_dependencies': extra_dependencies
    }

def main():
    with open('packages.json', 'r') as file:
        packages_data = json.load(file)
    result = []
    for package_data in packages_data:
        package_result = check_dependencies(package_data)
        result.append(package_result)
    with open('result.json', 'w') as file:
        json.dump(result, file, indent=2)

if __name__ == '__main__':
    main()
Analysis of the final results shows the following issues that make them inaccurate:
Step 2 demo
import ast
import importlib.metadata
import importlib.resources
import json
import os
import sys
import re

# get the list of built-in modules
builtin_modules = set(sys.builtin_module_names)

def get_standard_library_modules():
    lib_path = os.path.dirname(os.__file__)
    modules = []

    def add_module(root, file):
        module_path = os.path.relpath(os.path.join(root, file), lib_path)
        module_name = os.path.splitext(module_path.replace(os.path.sep, '.'))[0]
        if module_name.endswith('.__init__'):
            module_name = module_name[:-9]
        modules.append(module_name)

    for root, dirs, files in os.walk(lib_path):
        if 'site-packages' in dirs:
            dirs.remove('site-packages')
        if root == lib_path:
            # collect all top-level .py file names
            for file in files:
                if file.endswith('.py'):
                    add_module(root, file)
        # handle directory chains that contain an __init__.py
        if '__init__.py' in files:
            add_module(root, '__init__.py')
    return modules

# add the common standard-library modules
builtin_modules.update(get_standard_library_modules())

def parse_imports(file_path):
    with open(file_path, 'r') as file:
        content = file.read()
    # strip all single-line comments
    content = re.sub(r'#.*', '', content)
    # strip all multi-line (docstring-style) comments
    content = re.sub(r'""".*?"""', '', content, flags=re.DOTALL)
    # match "import ..." and "from ... import ..." statements
    import_re = re.compile(r'(?:from\s+([.\w]+)(?:\s+import\s+[\w, ()]+)|import\s+([\w, ()]+))')
    matches = import_re.findall(content)
    imports = []
    for match in matches:
        # match is a tuple; one element is an empty string, the other is the module name
        module_names = match[0] if match[0] else match[1]
        # names starting with '.' are relative imports, which we ignore
        if not module_names.startswith('.'):
            module_names = module_names.split(',')
            for module_name in module_names:
                # handle "import x as y" aliases
                module_name = module_name.strip().split(' as ')[0].split('.')[0]
                if module_name not in builtin_modules and not module_name.startswith('_'):
                    imports.append(module_name)
    return imports

def get_package_imports():
    package_imports = {}
    dists = importlib.metadata.distributions()
    for dist in dists:
        package_name = dist.metadata['Name']
        try:
            package_dir = importlib.resources.files(package_name)
            if package_dir is not None:
                package_imports[package_name] = {}
                for root, dirs, files in os.walk(str(package_dir)):
                    for file in files:
                        if file.endswith('.py'):
                            file_path = os.path.join(root, file)
                            imports = parse_imports(file_path)
                            # deduplicate and drop the package's own name
                            imports = list(set(imports))
                            if package_name in imports:
                                imports.remove(package_name)
                            package_imports[package_name][file_path] = imports
        except:
            pass
    return package_imports

# collect the import information of all packages
package_imports = get_package_imports()

# convert to JSON and print
json_data = json.dumps(package_imports, indent=4)
print(json_data)

# read the packages.json file
with open('packages.json', 'r') as file:
    package_data = json.load(file)

# check whether every import of each package is listed in its dependencies
for package in package_data:
    package_name = package['package']['package_name']
    if package_name in package_imports:
        dependencies = {dep['package_name'] for dep in package['dependencies']}
        for file_path, imports in package_imports[package_name].items():
            for import_name in imports:
                if import_name not in dependencies:
                    print(f'In package {package_name}, file {file_path} imports {import_name} which is not in dependencies.')
                else:
                    print(f'In package {package_name}, file {file_path} imports {import_name} is found in pipdeptree.')
Analysis of the final results again shows issues that make them inaccurate:
tooz==4.2.0
├── fasteners [required: >=0.7, installed: 0.19]
├── futurist [required: >=1.2.0, installed: 2.4.1]
├── msgpack [required: >=0.4.0, installed: 1.0.5]
├── oslo.serialization [required: >=1.10.0, installed: 5.0.0]
├── oslo.utils [required: >=4.7.0, installed: 6.0.1]
├── pbr [required: >=1.6, installed: 5.11.1]
├── stevedore [required: >=1.16.0, installed: 4.0.2]
├── tenacity [required: >=5.0.0, installed: 8.2.3]
└── voluptuous [required: >=0.8.9, installed: 0.13.1]
In package tooz, file /usr/lib/python3.11/site-packages/tooz/drivers/etcd3.py imports oslo_utils which is not in dependencies.
This also makes the Python package paths incomplete, because the distribution name obtained via dists=importlib.metadata.distributions() (e.g. pycryptodome) can differ from the actual module name (e.g. Crypto); and since package_dir=importlib.resources.files(package_name) locates files by importing the name first, it simply raises an error.
The problem can be solved with importlib_metadata.packages_distributions, an API that returns the mapping between importable top-level module names and the distributions that provide them.
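A small sketch of that mapping, reusing the pycryptodome/Crypto example above (importlib.metadata.packages_distributions exists since Python 3.10; on older interpreters the importlib_metadata backport provides it):
import importlib.metadata

# maps importable top-level module names to the distributions that provide them
mapping = importlib.metadata.packages_distributions()
print(mapping.get("Crypto"))   # ['pycryptodome'] if pycryptodome is installed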
Problems found so far
Missing dependencies
This is not necessarily a problem, because some modules are only weakly depended on, i.e. the package still works without them.
Optional modules
In package urllib3, file /usr/lib/python3.11/site-packages/urllib3/response.py imports brotli which is not in dependencies.
try:
    try:
        import brotlicffi as brotli
    except ImportError:
        import brotli
except ImportError:
    brotli = None
Test code, too, can show some missing dependencies; these are low priority:
In package zake, file /usr/lib/python3.11/site-packages/zake/test.py imports testtools which is not in dependencies.
Upstream development issues
And sometimes upstream genuinely fails to declare its dependencies, for example the urllib3 package:
In package urllib3, file /usr/lib/python3.11/site-packages/urllib3/contrib/appengine.py imports google which is not in dependencies.
In package urllib3, file /usr/lib/python3.11/site-packages/urllib3/contrib/socks.py imports socks which is not in dependencies.
In package urllib3, file /usr/lib/python3.11/site-packages/urllib3/contrib/pyopenssl.py imports OpenSSL which is not in dependencies.
In package urllib3, file /usr/lib/python3.11/site-packages/urllib3/contrib/pyopenssl.py imports idna which is not in dependencies.
In package urllib3, file /usr/lib/python3.11/site-packages/urllib3/contrib/pyopenssl.py imports cryptography which is not in dependencies.
In package urllib3, file /usr/lib/python3.11/site-packages/urllib3/contrib/ntlmpool.py imports ntlm which is not in dependencies.
Another example is tox, which is missing some of its dependencies:
tox==4.10.0
├── cachetools [required: Any, installed: 5.3.1]
├── chardet [required: >=5.2, installed: 5.2.0]
├── colorama [required: >=0.4.6, installed: 0.4.6]
├── filelock [required: Any, installed: 3.12.4]
├── packaging [required: Any, installed: 23.1]
├── platformdirs [required: Any, installed: 2.5.4]
├── pluggy [required: Any, installed: 1.3.0]
├── pyproject-api [required: Any, installed: 1.5.1]
│ └── packaging [required: >=23, installed: 23.1]
└── virtualenv [required: >=20, installed: 20.21.1]
├── distlib [required: >=0.3.6,<1, installed: 0.3.7]
├── filelock [required: >=3.4.1,<4, installed: 3.12.4]
└── platformdirs [required: >=2.4,<4, installed: 2.5.4]
In package tox, file /usr/lib/python3.11/site-packages/tox/tox_env/python/virtual_env/package/pyproject.py imports tomli which is not in dependencies.
In package tox, file /usr/lib/python3.11/site-packages/tox/execute/local_sub_process/read_via_thread_unix.py imports select which is not in dependencies.
The same goes for tooz and others:
In package tooz, file /usr/lib/python3.11/site-packages/tooz/drivers/pgsql.py imports psycopg2 which is not in dependencies.
In package tooz, file /usr/lib/python3.11/site-packages/tooz/drivers/mysql.py imports pymysql which is not in dependencies.
Redundant dependencies
The tool's results also show that some Python packages carry redundant dependencies, for example cheroot:
{
  "package_name": "python3-cheroot",
  "missing_dependencies": [
    "python3-six",
    "python3-pyOpenSSL"
  ],
  "extra_dependencies": []
},
The result shows two redundant dependencies. python3-six is the Python 2/3 compatibility package, and the current version of cheroot is already fully adapted to Python 3, so upstream has removed that dependency: https://github.com/cherrypy/cheroot/commit/f3170d40a699219345abb5813395ff39319fec86
pyOpenSSL appears in cheroot-10.0.0/stubtest_allowlist.txt and is a test dependency, so it can be kept as a build-time dependency only and trimmed from the runtime dependencies. Other distributions such as SUSE have already made the same cut: https://build.opensuse.org/projects/openSUSE:Factory/packages/python-cheroot/files/python-cheroot.changes See also: https://gitee.com/opencloudos-stream/python-cheroot/pulls/3
Circular dependencies
Warning!! Cyclic dependencies found:
* sphinxcontrib-serializinghtml => Sphinx => sphinxcontrib-serializinghtml
* sphinxcontrib-htmlhelp => Sphinx => sphinxcontrib-htmlhelp
* sphinxcontrib-qthelp => Sphinx => sphinxcontrib-qthelp
* sphinxcontrib-applehelp => Sphinx => sphinxcontrib-applehelp
* sphinxcontrib-devhelp => Sphinx => sphinxcontrib-devhelp
* Sphinx => sphinxcontrib-applehelp => Sphinx
Circular dependencies make a package's build subject to the versions of its dependencies and turn an otherwise simple dependency chain into a complicated one.
Going further: refactoring pipdeptree onto the new APIs
upstream pr:
https://github.com/tox-dev/pipdeptree/pull/333
Background
The pipdeptree project lives under the tox organization and, apart from tox itself, is the organization's highest-starred project. It was created by gaborbernat, now at Bloomberg, who also founded a series of important Python communities such as virtualenv, tox, platformdirs and filelock, and maintains other core projects such as python-build; he is a well-known figure in the Python world.
Removing the deprecated pkg_resources API and replacing it with importlib.metadata, packaging and friends means refactoring pipdeptree's core logic; it touches the whole pipdeptree code tree, so the work is fairly complex and full of pitfalls.
from pip._vendor.pkg_resources import DistInfoDistribution
DistInfoDistribution can be replaced by importlib.metadata.Distribution.
Note, however, that importlib.metadata.Distribution no longer has the key and project_name attributes, so the affected code has to switch to Distribution.metadata["Name"].
For example:
def __init__(self, obj: Distribution, req: ReqPackage | None = None) -> None:
    super().__init__(obj.metadata["Name"])
from pip._vendor.pkg_resources import Requirement
It can be replaced by from packaging.requirements import Requirement.
As above, the key and project_name attributes are gone, so use .name instead, e.g.:
def __init__(self, obj: Requirement, dist: DistPackage | None = None) -> None:
    super().__init__(obj.name)
Replacing the base types alone does not solve everything; most of the work is reimplementing the attributes and APIs those types supported. Take the iter_installed_distributions API, which returns a list of DistInfoDistribution objects according to the parameters passed in:
iter_installed_distributions(local_only: bool=True, skip: Container[str]={'python', 'wsgiref', 'argparse'}, include_editables: bool=True, editables_only: bool=False, user_only: bool=False) -> Iterator[pip._internal.metadata.base.BaseDistribution] method of pip._internal.metadata.pkg_resources.Environment instance
Return a list of installed distributions.
The code involved is shown below; what we have to handle are these parameters, i.e. how to distinguish local_only and user_only with the new APIs.
from pip._internal.metadata import pkg_resources

dists = pkg_resources.Environment.from_paths(None).iter_installed_distributions(
    local_only=local_only,
    skip=(),
    user_only=user_only,
)
local_only distinguishes a virtual environment from the global environment:
(myenv) [root@linux ~]# python3 -c "import sys;print(sys.path)"
['', '/usr/lib64/python311.zip', '/usr/lib64/python3.11', '/usr/lib64/python3.11/lib-dynload', '/root/myenv/lib64/python3.11/site-packages', '/root/myenv/lib/python3.11/site-packages']
(myenv) [root@linux ~]# python3 -c "import sys;print(sys.prefix)"
/root/myenv
(myenv) [root@linux ~]# python3 -c "import sys;print(sys.base_prefix)"
/usr
(myenv) [root@linux ~]#
The detection logic in pip looks like this:
def _running_under_venv() -> bool:
    """Checks if sys.base_prefix and sys.prefix match.

    This handles PEP 405 compliant virtual environments.
    """
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)

def _running_under_legacy_virtualenv() -> bool:
    """Checks if sys.real_prefix is set.

    This handles virtual environments created with pypa's virtualenv.
    """
    # pypa/virtualenv case
    return hasattr(sys, "real_prefix")

def running_under_virtualenv() -> bool:
    """True if we're running inside a virtual environment, False otherwise."""
    return _running_under_venv() or _running_under_legacy_virtualenv()
So we simplify this logic: compare sys.prefix with sys.base_prefix; if they differ we are in a virtual environment, otherwise in the system environment. Then site.getsitepackages() returns the Python path (site-packages) under the given prefix.
in_venv = sys.prefix != sys.base_prefix
if local_only and in_venv:
    venv_site_packages = site.getsitepackages([sys.prefix])
    return list(distributions(path=venv_site_packages))
Finally, importlib.metadata.distributions finds all Python distributions under that path and returns a list of Distribution objects. This API is a thin wrapper around Distribution.discover:
def distributions(**kwargs):
    """Get all ``Distribution`` instances in the current environment.

    :return: An iterable of ``Distribution`` instances.
    """
    return Distribution.discover(**kwargs)
The latter wraps what it receives in a Context and finally uses the metadata finders in sys.meta_path to locate all distributions on the system.
[root@linux ~]# python3 -c "import sys; print(sys.meta_path)"
[<_distutils_hack.DistutilsMetaFinder object at 0x7f124cd15bd0>, <class '_frozen_importlib.BuiltinImporter'>, <class '_frozen_importlib.FrozenImporter'>, <class '_frozen_importlib_external.PathFinder'>]
user_only distinguishes the user environment from the global one, so we simply use site.getusersitepackages() to get the user's site-packages directory and obtain the distribution list with distributions in the same way.
if user_only:
    return list(distributions(path=[site.getusersitepackages()]))
DistInfoDistribution.requires() returns pip._vendor.pkg_resources.Requirement objects directly, while Distribution.requires (https://github.com/python/cpython/blob/3.12/Lib/importlib/metadata/__init__.py#L558) returns plain strings:
@property
def requires(self):
    """Generated requirements specified for this Distribution"""
    reqs = self._read_dist_info_reqs() or self._read_egg_info_reqs()
    return reqs and list(reqs)
So the strings we get have to be processed and converted into packaging.requirements.Requirement objects:
def requires(self) -> list[Requirement]:
    req_list = []
    req_name_list = []
    if self._obj.requires:
        for r in self._obj.requires:
            req = Requirement(r)
            is_extra_req = req.marker and contains_extra(str(req.marker))
            if not is_extra_req and req.name not in req_name_list:
                req_list.append(req)
                req_name_list.append(req.name)
    return req_list
Note that the dependencies returned by Distribution.requires now include markers, for example:
"pytest ; extra=='tests'"
According to discussion 1 (https://github.com/tox-dev/pipdeptree/pull/333#discussion_r1527662006) and discussion 2 (https://github.com/tox-dev/pipdeptree/pull/333#discussion_r1527881146), for now it is enough to cover the main dependencies; marker support can be added later if needed.
Note: a marker here essentially describes optional extra features that a package may need at install time. For example, to install CT3 with its markdown-related features you specify:
pip install CT3[markdown]
because CT3's METADATA declares:
Provides-Extra: filters
Requires-Dist: markdown ; extra=='filters'
Provides-Extra: markdown
Requires-Dist: markdown ; extra=='markdown'
Markers cover extra as well as other constraints such as python_version < "3.11". Each extra has its own dependencies, which means a module like CT3 can work without markdown support, so the main module does not strictly need that dependency; this is also why the community suggested ignoring extras for now. If extras were to be listed as well, there are a few possible approaches.
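For reference, a small sketch of how such a requirement string looks once parsed with packaging (the string is taken from the METADATA above; Marker.evaluate is the standard packaging API):
from packaging.requirements import Requirement

req = Requirement("markdown ; extra=='markdown'")
print(req.name)    # markdown
print(req.marker)  # extra == "markdown"

# a marker is evaluated against an environment; supplying the extra makes it true
print(req.marker.evaluate({"extra": "markdown"}))  # True
print(req.marker.evaluate({"extra": ""}))          # False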
The metadata.pkg_resources.Distribution object used in the as_frozen_repr function relies, when passed to FrozenRequirement.from_dist, on several attributes that neither DistPackage nor importlib.metadata.Distribution provides, so we implement them ourselves:
@property
def editable(self) -> bool:
    return bool(self.editable_project_location)

@property
def direct_url(self) -> DirectUrl | None:
    direct_url_metadata_name = "direct_url.json"
    result = None
    try:
        j_content = self._obj.read_text(direct_url_metadata_name)
    except FileNotFoundError:  # pragma: no cover
        return result
    try:
        if j_content:
            result = DirectUrl.from_json(j_content)
    except (
        UnicodeDecodeError,
        json.JSONDecodeError,
        DirectUrlValidationError,
    ):
        return result
    return result

@property
def raw_name(self) -> str:
    return self.project_name

@property
def editable_project_location(self) -> str | None:
    direct_url = self.direct_url
    if direct_url and direct_url.is_local_editable():
        from pip._internal.utils.urls import url_to_path  # noqa: PLC2701, PLC0415

        return url_to_path(direct_url.url)
    result = None
    egg_link_path = egg_link_path_from_sys_path(self.raw_name)
    if egg_link_path:
        with Path(egg_link_path).open("r") as f:
            result = f.readline().rstrip()
    return result
The compatibility work here follows this discussion: https://github.com/tox-dev/pipdeptree/pull/333#discussion_r1533235445. What we need to do is read the package's direct_url.json file and parse its content into the members of DistPackage; the implementation follows pip's direct_url handling, see the source: https://github.com/pypa/pip/blob/f5e4ee104e7b171a7cfb2843c9c602abf7a4e346/src/pip/_internal/metadata/base.py#L289.
We also have to implement the editable_project_location interface ourselves, following: https://github.com/pypa/pip/blob/f5e4ee104e7b171a7cfb2843c9c602abf7a4e346/src/pip/_internal/utils/egg_link.py#L33
Editable installs of Python distributions and the DirectUrl module
Here is a brief introduction to Python's DirectUrl module, which mainly parses and uses the direct_url.json file. Not every Python package has a direct_url.json; it typically exists only for distributions installed from URLs, which can be local or remote. See the Direct URL specification: https://packaging.python.org/en/latest/specifications/direct-url-data-structure/
# pip install -e munkres-1.1.4/
You will then find a direct_url.json inside the package's .dist-info directory under the Python path:
# ls /usr/local/lib/python3.11/site-packages/munkres-1.1.4.dist-info/
INSTALLER LICENSE.md METADATA RECORD REQUESTED WHEEL direct_url.json top_level.txt
This file records the package's real source URL and whether it is an editable install:
{"dir_info": {"editable": true}, "url": "file:///data/gitee/python-munkres/munkres-1.1.4"}
In Python, editable means linking the project source directly into the Python directory (usually site-packages). It is commonly used during development so that code changes take effect immediately without reinstalling. Besides the pip install URL mechanism above, there is another mechanism based on .egg-link files. When you run python3 setup.py develop in the project source tree, you will see output like this:
running egg_info
writing munkres.egg-info/PKG-INFO
writing dependency_links to munkres.egg-info/dependency_links.txt
writing top-level names to munkres.egg-info/top_level.txt
reading manifest file 'munkres.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
adding license file 'LICENSE.md'
writing manifest file 'munkres.egg-info/SOURCES.txt'
running build_ext
Creating /usr/local/lib/python3.11/site-packages/munkres.egg-link (link to .)
munkres 1.1.4 is already the active version in easy-install.pth
Installed /data/gitee/python-munkres/munkres-1.1.4
Processing dependencies for munkres==1.1.4
Finished processing dependencies for munkres==1.1.4
It creates munkres.egg-link under /usr/local/lib/python3.11/site-packages/; the .egg-link file contains the path of the project source:
# cat /usr/local/lib/python3.11/site-packages/munkres.egg-link
/data/gitee/python-munkres/munkres-1.1.4
.
So the value of self.location in the source can be obtained by reading that .egg-link file, and the logic can be rewritten as follows:
@property
def editable_project_location(self) -> str | None:
    if self.direct_url:
        from pip._internal.utils.urls import url_to_path  # noqa: PLC2701, PLC0415

        return url_to_path(self.direct_url)
    egg_link_path = egg_link_path_from_sys_path(self.raw_name)
    if egg_link_path:
        with Path(egg_link_path).open("r") as f:
            location = f.readline().rstrip()
        return location
    return None
pkg_resources.Requirement has a specs attribute that returns the version constraint, i.e. the >=1.2 part of a>=1.2 (see the code: https://github.com/pypa/pip/blob/f5e4ee104e7b171a7cfb2843c9c602abf7a4e346/src/pip/_vendor/pkg_resources/__init__.py#L3153). It is replaced by the specifier field of packaging.requirements.Requirement (see the manual: https://packaging.pypa.io/en/stable/specifiers.html). Note that this returns a SpecifierSet object, which has to be converted to str for handling:
@property
def version_spec(self) -> str | None:
    result = None
    specs = sorted(map(str, self._obj.specifier), reverse=True)
    if specs:
        result = ",".join(specs)
    return result
When actually comparing versions, however, it is better to keep the object and use its built-in comparison, e.g.:
if ver_spec:
    req_obj = SpecifierSet(ver_spec)
else:
    return False
return self.installed_version not in req_obj
Mock is a module often used in Python unit tests; it can imitate a fake function or object.
Mocking a function: the line below fixes the return value of foo.read_text to json_text, i.e. whenever foo.read_text is called, it returns json_text.
foo.read_text = Mock(return_value=json_text)
Mocking an object: Mock can also imitate an object; once the mocked object's attributes are specified, any other code that reads foo.metadata["Name"] gets "foo" back.
foo = Mock(metadata={"Name": "foo"}, version="20.4.1")
In this way we simulate the complete flow, from object attributes to methods, so the function under test returns exactly what we expect. For example:
def test_dist_package_render_as_root_with_frozen() -> None:
    json_text = '{"dir_info": {"editable": true}, "url": "file:///A/B/foo"}'
    foo = Mock(metadata={"Name": "foo"}, version="20.4.1")
    foo.read_text = Mock(return_value=json_text)
    dp = DistPackage(foo)
    is_frozen = True
    expect = "# Editable install with no version control (foo===20.4.1)\n-e /A/B/foo"
    assert dp.render_as_root(frozen=is_frozen) == expect
Because of a limitation of the Mock class itself, Mock(name=XXX) has no effect: reading the name attribute yields a Mock.name object instead. MagicMock has to step in here, for example:
- result=ReqPackage(mocker.MagicMock(key="setuptools")).installed_version
+ r=MagicMock()
+ r.name="setuptools"
+ result=ReqPackage(r).installed_version
As noted in https://github.com/tox-dev/pipdeptree/pull/333/#issuecomment-2018311809, when a virtual environment is spun up with virtualenv.cli_run and commands are executed in it, the results are captured with pytest.CaptureFixture; but the code only captured standard output and discarded the error output, so commands that actually failed showed no error at all.
out, _ = capfd.readouterr()
The root cause was that the new API added a dependency on packaging, which was missing in the virtual environment and made the command fail. Installing packaging in the test environment fixes it:
- expected={"pip", "setuptools", "wheel"}
+ expected={"packaging", "pip", "setuptools", "wheel"}
A further optimization: the Python test environment is best kept away from the external network, so running pip install packaging inside it is awkward and unstable.
We therefore decided to rework the logic of the tool's --python option and "steal" packaging from the outer environment (the one running pipdeptree) into the test environment.
packaging_src = getsourcefile(sys.modules["packaging"])
assert packaging_src is not None
packaging_root = Path(packaging_src).parent
copytree(packaging_root, dest / "packaging")
cmd = [str(py_path), "-m", "pipdeptree", *argv]
env = os.environ.copy()
return call(cmd, cwd=project, env=env)
This relies on the fact that python -m prepends the current working directory to sys.path, as documented:
By default, as initialized upon program startup, a potentially unsafe path is prepended to [`sys.path`](https://docs.python.org/3/library/sys.html#sys.path) (before the entries inserted as a result of [`PYTHONPATH`](https://docs.python.org/3/using/cmdline.html#envvar-PYTHONPATH)):
- `python -m module` command line: prepend the current working directory.
- `python script.py` command line: prepend the script’s directory. If it’s a symbolic link, resolve symbolic links.
- `python -c code` and `python` (REPL) command lines: prepend an empty string, which means the current working directory.
Because pipdeptree has a --local-only option, we need to simulate running the tool inside a virtual environment. A Python virtual environment essentially sets sys.prefix to a custom directory and then uses that environment's own Python toolchain. So we simulate pipdeptree running in a virtual environment as follows, using monkeypatch to set things such as sys.prefix and the argv that is passed in.
def test_local_only(
    tmp_path: Path,
    monkeypatch: pytest.MonkeyPatch,
    capfd: pytest.CaptureFixture[str],
) -> None:
    prefix = str(tmp_path / "venv")
    result = virtualenv.cli_run([str(tmp_path / "venv"), "--activators", ""])
    pip_path = str(result.creator.exe.parent / "pip")
    subprocess.run(
        [pip_path, "install", "wrapt", "--prefix", prefix],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    cmd = [str(result.creator.exe.parent / "python3")]
    monkeypatch.chdir(tmp_path)
    cmd += ["--local-only"]
    monkeypatch.setattr(sys, "prefix", [str(tmp_path / "venv")])
    monkeypatch.setattr(sys, "argv", cmd)
    main()
    out, _ = capfd.readouterr()
    found = {i.split("==")[0] for i in out.splitlines()}
    expected = {"wrapt", "pip", "setuptools", "wheel"}
    if sys.version_info >= (3, 12):
        expected -= {"setuptools", "wheel"}  # pragma: no cover
    assert found == expected
After the core refactoring described above was finished, it was approved by two core maintainers, including gaborbernat, and I accepted the invitation to join the project.
Author: cunshun
Source: WeChat official account 鵝廠架構師
Original: https://mp.weixin.qq.com/s/dSWwbxzDfuxLWu0Achg8jg