To accommodate the widest possible range of hardware, operating systems ship with many conservative default settings. Left untuned, these values may not suit HPC or simply better-than-average hardware: they can keep the hardware from reaching its potential and may even hurt certain applications, databases in particular. This article introduces OS kernel parameters every DBA should know. It is for reference only and looks at things purely from the database side.
The examples below assume a host with 512GB of RAM.
1. Parameter
fs.aio-max-nr
Supported systems: CentOS 6, 7
Explanation
fs.aio-max-nr limits the total number of asynchronous I/O events (aio contexts created via io_setup) that may be outstanding system-wide.
Recommended setting
fs.aio-max-nr = 1xxxxxx
Neither PostgreSQL nor Greenplum uses io_setup to create aio contexts, so they do not require this setting. Oracle does need it if the database is to use aio. Setting it does no harm either way, and if you adopt asynchronous I/O later you will not have to revisit it.
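If you do set it, the headroom can be checked at any time by comparing the running total of aio events against the ceiling; a minimal sketch (the numeric value below is illustrative only, since the recommended digits are masked above):
cat /proc/sys/fs/aio-nr           # aio events currently reserved by io_setup callers
cat /proc/sys/fs/aio-max-nr       # system-wide ceiling
sysctl -w fs.aio-max-nr=1048576   # illustrative value only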
2. Parameter
fs.file-max
Supported systems: CentOS 6, 7
Explanation
fs.file-max is the system-wide limit on the number of file handles the kernel will allocate.
Recommended setting
fs.file-max = 7xxxxxxx
PostgreSQL manages its own virtual file descriptor layer (see the max_files_per_process parameter); the descriptors it actually holds open are mapped onto kernel-level opens and closes, so in practice it does not need anywhere near that many file handles. Still, as a sizing exercise: assume 100 connections per GB of RAM and 1,000 open files per connection, so one PG instance needs about 100,000 open files; a machine with 512GB of RAM could run roughly 500 PG instances, which works out to about 50 million file handles. The setting above leaves plenty of headroom.
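A quick sanity check of that estimate, and of how many handles are actually in use, might look like the following sketch (the instance and connection figures are the assumptions from the paragraph above):
echo $(( 500 * 100 * 1000 ))   # 500 instances x 100 connections x 1000 files = 50000000
cat /proc/sys/fs/file-nr       # allocated handles, free handles, current limit
sysctl fs.file-max             # system-wide limit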
3. Parameter
kernel.core_pattern
Supported systems: CentOS 6, 7
Explanation
kernel.core_pattern sets the template used to name core dump files (%e executable name, %u UID, %t timestamp, %s signal number, %p PID).
Recommended setting
kernel.core_pattern = /xxx/core_%e_%u_%t_%s.%p
The target directory must be world-writable (777); if it is a symlink, the real directory it points to needs 777 as well:
mkdir /xxx
chmod 777 /xxx
Make sure the directory has enough free space.
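A minimal set-up-and-verify sketch, where /xxx stands in for the real dump directory you choose; it forces a crash of a throwaway sleep process and checks that a core file appears:
mkdir -p /xxx && chmod 777 /xxx
sysctl -w kernel.core_pattern='/xxx/core_%e_%u_%t_%s.%p'
ulimit -c unlimited                 # allow core dumps in this shell
sleep 60 & kill -SIGSEGV $!         # crash a disposable process
sleep 1 && ls -l /xxx               # the core file should show up here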
4. Parameter
kernel.sem
Supported systems: CentOS 6, 7
Explanation
kernel.sem = 4096 2147483647 2147483646 512000
4096: semaphores per set (must be >= 17; PostgreSQL groups its processes 16 to a set, and each set needs 17 semaphores).
2147483647: total semaphores system-wide (2^31-1, and larger than 4096*512000).
2147483646: maximum number of operations per semop() call.
512000: number of semaphore sets (assuming 100 connections per GB, 512GB supports 51,200 connections; even counting other processes, anything above 51200*2/16 sets is more than enough).
# sysctl -w kernel.sem="4096 2147483647 2147483646 512000"
# ipcs -s -l
------ Semaphore Limits --------
max number of arrays = 512000
max semaphores per array = 4096
max semaphores system wide = 2147483647
max ops per semop call = 2147483646
semaphore max value = 32767
Recommended setting
kernel.sem = 4096 2147483647 2147483646 512000
4096 semaphores per set should fit most scenarios, so erring on the large side does no harm; the key point is that 512000 arrays is also plenty.
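The sizing logic can be reproduced with a little shell arithmetic, using the same assumption of 100 connections per GB on the 512GB example host:
CONNECTIONS=$(( 512 * 100 ))                      # 51200 connections
SETS_NEEDED=$(( (CONNECTIONS * 2 + 15) / 16 ))    # one semaphore set per 16 PostgreSQL processes
echo "semaphore sets needed: $SETS_NEEDED, configured: 512000"
sysctl -w kernel.sem="4096 2147483647 2147483646 512000"
ipcs -s -l                                        # confirm the new limits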
5. Parameter
kernel.shmall = 107374182
kernel.shmmax = 274877906944
kernel.shmmni = 819200
Supported systems: CentOS 6, 7
Explanation
Assume 512GB of host memory.
shmmax: maximum size of a single shared memory segment, here 256GB (half of host memory, in bytes).
shmall: maximum combined size of all shared memory segments (80% of host memory, in pages).
shmmni: up to 819200 shared memory segments may be created (each database instance needs 2 segments at startup; if segments are created dynamically in the future, demand may grow).
# getconf PAGE_SIZE
4096
Recommended setting
kernel.shmall = 107374182
kernel.shmmax = 274877906944
kernel.shmmni = 819200
In PostgreSQL 9.2 and earlier the database requests a large shared memory segment at startup, sized roughly as follows:
Connections: (1800 + 270 * max_locks_per_transaction) * max_connections
Autovacuum workers: (1800 + 270 * max_locks_per_transaction) * autovacuum_max_workers
Prepared transactions: (770 + 270 * max_locks_per_transaction) * max_prepared_transactions
Shared disk buffers: (block_size + 208) * shared_buffers
WAL buffers: (wal_block_size + 8) * wal_buffers
Fixed space requirements: 770 kB
The recommended values above are sized for pre-9.2 behaviour and remain suitable for later versions.
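Instead of hard-coding the numbers, shmmax and shmall can be derived from the host's actual memory and page size; a sketch using the 50% and 80% ratios recommended above:
PAGE_SIZE=$(getconf PAGE_SIZE)
MEM_KB=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
SHMMAX=$(( MEM_KB * 1024 / 2 ))                   # half of RAM, in bytes
SHMALL=$(( MEM_KB * 1024 * 8 / 10 / PAGE_SIZE ))  # 80% of RAM, in pages
sysctl -w kernel.shmmax=$SHMMAX kernel.shmall=$SHMALL kernel.shmmni=819200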
6. Parameter
net.core.netdev_max_backlog
Supported systems: CentOS 6, 7
Explanation
netdev_max_backlog: maximum number of packets queued on the INPUT side when the interface receives packets faster than the kernel can process them.
Recommended setting
net.core.netdev_max_backlog = 1xxxx
The longer the input queue, the higher the processing cost; if the host is managed with iptables, this value should be increased.
7. Parameter
net.core.rmem_default
net.core.rmem_max
net.core.wmem_default
net.core.wmem_max
Supported systems: CentOS 6, 7
Explanation
rmem_default: the default size of the socket receive buffer, in bytes.
rmem_max: the maximum receive socket buffer size, in bytes.
wmem_default: the default size of the socket send buffer, in bytes.
wmem_max: the maximum send socket buffer size, in bytes.
Recommended setting
net.core.rmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.core.wmem_max = 4194304
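To make these (and any of the other sysctls in this article) persist across reboots, they can be written to a sysctl configuration file and reloaded; a sketch, assuming /etc/sysctl.d is read at boot (on CentOS 6 you may prefer appending to /etc/sysctl.conf instead):
cat >> /etc/sysctl.d/99-database.conf <<'EOF'
net.core.rmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.core.wmem_max = 4194304
EOF
sysctl -p /etc/sysctl.d/99-database.conf   # apply immediately without a reboot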
8. Parameter
net.core.somaxconn
Supported systems: CentOS 6, 7
Explanation
somaxconn - INTEGER Limit of socket listen() backlog, known in userspace as SOMAXCONN. Defaults to 128. See also tcp_max_syn_backlog for additional tuning for TCP sockets.
Recommended setting
net.core.somaxconn=4xxx
9. Parameter
net.ipv4.tcp_max_syn_backlog
Supported systems: CentOS 6, 7
Explanation
tcp_max_syn_backlog - INTEGER Maximal number of remembered connection requests, which have not received an acknowledgment from connecting client. The minimal value is 128 for low memory machines, and it will increase in proportion to the memory of machine. If server suffers from overload, try increasing this number.
Recommended setting
net.ipv4.tcp_max_syn_backlog = 4xxx
pgpool-II relies on this backlog to queue connections beyond num_init_child, so this value determines how many connections can wait in the queue. A quick way to check whether this queue (or net.core.somaxconn above) overflows is sketched below.
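The overflow and drop counters can be read from the TCP statistics; a quick check sketch (the exact counter wording varies slightly between kernel versions):
netstat -s | grep -i listen   # listen queue overflows and SYN drops
ss -lnt                       # per-socket view of LISTEN sockets and their queues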
10. Parameter
net.ipv4.tcp_keepalive_intvl = 20
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_time = 60
Supported systems: CentOS 6, 7
Explanation
tcp_keepalive_time - INTEGER
How often TCP sends out keepalive messages when keepalive is enabled. Default: 2hours.
tcp_keepalive_probes - INTEGER
How many keepalive probes TCP sends out, until it decides that the connection is broken. Default value: 9.
tcp_keepalive_intvl - INTEGER
How frequently the probes are send out. Multiplied by tcp_keepalive_probes it is time to kill not responding connection, after probes started. Default value: 75sec i.e. connection will be aborted after ~11 minutes of retries.
Recommended setting
net.ipv4.tcp_keepalive_intvl = 20
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_time = 60
After a connection has been idle for 60 seconds, a keepalive probe is sent every 20 seconds; after 3 unanswered probes the connection is closed. From the start of the idle period to the connection being closed therefore takes about 120 seconds in total.
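The effect can be observed on live connections with ss, which prints the keepalive timer when -o is given; the port 5432 below is only an illustrative PostgreSQL default. PostgreSQL can also override these per connection via its tcp_keepalives_idle, tcp_keepalives_interval and tcp_keepalives_count settings.
sysctl -w net.ipv4.tcp_keepalive_time=60 net.ipv4.tcp_keepalive_intvl=20 net.ipv4.tcp_keepalive_probes=3
ss -o state established '( sport = :5432 or dport = :5432 )'   # shows timer:(keepalive,...) per connection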
11. Parameter
net.ipv4.tcp_mem = 8388608 12582912 16777216
Supported systems: CentOS 6, 7
Explanation
tcp_mem - vector of 3 INTEGERs: min, pressure, max (unit: pages)
min: below this number of pages TCP is not bothered about its memory appetite.
pressure: when amount of memory allocated by TCP exceeds this number of pages, TCP moderates its memory consumption and enters memory pressure mode, which is exited when memory consumption falls under "min".
max: number of pages allowed for queueing by all TCP sockets.
Defaults are calculated at boot time from amount of available memory.
On a 64GB host the auto-computed values look like: net.ipv4.tcp_mem = 1539615 2052821 3079230
On a 512GB host the auto-computed values look like: net.ipv4.tcp_mem = 49621632 66162176 99243264
Recommended setting
net.ipv4.tcp_mem = 8388608 12582912 16777216
It is also fine to leave this parameter at the value the OS computes automatically at boot.
12. Parameter
net.ipv4.tcp_fin_timeout
Supported systems: CentOS 6, 7
Explanation
tcp_fin_timeout - INTEGER The length of time an orphaned (no longer referenced by any application) connection will remain in the FIN_WAIT_2 state before it is aborted at the local end. While a perfectly valid "receive only" state for an un-orphaned connection, an orphaned connection in FIN_WAIT_2 state could otherwise wait forever for the remote to close its end of the connection. Cf. tcp_max_orphans Default: 60 seconds
Recommended setting
net.ipv4.tcp_fin_timeout = 5
Speeds up the reclamation of orphaned connections stuck in FIN_WAIT_2.
13. Parameter
net.ipv4.tcp_synack_retries
Supported systems: CentOS 6, 7
Explanation
tcp_synack_retries - INTEGER Number of times SYNACKs for a passive TCP connection attempt will be retransmitted. Should not be higher than 255. Default value is 5, which corresponds to 31seconds till the last retransmission with the current initial RTO of 1second. With this the final timeout for a passive TCP connection will happen after 63seconds.
Recommended setting
net.ipv4.tcp_synack_retries = 2
Shortens the SYN-ACK retransmission timeout for half-open passive connections.
14. Parameter
net.ipv4.tcp_syncookies
Supported systems: CentOS 6, 7
Explanation
tcp_syncookies - BOOLEAN
Only valid when the kernel was compiled with CONFIG_SYN_COOKIES. Send out syncookies when the syn backlog queue of a socket overflows. This is to prevent against the common 'SYN flood attack'. Default: 1
Note, that syncookies is fallback facility. It MUST NOT be used to help highly loaded servers to stand against legal connection rate. If you see SYN flood warnings in your logs, but investigation shows that they occur because of overload with legal connections, you should tune another parameters until this warning disappear. See: tcp_max_syn_backlog, tcp_synack_retries, tcp_abort_on_overflow.
syncookies seriously violate TCP protocol, do not allow to use TCP extensions, can result in serious degradation of some services (f.e. SMTP relaying), visible not by you, but your clients and relays, contacting you. While you see SYN flood warnings in logs not being really flooded, your server is seriously misconfigured.
If you want to test which effects syncookies have to your network connections you can set this knob to 2 to enable unconditionally generation of syncookies.
Recommended setting
net.ipv4.tcp_syncookies = 1
Helps defend against SYN flood attacks.
15. Parameter
net.ipv4.tcp_timestamps
Supported systems: CentOS 6, 7
Explanation
tcp_timestamps - BOOLEAN Enable timestamps as defined in RFC1323.
Recommended setting
net.ipv4.tcp_timestamps = 1
tcp_timestamps is a TCP extension: the timestamps are used to protect against wrapped sequence numbers (PAWS, Protect Against Wrapped Sequence numbers) and can improve TCP performance.
16. Parameter
net.ipv4.tcp_tw_recycle
net.ipv4.tcp_tw_reuse
net.ipv4.tcp_max_tw_buckets
Supported systems: CentOS 6, 7
Explanation
tcp_tw_recycle - BOOLEAN
Enable fast recycling TIME-WAIT sockets. Default value is 0. It should not be changed without advice/request of technical experts.
tcp_tw_reuse - BOOLEAN
Allow to reuse TIME-WAIT sockets for new connections when it is safe from protocol viewpoint. Default value is 0. It should not be changed without advice/request of technical experts.
tcp_max_tw_buckets - INTEGER
Maximal number of timewait sockets held by system simultaneously. If this number is exceeded time-wait socket is immediately destroyed and warning is printed. This limit exists only to prevent simple DoS attacks, you _must_ not lower the limit artificially, but rather increase it (probably, after increasing installed memory), if network conditions require more than default value.
Recommended setting
net.ipv4.tcp_tw_recycle = 0
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_max_tw_buckets = 2xxxxx
net.ipv4.tcp_tw_recycle and net.ipv4.tcp_timestamps should not be enabled at the same time. To see how many TIME-WAIT sockets the host actually accumulates, see the sketch below.
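A quick way to judge how large tcp_max_tw_buckets really needs to be is to count TIME-WAIT sockets under normal load:
ss -tan state time-wait | tail -n +2 | wc -l   # TIME-WAIT sockets right now (skip the header line)
ss -s                                          # summary statistics, including timewait totals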
17. Parameter
net.ipv4.tcp_rmem
net.ipv4.tcp_wmem
Supported systems: CentOS 6, 7
Explanation
tcp_wmem - vector of 3 INTEGERs: min, default, max
min: Amount of memory reserved for send buffers for TCP sockets. Each TCP socket has rights to use it due to fact of its birth. Default: 1 page
default: initial size of send buffer used by TCP sockets. This value overrides net.core.wmem_default used by other protocols. It is usually lower than net.core.wmem_default. Default: 16K
max: Maximal amount of memory allowed for automatically tuned send buffers for TCP sockets. This value does not override net.core.wmem_max. Calling setsockopt() with SO_SNDBUF disables automatic tuning of that socket's send buffer size, in which case this value is ignored. Default: between 64K and 4MB, depending on RAM size.
tcp_rmem - vector of 3 INTEGERs: min, default, max
min: Minimal size of receive buffer used by TCP sockets. It is guaranteed to each TCP socket, even under moderate memory pressure. Default: 1 page
default: initial size of receive buffer used by TCP sockets. This value overrides net.core.rmem_default used by other protocols. Default: 87380 bytes. This value results in window of 65535 with default setting of tcp_adv_win_scale and tcp_app_win:0 and a bit less for default tcp_app_win. See below about these variables.
max: maximal size of receive buffer allowed for automatically selected receiver buffers for TCP socket. This value does not override net.core.rmem_max. Calling setsockopt() with SO_RCVBUF disables automatic tuning of that socket's receive buffer size, in which case this value is ignored. Default: between 87380B and 6MB, depending on RAM size.
Recommended setting
net.ipv4.tcp_rmem = 8192 87380 16777216
net.ipv4.tcp_wmem = 8192 65536 16777216
These values are commonly recommended for databases and improve network throughput.
18. Parameter
net.nf_conntrack_max
net.netfilter.nf_conntrack_max
Supported systems: CentOS 6
Explanation
nf_conntrack_max - INTEGER Size of connection tracking table. Default value is nf_conntrack_buckets value * 4.
Recommended setting
net.nf_conntrack_max = 1xxxxxx
net.netfilter.nf_conntrack_max = 1xxxxxx
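Before raising the limit, it helps to know how full the connection-tracking table actually gets; a minimal check (these files exist only while the nf_conntrack module is loaded):
cat /proc/sys/net/netfilter/nf_conntrack_count   # entries currently tracked
cat /proc/sys/net/netfilter/nf_conntrack_max     # configured ceiling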
19. Parameter
vm.dirty_background_bytes
vm.dirty_expire_centisecs
vm.dirty_ratio
vm.dirty_writeback_centisecs
Supported systems: CentOS 6, 7
Explanation
dirty_background_bytes
Contains the amount of dirty memory at which the background kernel flusher threads will start writeback.
Note: dirty_background_bytes is the counterpart of dirty_background_ratio. Only one of them may be specified at a time. When one sysctl is written it is immediately taken into account to evaluate the dirty memory limits and the other appears as 0 when read.
dirty_background_ratio
Contains, as a percentage of total system memory, the number of pages at which the background kernel flusher threads will start writing out dirty data.
dirty_bytes
Contains the amount of dirty memory at which a process generating disk writes will itself start writeback.
Note: dirty_bytes is the counterpart of dirty_ratio. Only one of them may be specified at a time. When one sysctl is written it is immediately taken into account to evaluate the dirty memory limits and the other appears as 0 when read.
Note: the minimum value allowed for dirty_bytes is two pages (in bytes); any value lower than this limit will be ignored and the old configuration will be retained.
dirty_expire_centisecs
This tunable is used to define when dirty data is old enough to be eligible for writeout by the kernel flusher threads. It is expressed in 100'ths of a second. Data which has been dirty in-memory for longer than this interval will be written out next time a flusher thread wakes up.
dirty_ratio
Contains, as a percentage of total system memory, the number of pages at which a process which is generating disk writes will itself start writing out dirty data.
dirty_writeback_centisecs
The kernel flusher threads will periodically wake up and write `old' data out to disk. This tunable expresses the interval between those wakeups, in 100'ths of a second.
Setting this to zero disables periodic writeback altogether.
Recommended setting
vm.dirty_background_bytes = 4096000000
vm.dirty_expire_centisecs = 6000
vm.dirty_ratio = 80
vm.dirty_writeback_centisecs = 50
These values reduce how often database processes have to write out dirty pages themselves; size dirty_background_bytes according to the storage's real IOPS capability and the amount of RAM.
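Whether the background flusher keeps up can be watched while the database is under load; a simple monitoring sketch:
while true; do
    grep -E 'Dirty|Writeback' /proc/meminfo   # dirty and writeback memory right now
    sleep 5
done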
20. Parameter
vm.extra_free_kbytes
Supported systems: CentOS 6
Explanation
extra_free_kbytes
This parameter tells the VM to keep extra free memory between the threshold where background reclaim (kswapd) kicks in, and the threshold where direct reclaim (by allocating processes) kicks in.
This is useful for workloads that require low latency memory allocations and have a bounded burstiness in memory allocations, for example a realtime application that receives and transmits network traffic (causing in-kernel memory allocations) with a maximum total message burst size of 200MB may need 200MB of extra free memory to avoid direct reclaim related latencies.
In other words, the goal is to make background reclaim start this many kbytes earlier than direct reclaim by user processes, so that user processes can keep allocating memory quickly.
Recommended setting
vm.extra_free_kbytes=4xxxxxx
21. Parameter
vm.min_free_kbytes
Supported systems: CentOS 6, 7
Explanation
min_free_kbytes:
This is used to force the Linux VM to keep a minimum number of kilobytes free. The VM uses this number to compute a watermark[WMARK_MIN] value for each lowmem zone in the system. Each lowmem zone gets a number of reserved free pages based proportionally on its size.
Some minimal amount of memory is needed to satisfy PF_MEMALLOC allocations; if you set this to lower than 1024KB, your system will become subtly broken, and prone to deadlock under high loads.
Setting this too high will OOM your machine instantly.
Recommended setting
vm.min_free_kbytes = 2xxxxxx
A rule of thumb is to reserve 1GB of min_free_kbytes per 32GB of RAM. This helps keep the system responsive under heavy load and reduces the chance of memory-allocation deadlocks; see the sizing sketch below.
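Applying that rule of thumb literally to the 512GB example host gives roughly 16GB; a sizing sketch (choose a value that suits your workload, and remember the warning above about setting it too high):
MEM_GB=512
MIN_FREE_KB=$(( MEM_GB / 32 * 1024 * 1024 ))   # 1GB per 32GB of RAM, expressed in kB
echo $MIN_FREE_KB                              # 16777216
sysctl -w vm.min_free_kbytes=$MIN_FREE_KB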
22. Parameter
vm.mmap_min_addr
Supported systems: CentOS 6, 7
Explanation
mmap_min_addr . This file indicates the amount of address space which a user process will be restricted from mmapping. Since kernel null dereference bugs could accidentally operate based on the information in the first couple of pages of memory userspace processes should not be allowed to write to them. By default this value is set to 0 and no protections will be enforced by the security module. Setting this value to something like 64k will allow the vast majority of applications to work correctly and provide defense in depth against future potential kernel bugs.
Recommended setting
vm.mmap_min_addr = 6xxxx
Provides defense in depth against problems caused by latent kernel NULL-dereference bugs.
23. Parameter
vm.overcommit_memory
vm.overcommit_ratio
Supported systems: CentOS 6, 7
Explanation
overcommit_kbytes:
When overcommit_memory is set to 2, the committed address space is not permitted to exceed swap plus this amount of physical RAM. See below.
Note: overcommit_kbytes is the counterpart of overcommit_ratio. Only one of them may be specified at a time. Setting one disables the other (which then appears as 0 when read).
overcommit_memory:
This value contains a flag that enables memory overcommitment.
When this flag is 0, the kernel attempts to estimate the amount of free memory left when userspace requests more memory.
When this flag is 1, the kernel pretends there is always enough memory until it actually runs out.
When this flag is 2, the kernel uses a "never overcommit" policy that attempts to prevent any overcommit of memory. Note that user_reserve_kbytes affects this policy.
This feature can be very useful because there are a lot of programs that malloc() huge amounts of memory "just-in-case" and don't use much of it.
The default value is 0.
See Documentation/vm/overcommit-accounting and security/commoncap.c::cap_vm_enough_memory() for more information.
overcommit_ratio:
When overcommit_memory is set to 2, the committed address space is not permitted to exceed swap + this percentage of physical RAM. See above.
Recommended setting
vm.overcommit_memory = 0
vm.overcommit_ratio = 90
When vm.overcommit_memory = 0, vm.overcommit_ratio does not need to be set.
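The kernel's commit accounting can be inspected directly, which is useful when deciding whether stricter overcommit settings are needed; a quick sketch:
sysctl vm.overcommit_memory vm.overcommit_ratio
grep -E 'CommitLimit|Committed_AS' /proc/meminfo   # CommitLimit is only enforced in mode 2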
24. Parameter
vm.swappiness
Supported systems: CentOS 6, 7
Explanation
swappiness
This control is used to define how aggressive the kernel will swap memory pages. Higher values will increase agressiveness, lower values decrease the amount of swap.
The default value is 60.
Recommended setting
vm.swappiness = 0
25. Parameter
vm.zone_reclaim_mode
Supported systems: CentOS 6, 7
Explanation
zone_reclaim_mode:
Zone_reclaim_mode allows someone to set more or less aggressive approaches to reclaim memory when a zone runs out of memory. If it is set to zero then no zone reclaim occurs. Allocations will be satisfied from other zones / nodes in the system.
This is value ORed together of
1 = Zone reclaim on
2 = Zone reclaim writes dirty pages out
4 = Zone reclaim swaps pages
zone_reclaim_mode is disabled by default. For file servers or workloads that benefit from having their data cached, zone_reclaim_mode should be left disabled as the caching effect is likely to be more important than data locality.
zone_reclaim may be enabled if it's known that the workload is partitioned such that each partition fits within a NUMA node and that accessing remote memory would cause a measurable performance reduction. The page allocator will then reclaim easily reusable pages (those page cache pages that are currently not used) before allocating off node pages.
Allowing zone reclaim to write out pages stops processes that are writing large amounts of data from dirtying pages on other nodes. Zone reclaim will write out dirty pages if a zone fills up and so effectively throttle the process. This may decrease the performance of a single process since it cannot use all of system memory to buffer the outgoing writes anymore but it preserve the memory on other nodes so that the performance of other processes running on other nodes will not be affected.
Allowing regular swap effectively restricts allocations to the local node unless explicitly overridden by memory policies or cpuset configurations.
Recommended setting
vm.zone_reclaim_mode = 0
Disable NUMA zone reclaim (do not reclaim memory per NUMA node).
26. Parameter
net.ipv4.ip_local_port_range
Supported systems: CentOS 6, 7
Explanation
ip_local_port_range - 2 INTEGERS
Defines the local port range that is used by TCP and UDP to choose the local port. The first number is the first, the second the last local port number. The default values are 32768 and 61000 respectively.
ip_local_reserved_ports - list of comma separated ranges
Specify the ports which are reserved for known third-party applications. These ports will not be used by automatic port assignments (e.g. when calling connect() or bind() with port number 0). Explicit port allocation behavior is unchanged.
The format used for both input and output is a comma separated list of ranges (e.g. "1,2-4,10-10" for ports 1, 2, 3, 4 and 10). Writing to the file will clear all previously reserved ports and update the current list with the one given in the input.
Note that ip_local_port_range and ip_local_reserved_ports settings are independent and both are considered by the kernel when determining which ports are available for automatic port assignments.
You can reserve ports which are not in the current ip_local_port_range, e.g.:
$ cat /proc/sys/net/ipv4/ip_local_port_range
32000 61000
$ cat /proc/sys/net/ipv4/ip_local_reserved_ports
8080,9148
although this is redundant. However such a setting is useful if later the port range is changed to a value that will include the reserved ports.
Default: Empty
Recommended setting
net.ipv4.ip_local_port_range = 40000 65535
Restricts the range used for automatic (ephemeral) local port assignment so that dynamically assigned ports do not collide with listening ports.
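Independently of the range, ports used by the database or a connection pooler can also be reserved so automatic assignment never grabs them (as described under ip_local_reserved_ports above); the port 5432 here is only an example:
sysctl -w net.ipv4.ip_local_port_range="40000 65535"
sysctl -w net.ipv4.ip_local_reserved_ports="5432"   # example: keep the PostgreSQL port out of automatic assignment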
27. Parameter
vm.nr_hugepages
Supported systems: CentOS 6, 7
Explanation
nr_hugepages
Change the minimum size of the hugepage pool. See Documentation/vm/hugetlbpage.txt
nr_overcommit_hugepages
Change the maximum size of the hugepage pool. The maximum is nr_hugepages + nr_overcommit_hugepages. See Documentation/vm/hugetlbpage.txt
The output of "cat /proc/meminfo" will include lines like:
......
HugePages_Total: vvv
HugePages_Free: www
HugePages_Rsvd: xxx
HugePages_Surp: yyy
Hugepagesize: zzz kB
where:
HugePages_Total is the size of the pool of huge pages.
HugePages_Free is the number of huge pages in the pool that are not yet allocated.
HugePages_Rsvd is short for "reserved," and is the number of huge pages for which a commitment to allocate from the pool has been made, but no allocation has yet been made. Reserved huge pages guarantee that an application will be able to allocate a huge page from the pool of huge pages at fault time.
HugePages_Surp is short for "surplus," and is the number of huge pages in the pool above the value in /proc/sys/vm/nr_hugepages. The maximum number of surplus huge pages is controlled by /proc/sys/vm/nr_overcommit_hugepages.
/proc/filesystems should also show a filesystem of type "hugetlbfs" configured in the kernel.
/proc/sys/vm/nr_hugepages indicates the current number of "persistent" huge pages in the kernel's huge page pool. "Persistent" huge pages will be returned to the huge page pool when freed by a task. A user with root privileges can dynamically allocate more or free some persistent huge pages by increasing or decreasing the value of 'nr_hugepages'.
Recommended setting
If you want PostgreSQL to use huge pages, set this parameter so that the pool is larger than the shared memory the database needs.
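A rough way to size the pool is to divide the shared memory the instance needs by the huge page size and add a little headroom; in this sketch the 32GB figure is purely an assumption standing in for shared_buffers plus overhead (the PostgreSQL documentation describes a more precise method based on the postmaster's VmPeak):
SHARED_GB=32                                                  # assumed shared memory requirement
HPAGE_KB=$(awk '/Hugepagesize/ {print $2}' /proc/meminfo)     # typically 2048 kB
NR_HUGEPAGES=$(( SHARED_GB * 1024 * 1024 / HPAGE_KB + 64 ))   # plus a little headroom
sysctl -w vm.nr_hugepages=$NR_HUGEPAGES
grep -i hugepages /proc/meminfo                               # verify the pool was allocated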
28. Parameter
fs.nr_open
Supported systems: CentOS 6, 7
Explanation
nr_open: This denotes the maximum number of file-handles a process can allocate. Default value is 1024*1024 (1048576) which should be enough for most machines. Actual limit depends on RLIMIT_NOFILE resource limit.
It also caps the file-handle limits configured in /etc/security/limits.conf: a single process cannot open more handles than fs.nr_open, so to raise the per-process file-handle limit you must raise nr_open first.
Recommended setting
For PostgreSQL databases with a large number of objects (tables, views, indexes, sequences, materialized views, and so on), a value around 20 million is recommended, for example fs.nr_open=20480000.
Resource limits (/etc/security/limits.conf)
1. Limits are set via /etc/security/limits.conf, or with ulimit.
2. The limits of a running process can be inspected via /proc/$pid/limits.
The items we care about most are the first four below:
# - core - limits the core file size (KB)
# - memlock - max locked-in-memory address space (KB)
# - nofile - max number of open files — a value around 10 million is recommended, but the sysctl fs.nr_open must be set larger than it, otherwise you may be unable to log in to the system.
# - nproc - max number of processes
The remaining items, for reference:
# - data - max data size (KB)
# - fsize - maximum filesize (KB)
# - rss - max resident set size (KB)
# - stack - max stack size (KB)
# - cpu - max CPU time (MIN)
# - as - address space limit (KB)
# - maxlogins - max number of logins for this user
# - maxsyslogins - max number of logins on the system
# - priority - the priority to run user process with
# - locks - max number of file locks the user can hold
# - sigpending - max number of pending signals
# - msgqueue - max memory used by POSIX message queues (bytes)
# - nice - max nice priority allowed to raise to values: [-20, 19]
# - rtprio - max realtime priority
An example limits.conf fragment is sketched below.
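A limits.conf fragment matching the recommendations above might look like the following sketch; the user name postgres and the exact numbers are placeholders to adapt (nofile must stay below fs.nr_open):
# /etc/security/limits.conf (illustrative values)
postgres    soft    nofile     10240000
postgres    hard    nofile     10240000
postgres    soft    nproc      unlimited
postgres    hard    nproc      unlimited
postgres    soft    memlock    unlimited
postgres    hard    memlock    unlimited
postgres    soft    core       unlimited
postgres    hard    core       unlimited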
Block device I/O scheduler
1. The I/O scheduling policies currently supported by the OS include cfq, deadline, noop, and so on.
The scheduler in use can be seen here:
cat /sys/block/<device>/queue/scheduler
To change it:
echo deadline > /sys/block/sda/queue/scheduler
Or change it via the kernel boot parameters:
grub.conf: elevator=deadline
In many benchmarks, databases show more stable performance with the deadline scheduler. A udev-based way to persist the choice is sketched below.
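Besides the grub elevator= option, the choice can be persisted per device with a udev rule, so newly attached disks pick it up as well; a sketch, where the sd[a-z] match pattern is an assumption to adapt to your device names:
cat > /etc/udev/rules.d/60-io-scheduler.rules <<'EOF'
# use the deadline scheduler for matching block devices
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/scheduler}="deadline"
EOF
udevadm control --reload-rules
udevadm trigger --type=devices --action=change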
There are a few more topics worth knowing, such as disabling transparent huge pages, disabling NUMA, and SSD partition alignment, but space is limited, so we will stop here; the settings above are basically sufficient. More devops and DBA content will follow, so stay tuned if you are interested.