先我們修復了一個SQL查詢中的bug。然后我們重新思考了查詢的設計。這里是使用LLM調整SQL查詢的進一步方法。
譯自Techniques for Using LLMs to Improve SQL Queries。
這個有問題的查詢本意是檢測Azure虛擬機中的網絡安全組是否允許對敏感端口進行入站訪問。一個用戶報告說,對于兩個不同的測試用例,查詢報告正常,但本應該報告警報。在第一個用例中,安全組直接附加到了VM的網絡接口上;在第二個用例中,它附加到了VM的子網上。
要調試這個查詢SQL語句,它需要通過包括azure_compute_virtual_machine和azure_network_security_group在內的Steampipe表來查詢一個Azure訂閱,就需要準備一對配置了這兩種不同方式安全組的VM。在LLM時代之前,搭建這些資源意味著要搜索Terraform示例,然后消除找到的示例和所需設置之間的差異。
現在你可以讓LLM編寫初始的Terraform代碼,然后在反復交流中逐步完善它。我的所有LLM助手都證明自己有這個能力: 這是一個我們不應視為理所當然的奇跡,但我們可能很快就會如此認為。LLM編寫的Terraform代碼快速完成了所需資源的配置,然后我們繼續調試查詢。
這里是失敗查詢的主要部分。
select
'Unrestricted Ingress' as label,
case
when sg.sg_name is null then 'Restricted'
else 'Unrestricted'
end as value,
case
when sg.sg_name is null then 'ok'
else 'alert'
end as type
from
azure_compute_virtual_machine as vm
left join network_sg as sg on sg.network_interfaces @> vm.network_interfaces
這個連接失敗是因為兩個network_interfaces列包含了形狀不同的JSONB對象;Postgres的JSONB包含操作符@>無法匹配它們。由于JSONB對象是數組,而我們需要匹配的是一個兩個數組中共有的鍵值對,那么展開數組并在其元素中查找匹配那個鍵值對是合理的。
ChatGPT、Copilot Chat和新來者Unblocked給出的初始解決方案使用了各種形式的利用Postgres的jsonb_array_elements函數進行交叉連接來實現這個策略。這里是一些變種:
-- ChatGPT
from
azure_compute_virtual_machine as vm
left join network_sg as sg on exists (
select 1
from
jsonb_array_elements(sg.network_interfaces) as sg_if_elements(sg_obj),
jsonb_array_elements(vm.network_interfaces) as vm_if_elements(vm_obj)
where sg_if_elements.sg_obj->'id'=vm_if_elements.vm_obj->'id'
)
-- Copilot Chat
from
(select jsonb_array_elements(network_interfaces) -> 'id' as network_interface_id from azure_compute_virtual_machine) as vm
left join network_sg as sg on sg.network_interface_id=vm.network_interface_id
-- Unblocked
from
azure_compute_virtual_machine as vm,
jsonb_array_elements(vm.network_interfaces) as vm_ni
left join lateral (
select sg_name from network_sg, jsonb_array_elements(network_interfaces) as sg_ni
where sg_ni ->> 'id'=vm_ni ->> 'id'
) as sg on true
雖然所有這些解決方案都實現了需要的連接,但它們都難以閱讀和理解 - 即使對我這種使用Postgres JSONB數據類型和jsonb_array_elements等集合返回函數有一定經驗的人來說也是如此。當你把這些與(可能是橫向的)交叉連接組合使用時,你最終會得到一些簡明但不透明且難以調試的SQL表達式。
我長期主張將查詢重構為CTE(公共表表達式)管道,每個CTE執行復雜轉換中的一步。這樣你就可以單獨推理、測試和調試每一步。但我也認識到簡明語法的作用,它可以折疊步驟。通過深厚的專業知識和長期的練習,你可以建立起心智模型,以便形象地思考這些復雜的轉換。到那時,詳細的逐步查詢可能過于冗長;使用簡潔的語法更有效率。
那么,應該為初學者還是專家編寫代碼呢?理想情況下,你不應該在兩者之間做選擇。在這個練習中,我提示我的LLM助手將上述等簡明表達式重寫為更易理解的簡單CTE管道。它們都能做到這一點。我可以想象在簡明和冗長語法之間進行自動雙向翻譯,作為另一種動態文檔形式。
對修改后的查詢進行的測試暴露出一個新問題。有許多方法可以描述網絡安全組中指定的入站端口: 特定端口或端口范圍、諸如Internet和any之類的詞以及*和/0等通配符。不考慮任何這些都會導致錯誤的報告OK。如何使查詢對這種遺漏具有彈性?以下是我向ChatGPT提出這個挑戰的方式。
PROMPT:
Although our query now works as designed, there is a problem with the design. The query’s job is to ring an alarm bell if an azure vm is open to attack in two different ways. The first case is an Azure VM with a security group – that has sensitive open inbound ports – attached directly to its network interface. The second case is similar, but now the security group is attached to the VM’s subnet. If you have two VMs matching these conditions, two alarms should ring. The problem we solved was that the alarms weren’t ringing because there was a faulty SQL join between two of Steampipe’s Azure tables. The shape of the data structure in the JSONB network_interfaces column of one table didn’t match the shape of the data the corresponding column of the other, and the JSONB comparison operator couldn’t handle that difference. To fix the query, we switched to a strategy that unnests the JSONB structures and matches on a particular key/value pair. Are you with me so far?
RESPONSE
Yes, I understand the scenario you’re describing. [replay of prompt]
PROMPT:
OK good. Now here is the design problem I mentioned. This query isn’t failsafe. The VMs are deemed innocent until proven guilty. When the conditions of guilt aren’t met because of a query error or omission, the query reports OK instead of Alarm. How can we make it so the VMs are deemed guilty until proven innocent? In case of a query error or omission, they’ll report Alarm instead of OK. Better to have a false alarm in a case where the VMs are actually innocent than to have no alarm when they are guilty. Do you understand the problem?
RESPONSE:
Yes, I understand the problem. [replay of prompt]
然后我們討論了如何反轉查詢的邏輯以使其失敗安全。經過幾次迭代,ChatGPT給出以下策略:
RESPONSE:
To modify the query according to the discussed plan, we will first create a list of all VMs and assign an initial status of ‘alarm’ to each. Then we’ll check against the conditions that would indicate a VM is not vulnerable (i.e., ‘innocent’). If a VM does not meet these conditions, it will remain ‘guilty’ (status remains ‘alarm’).
經過幾次迭代,我們得到一個工作的查詢,它可以正確識別易受攻擊的安全組,并且對錯誤或遺漏具有彈性。 VM先前是無辜直到被證明有罪,現在是相反的: 它們被視為有罪,直到被證明無辜。而且目前,我保留了調試時使用的詳細、分步的格式。
作為最后一個測試,我要求ChatGPT使冗長的查詢更加緊湊。它提出了一種使用CROSS JOIN LATERAL以及EXISTS子查詢中的隱式 jsonb_array_elements 交叉連接的簡明替代方案。這種方法在第一次試驗中就沒有成功,并且難以通過迭代改進。
在冗長和簡明SQL語法之間進行自動轉換是一個有吸引力的想法。但是,雖然LLM在將5000字的文章縮減到500字方面做得相當不錯,但對于SQL查詢來說,目前要求它們做同樣的工作可能過高。我們使查詢更緊湊的努力就到此結束了。
RESPONSE:
Since the goal is to produce a more succinct query without losing the logical flow of the original, we must preserve the order of operations and the conditions applied in each CTE. However, the challenge is that compacting the query might obscure the complex relationships between network interfaces, security groups, and virtual machines. Considering the complexity, the verbose query might be the best approach for maintainability and debugging purposes. Compact queries can be harder to debug and understand, especially when dealing with nested JSONB structures and multiple JOIN conditions.
來自一個隨機生成器的非常好的建議!
我很成功地要求LLM將簡明的查詢展開為更易理解的簡單CTE管道,我強烈建議你也這樣使用它們。但是,如果你想走另一條路,目前看來你只能靠自己了。
1、查看表空間的名稱及大小
select t.tablespace_name, round(sum(bytes/(1024*1024)),0) ts_size
from dba_tablespaces t, dba_data_files d
where t.tablespace_name=d.tablespace_name
group by t.tablespace_name;
2、查看表空間物理文件的名稱及大小
select tablespace_name, file_id, file_name,
round(bytes/(1024*1024),0) total_space
from dba_data_files
order by tablespace_name;
3、查看回滾段名稱及大小
select segment_name, tablespace_name, r.status,
(initial_extent/1024) InitialExtent,(next_extent/1024) NextExtent,
max_extents, v.curext CurExtent
From dba_rollback_segs r, v$rollstat v
Where r.segment_id=v.usn(+)
order by segment_name ;
4、查看控制文件
select name from v$controlfile;
5、查看日志文件
select member from v$logfile;
6、查看表空間的使用情況
select sum(bytes)/(1024*1024) as free_space,tablespace_name
from dba_free_space
group by tablespace_name;
SELECT A.TABLESPACE_NAME,A.BYTES TOTAL,B.BYTES USED, C.BYTES FREE,
(B.BYTES*100)/A.BYTES "% USED",(C.BYTES*100)/A.BYTES "% FREE"
FROM SYS.SM$TS_AVAIL A,SYS.SM$TS_USED B,SYS.SM$TS_FREE C
WHERE A.TABLESPACE_NAME=B.TABLESPACE_NAME AND A.TABLESPACE_NAME=C.TABLESPACE_NAME;
7、查看數據庫庫對象
select owner, object_type, status, count(*) count# from all_objects group by owner, object_type, status;
8、查看數據庫的版本
Select version FROM Product_component_version
Where SUBSTR(PRODUCT,1,6)=Oracle;
9、查看數據庫的創建日期和歸檔方式
Select Created, Log_Mode, Log_Mode From V$Database;
10、捕捉運行很久的SQL
column username format a12
column opname format a16
column progress format a8
select username,sid,opname,
round(sofar*100 / totalwork,0) % as progress,
time_remaining,sql_text
from v$session_longops , v$sql
where time_remaining <> 0
and sql_address=address
and sql_hash_value=hash_value
/
11、查看數據表的參數信息
SELECT partition_name, high_value, high_value_length, tablespace_name,
pct_free, pct_used, ini_trans, max_trans, initial_extent,
next_extent, min_extent, max_extent, pct_increase, FREELISTS,
freelist_groups, LOGGING, BUFFER_POOL, num_rows, blocks,
empty_blocks, avg_space, chain_cnt, avg_row_len, sample_size,
last_analyzed
FROM dba_tab_partitions
--WHERE table_name=:tname AND table_owner=:towner
ORDER BY partition_position
12、查看還沒提交的事務
select * from v$locked_object;
select * from v$transaction;
13、查找object為哪些進程所用
select
p.spid,
s.sid,
s.serial# serial_num,
s.username user_name,
a.type object_type,
s.osuser os_user_name,
a.owner,
a.object object_name,
decode(sign(48 - command),
1,
to_char(command), Action Code # to_char(command) ) action,
p.program oracle_process,
s.terminal terminal,
s.program program,
s.status session_status
from v$session s, v$access a, v$process p
where s.paddr=p.addr and
s.type=USER and
a.sid=s.sid and
a.object=SUBSCRIBER_ATTR
order by s.username, s.osuser
14、回滾段查看
select rownum, sys.dba_rollback_segs.segment_name Name, v$rollstat.extents
Extents, v$rollstat.rssize Size_in_Bytes, v$rollstat.xacts XActs,
v$rollstat.gets Gets, v$rollstat.waits Waits, v$rollstat.writes Writes,
sys.dba_rollback_segs.status status from v$rollstat, sys.dba_rollback_segs,
v$rollname where v$rollname.name(+)=sys.dba_rollback_segs.segment_name and
v$rollstat.usn (+)=v$rollname.usn order by rownum
15、耗資源的進程(top session)
select s.schemaname schema_name, decode(sign(48 - command), 1,
to_char(command), Action Code # to_char(command) ) action, status
session_status, s.osuser os_user_name, s.sid, p.spid , s.serial# serial_num,
nvl(s.username, [Oracle process]) user_name, s.terminal terminal,
s.program program, st.value criteria_value from v$sesstat st, v$session s , v$processp
where st.sid=s.sid and st.statistic#=to_number(38) and (ALL=ALL
or s.status=ALL) and p.addr=s.paddr order by st.value desc, p.spid asc, s.username asc, s.osuser asc
16、查看鎖(lock)情況
select /*+ RULE */ ls.osuser os_user_name, ls.username user_name,
decode(ls.type, RW, Row wait enqueue lock, TM, DML enqueue lock, TX,
Transaction enqueue lock, UL, User supplied lock) lock_type,
o.object_name object, decode(ls.lmode, 1, null, 2, Row Share, 3,
Row Exclusive, 4, Share, 5, Share Row Exclusive, 6, Exclusive, null)
lock_mode, o.owner, ls.sid, ls.serial# serial_num, ls.id1, ls.id2
from sys.dba_objects o, ( select s.osuser, s.username, l.type,
l.lmode, s.sid, s.serial#, l.id1, l.id2 from v$session s,
v$lock l where s.sid=l.sid ) ls where o.object_id=ls.id1 and o.owner
<> SYS order by o.owner, o.object_name
17、查看等待(wait)情況
SELECT v$waitstat.class, v$waitstat.count count, SUM(v$sysstat.value) sum_value
FROM v$waitstat, v$sysstat WHERE v$sysstat.name IN (db block gets,
consistent gets) group by v$waitstat.class, v$waitstat.count
18、查看sga情況
SELECT NAME, BYTES FROM SYS.V_$SGASTAT ORDER BY NAME ASC
19、查看catched object
SELECT owner, name, db_link, namespace,
type, sharable_mem, loads, executions,
locks, pins, kept FROM v$db_object_cache
20、查看V$SQLAREA
SELECT SQL_TEXT, SHARABLE_MEM, PERSISTENT_MEM, RUNTIME_MEM, SORTS,
VERSION_COUNT, LOADED_VERSIONS, OPEN_VERSIONS, USERS_OPENING, EXECUTIONS,
USERS_EXECUTING, LOADS, FIRST_LOAD_TIME, INVALIDATIONS, PARSE_CALLS, DISK_READS,
BUFFER_GETS, ROWS_PROCESSED FROM V$SQLAREA
21、查看object分類數量
select decode (o.type#,1,'INDEX' , 2,'TABLE' , 3 , 'CLUSTER' , 4, 'VIEW' , 5 ,
'SYNONYM' , 6 , 'SEQUENCE' , 'OTHER' ) object_type , count(*) quantity from
sys.obj$ o where o.type# > 1 group by decode (o.type#,1,'INDEX' , 2,'TABLE' , 3
, 'CLUSTER' , 4, 'VIEW' , 5 , 'SYNONYM' , 6 , 'SEQUENCE' , 'OTHER' ) union select
'COLUMN' , count(*) from sys.col$ union select 'DB LINK' , count(*) from
22、按用戶查看object種類
select u.name schema, sum(decode(o.type#, 1, 1, NULL)) indexes,
sum(decode(o.type#, 2, 1, NULL)) tables, sum(decode(o.type#, 3, 1, NULL))
clusters, sum(decode(o.type#, 4, 1, NULL)) views, sum(decode(o.type#, 5, 1,
NULL)) synonyms, sum(decode(o.type#, 6, 1, NULL)) sequences,
sum(decode(o.type#, 1, NULL, 2, NULL, 3, NULL, 4, NULL, 5, NULL, 6, NULL, 1))
others from sys.obj$ o, sys.user$ u where o.type# >=1 and u.user#=
o.owner# and u.name <> 'PUBLIC' group by u.name order by
sys.link$ union select 'CONSTRAINT' , count(*) from sys.con$