Big Data Digest, reprinted with permission from Andy's Writing Room
Author: Andy
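This morning, right after getting up, I saw a tweet from François Chollet (the author of Keras). Inspired by the very long-range memory of the mid-sized GPT-2 model, he came up with a simple text generation method that involves no machine learning, and it surprisingly reproduced GPT-2's results. The method is simple (he spent only 20 minutes writing the code): at each step, take the keywords in the text plus the last few words of the final sentence, search for them directly on Google, then join the retrieved snippet to the text at those last few words. Repeating this over and over can even generate the example from the GPT-2 paper about the discovery of magical unicorns.
As for the code, François joked: "I will not be releasing the code, because you guys couldn't handle the power of a Python script cobbled together in 20 minutes with Requests, BeautifulSoup, and regular expressions. It would change algorithmic cyberwar forever." Yet another "Too dangerous to release." It was mostly a joke, though, and a sharp dig at OpenAI. I am taking this opportunity to publish a draft that I have been slowly writing for several days.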
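If BERT cleverly introduced the Masked Language Model loss plus the Next Sentence Prediction loss so that the pretrained model learns more complete information, the GPT series simply took the seemingly unremarkable Transformer decoder (unidirectional decoding) and scaled it up again and again. Good data is of course also essential. It then showed everyone that, once the model is large enough (GPT-2), it becomes very powerful, especially at language generation, which happens to fill the gap left by BERT.
GPT-2 is probably this famous now because OpenAI learned from last time: most people only heard of GPT after BERT appeared, so GPT-1 ended up as the perfect stepping stone for BERT. So when GPT-2 came out, even though the main text of the paper is only a few pages long, it stole the show. I do not know how much OpenAI's PR contributed, but when people asked whether it would be open-sourced, the answer was: "Too Dangerous to Release!" (in other words, you don't get to use it!)
That statement caused an immediate uproar. Pro-OpenAI and anti-OpenAI camps formed right away, both well armed with arguments and both publishing posts; in those few days I read at least one blog post a day about the GPT-2 controversy. Meanwhile the smallest GPT-2 pretrained model, 117M (the number refers to the parameter count), was quietly released amid all the noise.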
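Later I occasionally saw posts on Reddit about finetuning the 117M model. They were fun, and I kept meaning to find time to try it myself, but I was too busy; between focusing on BERT and going on vacation, I put it aside.
So it was only a few days ago that I noticed posts about finetuning GPT-2 had suddenly multiplied again, and found that OpenAI had released a larger model, the 345M model that this article mainly uses (to use the small model, just replace 345M with 117M throughout). Besides these two, according to the paper there should be two even larger models; if OpenAI plans to release them, the GPT-2 hype could last all of 2019.
Riding the current wave, I finally went through the libraries for using GPT-2 and finetuned a few models myself, and the results were quite good. I also found that there is little Chinese-language material online about using GPT-2, so I am sharing my experience here.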
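The article is organized as follows; take what you need:
GitHub links to the required libraries:
The training data is the script of all ten seasons of Friends, which I scraped from the web.
Let's get started. I assume you are working on a Linux system.
First, a plate of GPT-2, please
The whole process has roughly four steps: first clone nshepperd's gpt-2 repository, then prepare the data and the model, then finetune, and finally generate samples with the saved model.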
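git clone https://github.com/nshepperd/gpt-2
cd gpt-2
pip install -r requirements.txt  # install the required packages
Inside the repository folder, download the pretrained model you need. Here I use the newly released mid-sized model; if your machine is not powerful enough, use the 117M model instead.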
python download_model.py 345M
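The 345M model is fairly large, about 1.4 GB, so you can prepare the data while it downloads. If you use the data I provide, just copy it into data/. Take a quick look at what the data looks like.
Then you can start finetuning. To make finetuning faster, you can pre-encode the data into the training format.
PYTHONPATH=src ./encode.py data/friends.txt data/friends.txt.npz
Let's start finetuning!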
PYTHONPATH=src ./train.py --dataset data/friends.txt.npz --model_name 345M
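Other parameters worth noting:
Training speed depends on your machine, but after two or three thousand steps you can generally see fairly decent results.
Now we have a finetuned model, so let's move on to the fun part: generation. First, rename the files of the generated model and put them into the models folder, replacing the original model (be sure to back up the original model first!).
For example, rename the model-4000 files in checkpoint/run1 to model.ckpt, then move them into models/345M.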
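The rename-and-move step can be scripted. The following is a minimal sketch, not part of the gpt-2 repository: the helper name install_checkpoint is hypothetical, and it assumes the checkpoint is stored as several files sharing the prefix model-4000 (for example .index, .meta and .data-* files), which is how TensorFlow usually writes checkpoints. It copies rather than moves, and keeps a .bak copy of any file it would overwrite.

```python
import shutil
from pathlib import Path

def install_checkpoint(run_dir, model_dir, step):
    """Copy model-<step>.* from run_dir into model_dir as model.ckpt.* (hypothetical helper)."""
    run_dir, model_dir = Path(run_dir), Path(model_dir)
    prefix = f"model-{step}"
    copied = []
    for src in sorted(run_dir.glob(prefix + ".*")):
        # Keep the suffix (.index, .meta, .data-...), swap only the stem.
        dst = model_dir / ("model.ckpt" + src.name[len(prefix):])
        if dst.exists():
            # Back up the original pretrained file before overwriting it.
            shutil.copy2(dst, dst.with_name(dst.name + ".bak"))
        shutil.copy2(src, dst)
        copied.append(dst.name)
    return copied
```

With the paths used in this article, the call would be install_checkpoint("checkpoint/run1", "models/345M", 4000).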
OK. First comes free play: use generate_unconditional_samples.py to generate samples unconditionally.
python src/generate_unconditional_samples.py --top_k 40 --temperature 0.9 --model_name 345M
Next comes the writing-to-a-prompt round: interactive conditional generation.
python src/interactive_conditional_samples.py --top_k 40 --temperature 0.9 --model_name 345M
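After it starts, an interactive prompt appears. Type the text you want the model to continue. Let me think...
Now it's time to witness a miracle... ... ... After quite a while, ta-da
Two seconds after "Rachel loves Andy", it went completely off topic. Sad, but the second half still feels quite interesting.
The parameters --top_k and --temperature affect the generated output; try tuning them yourself. The example above uses the two recommended settings.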
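To build some intuition for what these two knobs do, here is a small pure-Python sketch of top-k sampling with temperature. It is an illustration only, not the code used by the gpt-2 scripts (which run in TensorFlow); the function name sample_top_k and the toy logits in the usage note are made up for the example.

```python
import math
import random

def sample_top_k(logits, top_k=40, temperature=0.9, rng=random):
    """Sample one token index from raw logits using top-k filtering and temperature."""
    # Keep only the indices of the top_k largest logits (top_k <= 0 keeps all).
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    kept = ranked[:top_k] if top_k > 0 else ranked
    # Temperature below 1 sharpens the distribution, above 1 flattens it.
    scaled = [logits[i] / temperature for i in kept]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(kept, weights=weights, k=1)[0]
```

For example, sample_top_k([0.1, 2.0, -1.0, 1.5], top_k=2) can only ever return index 1 or 3, and with top_k=1 it always returns the most likely token, which is why a small top_k gives safer but more repetitive text.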
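That completes the basic process of finetuning GPT-2. Much simpler than you imagined, isn't it?
There is an even simpler way, though.
Simpler still: gpt-2-simple
As the name suggests, the gpt-2-simple library makes finetuning and generation simpler. It is mainly built on the gpt-2 repository above.
For the key usage steps, I reproduce part of the Colab Notebook here; see the Notebook for more detail. I recommend following the tutorial in the Notebook, since it comes with a free GPU.
Notebook link: https://colab.research.google.com/drive/1_kQQ8WCjus9mz0Cf1onVeE1pUG-ulTqA
The overall process is about the same as above, but the commands are simpler. Again, first download the model.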
import gpt_2_simple as gpt2
gpt2.download_gpt2(model_name="345M")
Then put the training data in place and start training.
sess = gpt2.start_tf_sess()
gpt2.finetune(sess,
              dataset="friends.txt",
              model_name='345M',
              steps=1000,
              restore_from='fresh',
              print_every=10,
              sample_every=200,
              save_every=500)
Very intuitive: just call gpt2.finetune.
Parameters of gpt2.finetune:
You will find that much of this resembles the previous section.
Once training has produced a saved model, it is time for generation again. First load the model.
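sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess)
Then generate text.
gpt2.generate(sess)
gpt2.generate also has many parameters you can set:
To generate text in bulk, use gpt2.generate_to_file.
Deploying to a server
Now that the model is ready, of course it is time to show it off: deploy it to a server so your friends can generate text interactively straight from the browser.
This mainly uses the gpt-2-flask-api library on GitHub. You only need to give it a pretrained or finetuned GPT-2 model (in Huggingface's PyTorch format).
Put the model file under models/ and name it gpt2-pytorch_model.bin. You can also first experiment with the example model it provides: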
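mkdir models
curl --output models/gpt2-pytorch_model.bin https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-pytorch_model.bin
Then run python deployment/run_server.py.
You will then get an access address and port:
After that, just open it in a browser. For remote access, replace 0.0.0.0 with the server's IP.
Now type in the text you want it to continue, wait a moment, and the result appears. Black is the user input; red is the model's output.
One last question: how to deploy your own model
Finetuning saves the model in TensorFlow's file format, but this package only supports models saved by PyTorch. So we first have to convert the TensorFlow model into a PyTorch model.
For this you can use the conversion script in Huggingface's pytorch-pretrained-BERT library. Install the library following its instructions, then run the following.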
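export GPT2_DIR=/path/to/model/folder
pytorch_pretrained_bert convert_gpt2_checkpoint $GPT2_DIR/model_name output_dir/ path_to_config/config.json
The three arguments after convert_gpt2_checkpoint in the command above are, in order: the path of the input TensorFlow model, the output path for the converted PyTorch model, and the model's configuration file.
Note that because these libraries are not consistent with one another, the downloaded 345M model's settings file causes an error during conversion, and a few parameters have to be added. If you downloaded the 345M model earlier, you will find a settings file named hparams.json in the model folder.
cp hparams.json hparams_convert.json  # make a copy to modify
Then add a few parameters to hparams_convert.json so that it looks like this:
{
  "n_vocab": 50257,
  "n_ctx": 1024,
  "n_embd": 1024,
  "n_head": 16,
  "n_layer": 24,
  "vocab_size": 50257,
  "n_positions": 1024,
  "layer_norm_epsilon": 1e-5,
  "initializer_range": 0.02
}
Pass this settings file as the corresponding argument of the convert_gpt2_checkpoint command.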
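If you would rather not edit the JSON by hand, the same change can be made with a few lines of Python. This is a sketch under assumptions: the helper name make_convert_config is hypothetical, and it derives vocab_size and n_positions from the existing n_vocab and n_ctx entries (they are equal in the JSON above), while the two remaining values are copied from that JSON.

```python
import json

def make_convert_config(src_path, dst_path):
    """Write a copy of hparams.json with the extra keys shown above (hypothetical helper)."""
    with open(src_path) as f:
        hparams = json.load(f)
    # Duplicate the existing sizes under the additional key names.
    hparams.setdefault("vocab_size", hparams["n_vocab"])
    hparams.setdefault("n_positions", hparams["n_ctx"])
    # Values copied from the JSON shown above.
    hparams.setdefault("layer_norm_epsilon", 1e-5)
    hparams.setdefault("initializer_range", 0.02)
    with open(dst_path, "w") as f:
        json.dump(hparams, f, indent=2)
    return hparams
```

Calling make_convert_config("hparams.json", "hparams_convert.json") inside the model folder leaves the original file untouched and writes the extended copy next to it.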
Once you have the converted model, put it into models/ and rename it, then change the parameter settings in deployment/GPT2/config.py to those of the larger 345M model.
class GPT2Config(object):
    def __init__(
        self,
        vocab_size_or_config_json_file=50257,
        n_positions=1024,
        n_ctx=1024,
        n_embd=1024,
        n_layer=24,
        n_head=16,
        layer_norm_epsilon=1e-5,
        initializer_range=0.02,
    ):
        # ... (rest of the constructor unchanged)
Finally, run run_server.py. The model loads successfully and deployment is complete! A quick test confirms that it really is the finetuned Friends model.
Over time, the cost of training large models keeps falling; today a few hundred dollars is enough to reproduce GPT-2.
When OpenAI released GPT-2 in 2019, training cost $256 per hour, according to Tom's Hardware. Five years on, with the arrival of GPT-4 and the flagship GPT-4o, has the cost of training large AI models come down?
Andrej Karpathy, former Director of AI at Tesla and a co-founder of OpenAI, recently gave a concrete answer after reproducing GPT-2. He said: "Today, you can train your own for about $672, running on one 8XH100 GPU node for 24 hours." It turns out that advances in hardware, software, and data mean that training the same model takes less time and less money.
Karpathy also shared the whole reproduction process on his GitHub project page (https://github.com/karpathy/llm.c/discussions/677). Let's look at how he did it.
Reproducing GPT-2 in 24 hours for only $672
It is worth noting that not long after Andrej Karpathy announced his amicable departure from OpenAI in February this year, he released llm.c (https://github.com/karpathy/llm.c), a new project that implements GPT-2 training in 1,000 lines of hand-written C. Building on that project, Karpathy has now reproduced the full 1,558-million-parameter GPT-2, the one OpenAI introduced at the time in "Better Language Models and Their Implications" (https://openai.com/index/better-language-models/).
Karpathy says that llm.c is written directly in C/CUDA (about 5,000 lines of code in total) and does not need the typical training stack, which involves the Python interpreter and significantly more complex deep learning libraries such as PyTorch/JAX and huggingface/transformers.
In 2019, training GPT-2 was a project for an entire team and was considered a large-model run. Five years later, thanks to improvements in compute (H100 GPUs), software (CUDA, cuBLAS, cuDNN, FlashAttention), and data (for example, the FineWeb-Edu dataset), the model can be reproduced on a single 8XH100 node in 24 hours for $672.
"This is quite incredible," Karpathy said. There are some caveats and challenges, though: llm.c is still not perfectly tuned or sufficiently stabilized (loss spikes and bad activation ranges still show up now and then), and the evaluation is not comprehensive (for example, multilingual ability, code, and math were not carefully evaluated).
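Preparing for the reproduction
Karpathy notes that training GPT-2 with llm.c is very simple: because it is written in C/CUDA, it needs no Miniconda, Python, PyTorch, and so on.
All you need is an 8XH100 GPU node.
llm.c is flexible about compute, though: if you have only 1 GPU, you can still get your GPT-2, you just have to wait 8 days instead of 1. If you have 16 GPUs (for example, using the new Lambda 1 Click Clusters), you can train across multiple nodes and wait only 12 hours.
Once the node is up, here are the complete instructions for training GPT-2 (going from a blank box to the start of the run takes only about 1 minute):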
# install cudnn so we can use FlashAttention and run fast (optional)
# https://developer.nvidia.com/cudnn-downloads
# for me, CUDA 12 (run `nvcc --version`) running on Linux x86_64 Ubuntu 22.04
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install libcudnn9-dev-cuda-12
# "install" cudnn-frontend to ~/
git clone https://github.com/NVIDIA/cudnn-frontend.git
# install MPI (optional, if you intend to use multiple GPUs)
# (you might also have to install NVIDIA NCCL if it doesn't come with your setup)
sudo apt -y install openmpi-bin openmpi-doc libopenmpi-dev
# download and enter llm.c repo
git clone https://github.com/karpathy/llm.c.git
cd llm.c
# download the "starter pack" (~1GB download)
# contains GPT2-124M weights (used in tests), tokenizer, eval data .bin s
./dev/download_starter_pack.sh
# download the training dataset (FineWeb-Edu 100B token) .bin data shards
# note: this is a total of 1001 data shards. If you only want to test things
# out and don't want to do an actual run, feel free to append the number of
# training shards to download (e.g. for just 10 shards: ./edu_fineweb.sh 10)
# the full dataset is ~200GB, we can store it here in dev/data directory.
cd dev/data
./edu_fineweb.sh
# compile (~1 min 1st time for cuDNN mostly, few sec from then on)
cd ../../
make train_gpt2cu USE_CUDNN=1
# and train! (wait 24 hours here)
mpirun -np 8 ./train_gpt2cu \
-i "dev/data/edu_fineweb100B/edu_fineweb_train_*.bin" \
-j "dev/data/edu_fineweb100B/edu_fineweb_val_*.bin" \
-o "log_gpt2_1558M" \
-v 250 -s 300000 -g 384 \
-h 1 \
-b 16 -t 1024 \
-d 1048576 \
-r 0 \
-z 1 \
-c 0.1 \
-k "cosine" \
-l 0.0006 \
-q 0.1 \
-u 700 \
-n 2000 \
-x 32000 \
-ge 1 \
-y 1 \
-e "d48"
Next you will see a bunch of prints scroll by, and then the optimization begins:
num_parameters: 1557686400=> bytes: 3115372800
allocated 2971 MiB for model parameters
batch_size B=16 * seq_len T=1024 * num_processes=8 and total_batch_size=1048576
=> setting grad_accum_steps=8
created directory: log_gpt2_1558M
allocating 40409 MiB for activations
val loss 11.129390
allocating 2971 MiB for parameter gradients
allocating 742 MiB for AdamW optimizer state m
allocating 742 MiB for AdamW optimizer state v
allocating 742 MiB for master copy of params
step 1/32000 | loss 11.133732 (+nanz)| norm 52.9732 (+nanz)| lr 8.57e-07 | 3056.36 ms | 42.6% bf16 MFU | 343080 tok/s
step 2/32000 | loss 10.539388 (+nanz)| norm 43.5996 (+nanz)| lr 1.71e-06 | 2747.19 ms | 47.4% bf16 MFU | 381690 tok/s
step 3/32000 | loss 9.894109 (+nanz)| norm 23.2229 (+nanz)| lr 2.57e-06 | 2753.25 ms | 47.3% bf16 MFU | 381259 tok/s
step 4/32000 | loss 9.566241 (+nanz)| norm 28.4920 (+nanz)| lr 3.43e-06 | 2741.47 ms | 47.5% bf16 MFU | 381690 tok/s
step 5/32000 | loss 9.482848 (+nanz)| norm 23.7817 (+nanz)| lr 4.29e-06 | 2752.07 ms | 47.3% bf16 MFU | 381507 tok/s
step 6/32000 | loss 9.332832 (+nanz)| norm 15.9113 (+nanz)| lr 5.14e-06 | 2751.01 ms | 47.3% bf16 MFU | 381431 tok/s
step 7/32000 | loss 9.165650 (+nanz)| norm 10.5941 (+nanz)| lr 6.00e-06 | 2753.03 ms | 47.3% bf16 MFU | 381327 tok/s
step 8/32000 | loss 9.132234 (+nanz)| norm 16.2733 (+nanz)| lr 6.86e-06 | 2748.91 ms | 47.3% bf16 MFU | 381348 tok/s
step 9/32000 | loss 9.097384 (+nanz)| norm 12.1342 (+nanz)| lr 7.71e-06 | 2748.73 ms | 47.3% bf16 MFU | 381367 tok/s
step 10/32000 | loss 9.072879 (+nanz)| norm 10.5923 (+nanz)| lr 8.57e-06 | 2749.40 ms | 47.3% bf16 MFU | 381369 tok/s
...
As you can see, each step takes about 2.75 seconds and there are 32,000 steps in total, so now we wait about 24 hours.
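The 24-hour figure is easy to check from the numbers in the log. The short calculation below is only a sanity check; the per-hour prices it prints are implied by the quoted $672 total and are not figures stated by Karpathy.

```python
STEP_SECONDS = 2.75      # per-step time read off the log
TOTAL_STEPS = 32_000     # value passed to -x
RUN_COST_USD = 672       # quoted cost of the 24-hour run
NUM_GPUS = 8

hours = STEP_SECONDS * TOTAL_STEPS / 3600
node_rate = RUN_COST_USD / 24      # implied price of the node per hour
gpu_rate = node_rate / NUM_GPUS    # implied price per GPU-hour
print(f"{hours:.1f} h, ${node_rate:.2f}/node-hour, ${gpu_rate:.2f}/GPU-hour")
# prints: 24.4 h, $28.00/node-hour, $3.50/GPU-hour
```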
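At every step, the training run takes about 1 million tokens of FineWeb-EDU (educational web pages from the internet) and updates the model's 1,558 million weights to make it slightly better at predicting the next token in a sequence.
By the end, a total of 32,000 * 1,048,576 = 33.6B tokens have been processed. The loss goes down as the model gets better at predicting the next token. The norm settles at around 0.1 to 1, and the learning rate is warmed up over the first few steps. The model FLOPs utilization (MFU) here is about 50%, which is quite efficient.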
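The parameter count in the first log line can be reproduced with a short calculation. This is a sketch under assumptions that are not spelled out in the article: it uses the standard GPT-2 layout with tied input/output embeddings, the GPT-2 XL shape (48 layers, 1600 channels), and a vocabulary padded from 50257 to 50304. These assumptions are used here because they reproduce the logged numbers exactly.

```python
def gpt2_param_count(n_layer, n_embd, vocab_size, n_ctx=1024):
    """Count the weights of a GPT-2 style model with tied input/output embeddings."""
    embeddings = vocab_size * n_embd + n_ctx * n_embd
    # qkv projection plus output projection, each with a bias
    attention = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
    # two linear layers with a 4x hidden expansion, each with a bias
    mlp = (n_embd * 4 * n_embd + 4 * n_embd) + (4 * n_embd * n_embd + n_embd)
    layernorms = 2 * (2 * n_embd)   # two layernorms per block, weight + bias
    final_layernorm = 2 * n_embd
    return embeddings + n_layer * (attention + mlp + layernorms) + final_layernorm

# Assumed GPT-2 XL shape: 48 layers, 1600 channels, vocabulary padded to 50304.
params = gpt2_param_count(n_layer=48, n_embd=1600, vocab_size=50304)
param_bytes = params * 2            # 2 bytes per weight in bf16
tokens = 32_000 * 1_048_576         # steps * total batch size
print(params, param_bytes // 2**20, tokens)
# prints: 1557686400 2971 33554432000
```

The first two numbers match the log lines "num_parameters: 1557686400" and "allocated 2971 MiB for model parameters", and the last is the 33.6B token total.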
After waiting 24 hours, you can use the dev/vislog.ipynb Jupyter notebook to visualize the main.log log file. For this you also need Python and matplotlib installed.
Validation and evaluation
In the figure, the left panel tracks the loss on FineWeb-EDU validation data. If you simply run the GPT-2 released by OpenAI and evaluate its loss on this data, you get the red horizontal line (loss 2.83).
By comparison, Karpathy's run overtakes it quickly, at around step 5,000.
Karpathy admits, however, that this comparison is not fair: GPT-2 was trained on the never-released WebText dataset, so there may be a large distribution shift. For example, if you finetune the OpenAI model for 1,000 steps at LR 1e-4, the loss quickly drops to the blue line (loss 2.61), because it is rapidly adapting to the new data statistics.
"I like to look at validation loss as a sanity check, but for the actual comparison we want to look at fixed, third-party evaluations. HellaSwag is one of the well-behaved, smooth, common, often-cited evals that also gives early signal. These are simple common-sense scenarios, and the model has to pick the correct continuation," Karpathy wrote.
In the HellaSwag evaluation in the right panel, the llm.c model overtakes GPT-2 at around step 25K (earlier than GPT-2, which is estimated to have been trained on ~100B tokens. This probably has to do with the improved data quality; Karpathy says the same was observed in the earlier 124M run).
The green line is the GPT-3 model of the same size. Its architecture is essentially the same as GPT-2's, with minor differences (context length 1024 -> 2048), but it was trained on 300B tokens (about 10 times more than the tokens trained on here).
Karpathy adds: "I should say that even HellaSwag is not an ideal single point of comparison, because it tests simple English and common sense, not multilingual ability, math, or code. It may be that the WebText data mixture was heavier on these, and these domains were 'stealing' model capacity to some extent; we don't know, because it was never released. Finally, in general, good evals are harder at low model capability like GPT-2, because the models don't understand multiple choice, and their samples are not of high enough quality to score above chance on standard math or code evals."
Parameter guide
Let's look more closely at the arguments passed to the training run. OpenAI's GPT-2 release included the model weights but very few details, while the GPT-3 release had no weights but many details. So in many cases Karpathy follows the hyperparameters from the GPT-3 paper, because the GPT-2 paper contains very, very little information:
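mpirun -np 8 ./train_gpt2cu (the launch command: use mpi to start 8 processes, each running training on 1 GPU, for a total of 8 GPUs on this 8XH100 node). If you have 4 GPUs, use -np 4. If you have only 1 GPU, you can skip mpi and just run ./train_gpt2cu.
-i -j are the training and validation split token files, downloaded with edu_fineweb.sh.
-o is the output directory that logs and checkpoints are written to.
-v 250 asks to evaluate and log the validation loss every 250 steps.
-s 300000 asks to sample some tokens every 300,000 steps. Because the total number of steps is smaller than that, this is a hacky way to turn sampling off; sampling happens only once, at the very end.
-g 384 sets the number of tokens to sample at the end to 384.
-h 1 asks to evaluate HellaSwag accuracy.
-b 16 sets the micro-batch size to 16. If you run out of memory, decrease it, for example try 8, 4, 2, all the way down to 1.
-t 1024 sets the maximum sequence length to 1024, the same as GPT-2.
-d 1048576 asks for a total batch size of 2 to the power of 20, following the GPT-3 paper's hyperparameter table. The code makes sure the desired total batch size is met and calculates the gradient accumulation needed for the "inner loop" of the optimization. For example, with the 8 GPUs above, each processing 16 x 1024 tokens, every micro-step (one forward-backward pass) covers 8 x 16 x 1024 = 131,072 tokens, so the code calculates 8 gradient accumulation steps to reach the desired 1M batch size per step: it does forward+backward 8 times, then a single update.
-r 0 sets recompute to zero. Recompute is a way to trade off compute against memory. With -r 1, part of the forward pass (the GeLU) is recomputed during the backward pass. That means it does not have to be cached, which saves memory at the cost of more compute. So if you run out of memory, try -r 1 or -r 2 (which also recomputes the layernorms).
-z 1 turns on ZeRO-1 (optimizer state sharding) across multiple GPUs. If you train with more than 1 GPU, this setting is a no-brainer and should basically always be on. With 1 GPU it has no effect.
-c 0.1 sets weight decay to 0.1. Only the (2D) weights are decayed, exactly as in GPT-2, and the number comes from the GPT-3 paper.
-k "cosine" sets the cosine learning rate schedule, which is the default.
-l 0.0006 sets the maximum learning rate to 6e-4. The GPT-3 paper says to use 2e-4 for this model size, but here it is tripled; training seems to go faster without any problems. This has not been carefully tuned.
-q 0.1 says the learning rate decays to 10% of the maximum LR over the course of training, following the GPT-3 paper.
-u 700 says the learning rate is ramped up from 0 to the maximum over the first 700 iterations, which at a total batch size of 0.5M corresponds to 350M tokens, following the GPT-3 paper.
-n 2000 asks to save a model checkpoint every 2,000 steps.
-x 32000 asks for 32K steps in total. This number was chosen because it is a nice number and fits just about into 24 hours.
-ge 1 enables a recently merged GeLU recompute setting for cuBLASLt (optional).
-y 1 sets the "resume" flag. If your training crashes or hangs for any reason, you can CTRL+C and re-run this command, and it will try to resume the optimization. llm.c is bitwise deterministic, so you get the same result as if there had been no crash.
-e "d48" asks to initialize a depth-48 GPT-2 model from scratch.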
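Two of the flags above, -d (total batch size) and the -l/-q/-u/-k learning rate settings, are easier to follow as code. The sketch below is an illustration and not llm.c's source: the function names are hypothetical, the linear warmup is written to match the lr values printed in the log (8.57e-07 at step 1), and the cosine decay uses the common textbook form, which may differ in detail from llm.c's implementation.

```python
import math

def grad_accum_steps(total_batch, micro_batch, seq_len, num_processes):
    """Number of forward+backward passes needed per optimizer update."""
    tokens_per_micro_step = micro_batch * seq_len * num_processes
    assert total_batch % tokens_per_micro_step == 0, "total batch must divide evenly"
    return total_batch // tokens_per_micro_step

def learning_rate(step, max_lr=6e-4, warmup=700, total_steps=32_000, final_frac=0.1):
    """Linear warmup, then cosine decay to final_frac * max_lr (step is 1-based, as in the log)."""
    if step <= warmup:
        return max_lr * step / warmup
    min_lr = max_lr * final_frac
    progress = (step - warmup) / (total_steps - warmup)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

print(grad_accum_steps(1_048_576, 16, 1024, 8))   # prints: 8
print(f"{learning_rate(1):.2e}")                  # prints: 8.57e-07
```

Halving -b to 8 while keeping -d fixed doubles the result of grad_accum_steps to 16, which is why a smaller micro-batch trades memory for time without changing the optimization itself.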
Memory guide
The biggest constraint most people will probably face is that their GPU does not have 80GB of memory.
Karpathy says: "That's fine; if you are patient you can still run everything above, it will just be slower. So if the model doesn't fit, what can you adjust? The most important one is the micro-batch size -b. Try decreasing it, but keep it to nice numbers, for example from 16 to 8, then 4, 2, 1.
From there, also try the recompute setting -r, which takes 0 (fastest, uses a lot of memory), 1 (very slightly slower, but saves a lot of memory), or 2 (slightly slower still, saves less memory).
The next thing you can do is disable the master weights in fp32, which you can do with -w 0 (the default is 1)."
With that setting, Karpathy explains, no fp32 copy of the parameters is maintained. In a few earlier runs this turned out to be fine empirically, probably because of the use of stochastic rounding. If even that does not fit, you can try lowering the maximum sequence length -t (default 1024) to 512, 256, and so on, but that makes your model worse, because you are reducing its maximum attention span.
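The allocation lines in the training log show where the memory goes, and most of them can be rederived from the parameter count. The sketch below assumes bf16 weights and gradients (2 bytes each) and fp32 optimizer buffers sharded across the 8 GPUs by ZeRO-1; these assumptions are inferred from the fact that they reproduce the logged sizes, and the activation figure is simply copied from the log.

```python
MIB = 2**20
num_params = 1_557_686_400      # from the log line "num_parameters"
num_gpus = 8

bf16_params = num_params * 2 // MIB            # weights held in bf16
bf16_grads = num_params * 2 // MIB             # gradients, same size as the weights
# fp32 buffers (AdamW m, AdamW v, master weights), each sharded across GPUs by ZeRO-1
fp32_shard = num_params * 4 // num_gpus // MIB
activations = 40_409                           # MiB, copied from the log (depends on -b, -t, -r)

per_gpu_mib = bf16_params + bf16_grads + 3 * fp32_shard + activations
print(bf16_params, fp32_shard, per_gpu_mib)
# prints: 2971 742 48577
```

Activations dominate the total, which is why lowering -b or raising -r helps the most, while -w 0 only removes one of the three 742 MiB fp32 shards.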
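Code
"I am certainly biased, but llm.c is really elegant," Karpathy says:
It needs only basic CUDA dependencies to run.
It is a direct, minimal, and readable implementation in C/CUDA. llm.c totals about 5,000 lines of C/CUDA. We try to use mostly C rather than C++ to keep things simple. Neural network training is just one while loop performing the same simple arithmetic (think +, -, *, /) on a single float array; it really shouldn't be that complicated.
It compiles and runs very quickly (a few seconds), so you spend more time debugging and less time waiting.
It allocates all of its GPU memory once at the start, and from then on the memory footprint stays exactly constant during training. So once training starts, you know you will not run out of memory for the rest of the run.
It is bitwise deterministic.
It is efficient, close to ~50% model FLOPs utilization (MFU).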
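The main entry point and most of the code is in the file train_gpt2.cu. It contains the GPT-2 model definition and the training loop in about 2,000 lines of code, and imports many helper files with various utilities and the individual layer implementations from the llmc directory. cloc llmc reports 23 files with 3,170 lines of code, and cloc train_gpt2.cu is currently 1,353 lines of code.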
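Multi-node training
If you are part of the GPU-rich upper class, llm.c supports multi-node training.
Karpathy shares that the largest run he has personally done so far was on Lambda's new one-click clusters feature, with 16XH100 GPUs across 2 nodes. This is "one of the downsides of being unemployed": there is no money, after all.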
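He also notes that the Lambda team has put up detailed instructions on how to train llm.c models on their one-click clusters. For example, with a 512-GPU H100 cluster at $2,300 per hour, you might be able to train your GPT-2 in about 30 minutes. You would have to increase the total batch size (for example, to about 8M) and possibly tune the hyperparameters a little. Karpathy has not tried this himself, but says "it probably works, and it would be very cool".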
Comparison with PyTorch
Karpathy believes a roughly comparable run in PyTorch, using his parallel PyTorch implementation, would look like this:
torchrun --standalone --nproc_per_node=8 train_gpt2.py \
--input_bin "dev/data/edu_fineweb100B/edu_fineweb_train_*.bin" \
--input_val_bin "dev/data/edu_fineweb100B/edu_fineweb_val_*.bin" \
--write_tensors 0 \
--model d48 \
--batch_size 8 --sequence_length 1024 --total_batch_size 1048576 \
--dtype bfloat16 \
--compile 1 \
--tensorcores 1 \
--flash 1 \
--num_iterations 32000 \
--warmup_iters 700 \
--weight_decay 0.1 \
--overfit_single_batch 0 \
--learning_rate 0.0006 \
--zero_stage 1
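The PyTorch code is meant as a testing reference rather than an actual implementation, so the training loop differs a little in places (for example, the data loader does not permute the shards), but it may still be useful as a point of reference. He also changed the default vocabulary size from 50257 to 50304 for efficiency. With that, the current PyTorch nightly gives: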
step 16/32000 | train loss 8.903997 | norm 8.3474 | lr 1.37e-05 | (3381.88 ms | 310057 tok/s)
step 17/32000 | train loss 8.870140 | norm 3.7936 | lr 1.46e-05 | (3381.95 ms | 310051 tok/s)
step 18/32000 | train loss 8.875732 | norm 9.4993 | lr 1.54e-05 | (3393.09 ms | 309033 tok/s)
step 19/32000 | train loss 8.817432 | norm 2.8345 | lr 1.63e-05 | (3379.75 ms | 310253 tok/s)
step 20/32000 | train loss 8.798056 | norm 4.1234 | lr 1.71e-05 | (3386.53 ms | 309631 tok/s)
step 21/32000 | train loss 8.777574 | norm 2.8010 | lr 1.80e-05 | (3386.05 ms | 309675 tok/s)
...
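Karpathy says: "I can't say I am fully confident that the PyTorch script is maximally tuned, but the following observations can be made."
First, PyTorch seems to use more memory (about 80GB in this run), whereas llm.c uses 57GB (a 29% improvement). Memory matters because it lets you increase the batch size (for example, llm.c can go up to a micro-batch of 24 here), which runs a little faster.
Second, PyTorch takes about 3,386 ms per iteration, whereas llm.c takes 2,750 ms, so llm.c is about 19% faster. Some of the gains are understood: for example, llm.c includes optimizations such as the fused classifier that kicks off the backward pass, which, according to Karpathy, torch.compile does not do today. It is also possible that the script is not yet fully tuned. Either way, Karpathy shows this comparison in order to:
1) let others look at it, try it, compare, and help tune it;
2) show that llm.c is already quite optimized and fast for the specific case of GPT-2/3 training.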
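The two percentages quoted above follow directly from the measured numbers. The snippet below recomputes them; note that "19% faster" here means 19% less time per iteration, which corresponds to roughly 23% more tokens per second (consistent with the tok/s columns of the two logs).

```python
pytorch_ms, llmc_ms = 3386, 2750    # per-iteration times quoted above
pytorch_gb, llmc_gb = 80, 57        # memory use quoted above

time_saved = (pytorch_ms - llmc_ms) / pytorch_ms      # fraction of step time saved
memory_saved = (pytorch_gb - llmc_gb) / pytorch_gb    # fraction of memory saved
throughput_gain = pytorch_ms / llmc_ms - 1            # extra tokens per second

print(f"time saved {time_saved:.3f}, memory saved {memory_saved:.4f}, "
      f"throughput gain {throughput_gain:.3f}")
```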
The final model
The main.log file (http://llmc.s3-us-west-2.amazonaws.com/gpt2_1558M/main.log).
model_00032000.bin, the llm.c bin model file (http://llmc.s3-us-west-2.amazonaws.com/gpt2_1558M/model_00032000.bin).
The model converted to a huggingface transformers GPT-2 model has been uploaded here: karpathy/gpt2_1558M_final2_hf (https://huggingface.co/karpathy/gpt2_1558M_final2_hf).
A version of the model trained for 100k steps (https://huggingface.co/karpathy/gpt2_1558M_final3_hf) has now been added as well; it reaches 57.7 on HellaSwag, while the model trained for 330K steps (https://huggingface.co/karpathy/gpt2_1558M_final4_hf) reaches 62.7.
Exporting the model
For example, the model can be exported as follows:
python dev/eval/export_hf.py --input log_gpt2_1558M/model_00032000.bin --output gpt2_1558M_export
You can then run the Eleuther evaluation harness, or run the huggingface sampling pipeline to get samples from the model:
# take model for spin
import torch
output="./gpt2_1558M_final2_hf"
# set pytorch seeds
torch.manual_seed(42)
torch.cuda.manual_seed(42)
prompt="In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English."
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer=AutoTokenizer.from_pretrained(output)
model=AutoModelForCausalLM.from_pretrained(output, attn_implementation="flash_attention_2", torch_dtype=torch.bfloat16, device_map='cuda')
model.eval()
tokens=tokenizer.encode(prompt, return_tensors="pt")
tokens=tokens.to('cuda')
output=model.generate(tokens, max_new_tokens=500, pad_token_id=tokenizer.eos_token_id, do_sample=True, top_k=50, num_return_sequences=4)
samples=tokenizer.batch_decode(output)
for sample in samples:
    print('-'*30)
    print(sample)
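The 400B token run
In addition, Karpathy tried training GPT-2 for far longer than 33B tokens. Specifically, he changed -x to 400,000 to train on 420B tokens (even more than the GPT-3 model, which was trained on 300B).
This model run looked great until about step 330,000:
The model dramatically outperforms the GPT-2 and GPT-3 of the same size on HellaSwag (peaking at about 61%), but unfortunately it became unstable from that point on and ran into trouble.
There were more, smaller spikes along the way, but Karpathy had configured the code to skip updates when it detected a transient instability (using the -sl 5.0 -sg 5.0 flags), which helped mitigate and postpone the problem.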
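The idea of skipping an update on a transient spike can be sketched as an outlier test on recent loss (or gradient-norm) values. The code below is a hypothetical illustration, not llm.c's implementation: it assumes the 5.0 passed to -sl and -sg is a z-score threshold, and the class name OutlierGate and the window size are made up for the example.

```python
from collections import deque
from statistics import mean, pstdev

class OutlierGate:
    """Hypothetical helper: flags a value whose z-score over a recent window is too high."""

    def __init__(self, threshold=5.0, window=128):
        self.threshold = threshold
        self.history = deque(maxlen=window)

    def should_skip(self, value):
        skip = False
        if len(self.history) >= 2:
            spread = pstdev(self.history)
            if spread > 0:
                skip = (value - mean(self.history)) / spread > self.threshold
        if not skip:
            # Only well-behaved values enter the window, so one spike cannot mask the next.
            self.history.append(value)
        return skip
```

In a training loop, one gate would watch the loss and another the gradient norm, and the optimizer step would be skipped whenever either gate returns True.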
Karpathy thinks he has not yet been careful enough about initialization, activation ranges, and overall training stability, and that there are deeper issues that gradually drift the model into an unstable state, especially for larger models and long training runs. This is something he wants to study and explore further in the future.
That is the full story of Karpathy's experiment.
AI training is not getting cheaper
Some, however, argue that advances in hardware, software, and training data do not mean that cutting-edge AI training will keep getting cheaper.
Dario Amodei, CEO of Anthropic, has said that AI models currently in training already cost $1 billion, and that even more expensive models will reach $100 billion as early as 2025.
This is because hardware is getting more powerful but also more expensive. For example, an NVIDIA H100 currently sells for $40,000 apiece. The next-generation Blackwell AI chips are expected to sell for $70,000, and unless we find a hardware breakthrough such as the Sohu AI chip (an ASIC designed specifically for transformers), a complete server rack will sell for $3 million or more.
Beyond cost, the growing power demand of AI data centers is also starting to worry some experts. A single H100 chip running at an average annual utilization of 61% consumes 3.7 MWh of electricity per year. Last year, Nvidia and all other players sold more than 3.8 million AI GPUs in total, which amounts to 14.3 TWh of electricity per year, enough to power 1.3 million average American households.
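The power figures above can be cross-checked with a few lines of arithmetic. The one number not given in the text is the power draw of an H100; the sketch assumes 700 W per chip, which is an assumption made here and reproduces the quoted 3.7 MWh per year.

```python
HOURS_PER_YEAR = 24 * 365
H100_KW = 0.7           # assumed 700 W peak draw per H100; not stated in the text
UTILIZATION = 0.61      # average annual utilization quoted above

mwh_per_gpu = H100_KW * UTILIZATION * HOURS_PER_YEAR / 1000
fleet_twh = 3.8e6 * mwh_per_gpu / 1e6       # 3.8 million GPUs, MWh -> TWh
mwh_per_household = 14.3e6 / 1.3e6          # quoted 14.3 TWh spread over 1.3 million homes
print(f"{mwh_per_gpu:.2f} MWh, {fleet_twh:.1f} TWh, {mwh_per_household:.1f} MWh")
# prints: 3.74 MWh, 14.2 TWh, 11.0 MWh
```

The small gap between 14.2 and the quoted 14.3 TWh is consistent with "more than" 3.8 million GPUs having been sold.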
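Even with all the money and effort poured into AI, the CEO of Google DeepMind says that current models are still only at the IQ level of a cat. So billions of dollars more still need to be invested in future models. But if you want to try building your own LLM with an older model, Karpathy's approach requires only a few hundred dollars.
Sources: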
https://github.com/karpathy/llm.c/discussions/677
https://www.tomshardware.com/tech-industry/artificial-intelligence/former-tesla-ai-director-reproduces-gpt-2-in-24-hours-for-only-672