用Lisp教计算机写作
这篇文章最初以中文撰写并在CSDN上发布,链接为https://blog.csdn.net/lzw_java/article/details/11599993
大多数代码和想法都基于《Ansi Common Lisp》P138~P141。
问题:给定一个英文文本,计算机如何基于它生成随机但可读的文本?例如:
国家风险投资协会估计与交易相关的财富是由大量的法律法规造成的,这些法规在某些项目中会花很多钱,这些项目之所以存在的主要原因就是这些风险。
这是一段计算机在学习了保罗·格雷厄姆的一些文章后生成的随机文本。它从单词“Venture”开始,延伸成了一句话。令人惊讶的是,文本通常是可读的。
算法:记录每个单词后出现的单词及其出现的次数。例如,如果“我离开”在原始文本中出现了5次,“我要”出现了3次,并且“我”在任何其他单词之前都没有出现,那么在生成随机文本时,遇到“我”时有5/8的概率选择“leave”作为下一个单词。如果选择了“leave”,那么检查哪些单词在“leave”之后出现,并重复该过程。
现在,让我们使用Lisp来解决这个问题。
Lisp的符号类型可以很好地记录各种字符串和标点符号,因此我们将使用它来记录。我们将使用内置的哈希表来创建一个列表:
(defparameter *words* (make-hash-table :size 10000))
我们如何创建这个列表呢?
(let ((prev '|.|))
(defun see (sym)
(let ((pair (assoc sym (gethash prev *words*))))
(if pair
(incf (cdr pair))
(push (cons sym 1) (gethash prev *words*))))
(setf prev sym)))
当前单词是关键字,关联列表是该关键字下的值。例如,在“我”下面我们有( ( | leave | . 5) ( | want | . 3) )。如果单词不存在,则将 (word . 1) 推入。 |
我们如何随机选择一个单词?
(defun random-word (word ht)
(let* ((choices (gethash word ht))
(x (random (reduce #'+ choices :key #'cdr))))
(dolist (pair choices)
(decf x (cdr pair))
(if (minusp x)
(return (car pair))))))
这里,reduce函数被巧妙地使用了。
现在,让我们考虑如何把给定的单词在两边扩展成一个句子?
1) 将文本反转以获得反转列表,即“我离开,我要”变成“leave I,want I”。 2) 反转哈希表以获得另一个哈希表,其中后面的单词是键,可能的前面的单词及其计数形成关联列表。 3) 试图运气,从标点符号开始扩展句子,直到出现给定的单词。
我使用了第二种方法:
(defparameter *r-words* (make-hash-table :size 10000))
(defun push-words (w1 w2 n)
(push (cons w2 n) (gethash w1 *r-words*)))
(defun get-reversed-words () ; a cat -> cat a
(maphash #'(lambda (k lst)
(dolist (pair lst)
(push-words (car pair) k (cdr pair))))
*words*))
遍历原始哈希表,然后将每对单词插入另一个哈希表,并颠倒它们的顺序。以下是自动生成双向扩展句子的代码:
(defparameter *words* (make-hash-table :size 10000))
(defconstant maxword 100)
(defparameter nwords 0)
(defconstant debug nil)
(let ((prev '|.|))
(defun see (sym)
(incf nwords)
(let ((pair (assoc sym (gethash prev *words*))))
(if pair
(incf (cdr pair))
(push (cons sym 1) (gethash prev *words*))))
(setf prev sym)))
(defun check-punc (c) ; char to symbol
(case c
(#\. '|.|) (#\, '|,|)
(#\; '|;|) (#\? '|?|)
(#\: '|:|) (#\! '|!|)))
(defun read-text (pathname)
(with-open-file (str pathname :direction :input)
(let ((buf (make-string maxword))
(pos 0))
(do ((c (read-char str nil 'eof)
(read-char str nil 'eof)))
((eql c 'eof))
(if (or (alpha-char-p c)
(eql c #\'))
(progn
(setf (char buf pos) c)
(incf pos))
(progn
(unless (zerop pos)
(see (intern (subseq buf 0 pos)))
(setf pos 0))
(let ((punc (check-punc c)))
(if punc
(see punc)))))))))
(defun print-ht (ht)
(maphash #'(lambda (k v)
(format t "~A ~A~%" k v))
ht))
(defparameter *r-words* (make-hash-table :size 10000))
(defun push-words (w1 w2 n)
(push (cons w2 n) (gethash w1 *r-words*)))
(defun get-reversed-words () ; a cat -> cat a
(maphash #'(lambda (k lst)
(dolist (pair lst)
(push-words (car pair) k (cdr pair))))
*words*))
(defun print-a-word (word ht)
(maphash #'(lambda (k lst)
(if (eql k word)
(format t "~A ~A~%" k lst)))
ht))
(if debug
(print-a-word '|leave| *r-words*))
(defun punc-p (sym) ; symbol to char, nil when fails.
(check-punc (char (symbol-name sym) 0)))
(defun random-word (word ht)
(let* ((choices (gethash word ht))
(x (random (reduce #'+ choices :key #'cdr))))
(dolist (pair choices)
(decf x (cdr pair))
(if (minusp x)
(return (car pair))))))
(defun gen-former (word str)
(let ((last (random-word word *r-words*)))
(if (not (punc-p last))
(progn
(gen-former last str)
(format str "~A " last)))))
(defun gen-latter (word str)
(let ((next (random-word word *words*)))
(format str "~A " next)
(if (not (punc-p next))
(gen-latter next str))))
;(gen-latter '|leave| t)
(defun get-a-word (ht) ; get a random word
(let ((x (random nwords)))
(maphash #'(lambda (k v)
(dolist (pair v)
(decf x (cdr pair))
(if (minusp x)
(return-from get-a-word (car pair)))))
ht)))
;(get-a-word *words*)
(defun gen-sentence (word str)
(gen-former word str)
(format str "~A " word)
(gen-latter word str))
(defun test ()
(setf nwords 0)
(read-text "essay.txt")
(get-reversed-words)
(let ((word (get-a-word *words*)))
(print word)
(gen-sentence word t)))
(test)