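In the earlier entry linked above, I parsed .java files; this time I parse .jsp files.
Writing a BNF grammar (for a recursive descent parse) that covers an entire *.jsp file would take more effort, so this time I parse the file with html.parser first and then parse only the remaining "<% ... %>" scriptlet parts with Lark for Python.
Note that I only confirmed that it roughly works; I have not tested it in detail.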
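The html.parser step is not shown in this entry, so the following is only a minimal sketch of how it might look. It assumes that html.parser hands "<% ... %>" to handle_data() as plain text (possibly in several chunks), so the chunks are joined before searching. The names TextCollector and extract_scriptlets are hypothetical, and scriptlets that sit inside attribute values or that contain tag-like text are not handled.

```python
import re
from html.parser import HTMLParser

# Non-greedy match of one "<% ... %>" block, including newlines.
SCRIPTLET_RE = re.compile(r"<%.*?%>", re.DOTALL)


class TextCollector(HTMLParser):
    """Hypothetical helper: keeps every text chunk that is not an HTML tag."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # "<%" is not a valid tag start, so it arrives here as text.
        self.chunks.append(data)


def extract_scriptlets(jsp_source):
    """Return the "<% ... %>" blocks left over after the HTML tags are consumed."""
    collector = TextCollector()
    collector.feed(jsp_source)
    collector.close()
    return SCRIPTLET_RE.findall("".join(collector.chunks))


if __name__ == "__main__":
    sample = '<html><body><%@page import="a.B"%><p>hi</p><%= x %></body></html>'
    for scriptlet in extract_scriptlets(sample):
        print(scriptlet)
```

Each string returned this way could then be passed to the Lark parser as a single jsp_tag.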
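Reference URLs
Playing with Lark, a parsing library for Python (Pythonの構文解析ライブラリLarkを使って遊んでみました) | Tricorn Tech Labs
EBNF notation for HTML5 tag syntax · GitHub
Building a simple homemade language processor with the Python parsing library Lark (Python 構文解析ライブラリLarkを使って簡単な自作言語処理系をつくる) - Qiita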
Welcome to Lark’s documentation! — Lark documentation
Step1 - Grammar - jsp_grammar.lark file
jsp_tag : "<%" scriptlet "%>"

scriptlet : "--" ws* comment_text "--"
          | "@" ws* attr_list
          | "=" ws* expression_text
          | "!" ws* declare_text
          | ws* script_text

attr_list : attr (ws+ attr)*
attr : attr_name | attr_0_quoted | attr_1_quoted | attr_2_quoted
attr_name : /[^\s"'\<\>\/\=%]+/
attr_0_quoted : attr_name ws* "=" ws* attr_0_quoted_val
attr_1_quoted : attr_name ws* "=" ws* "'" attr_1_quoted_val "'"
attr_2_quoted : attr_name ws* "=" ws* "\"" attr_2_quoted_val "\""
attr_0_quoted_val : /[^\s"'\=\<\>%]+/
attr_1_quoted_val : /[^\'%]+/
attr_2_quoted_val : /[^\"%]+/

directive_text : tag_text
expression_text : tag_text
declare_text : tag_text
script_text : tag_text
comment_text : /.[^-]+/
tag_text : /.[^%]+/
ws : /\s/
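To make the five alternatives of the scriptlet rule easier to follow, here is a rough plain-Python approximation that dispatches on the same prefixes ("--", "@", "=", "!", or none). It is a sketch and not a substitute for the grammar: the name classify_scriptlet and the kind labels are hypothetical, and unlike the grammar it strips surrounding whitespace from the body.

```python
from typing import Tuple

# Prefix -> kind, in the same order as the alternatives of the `scriptlet` rule.
PREFIX_KINDS = (
    ("--", "comment"),
    ("@", "directive"),
    ("=", "expression"),
    ("!", "declaration"),
)


def classify_scriptlet(tag: str) -> Tuple[str, str]:
    """Return (kind, body) for one "<% ... %>" tag (hypothetical helper)."""
    if len(tag) < 4 or not (tag.startswith("<%") and tag.endswith("%>")):
        raise ValueError(f"not a JSP tag: {tag!r}")
    body = tag[2:-2]
    for prefix, kind in PREFIX_KINDS:
        if body.startswith(prefix):
            inner = body[len(prefix):]
            if kind == "comment":
                # A comment must also close with "--" before "%>".
                if not inner.endswith("--"):
                    raise ValueError(f"unterminated JSP comment: {tag!r}")
                inner = inner[:-2]
            return kind, inner.strip()
    # No prefix matched: an ordinary scriptlet.
    return "script", body.strip()


if __name__ == "__main__":
    print(classify_scriptlet('<%@page import="jp.end0tknr.common.CommonConst"%>'))
    print(classify_scriptlet("<%=CommonConst.HEADER_LINK_REGIST %>"))
```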
Step2 - Python script that runs the parse - parse_jsp.py
# -*- coding: utf-8 -*-
import sys
from lark import Lark
from lark import Transformer


def main():
    # The JSP tag to parse is given as the first command-line argument.
    args = sys.argv
    text = args[1]

    with open("./jsp_grammar.lark", encoding="utf-8") as grammar:
        parser = Lark(grammar.read(),
                      parser='earley',
                      # parser='lalr',
                      start="jsp_tag")
        tree = parser.parse(text)
        result = CalcTransformer().transform(tree)
        # print(result)


class CalcTransformer(Transformer):
    # Each callback prints its own rule name and the first child of the node.

    def attr_name(self, tree):
        func_name = sys._getframe().f_code.co_name
        print(func_name, tree[0])

    def attr_0_quoted_val(self, tree):
        func_name = sys._getframe().f_code.co_name
        print(func_name, tree[0])

    def attr_1_quoted_val(self, tree):
        func_name = sys._getframe().f_code.co_name
        print(func_name, tree[0])

    def attr_2_quoted_val(self, tree):
        func_name = sys._getframe().f_code.co_name
        print(func_name, tree[0])

    def comment_text(self, tree):
        func_name = sys._getframe().f_code.co_name
        print(func_name, tree[0])

    def directive_text(self, tree):
        func_name = sys._getframe().f_code.co_name
        print(func_name, tree[0])

    def expression_text(self, tree):
        func_name = sys._getframe().f_code.co_name
        print(func_name, tree[0])

    def declare_text(self, tree):
        func_name = sys._getframe().f_code.co_name
        print(func_name, tree[0])

    def script_text(self, tree):
        func_name = sys._getframe().f_code.co_name
        print(func_name, tree[0])


if __name__ == '__main__':
    main()
Step3 - Simple tests
$ /usr/local/python3/bin/python3 ./parse_jsp_0.py \
  '<%-- ************************** --%>'
comment_text **************************

$ /usr/local/python3/bin/python3 ./parse_jsp_0.py '<%@page import="jp.end0tknr.common.CommonConst"%>'
attr_name page
attr_name import
attr_2_quoted_val jp.end0tknr.common.CommonConst

$ /usr/local/python3/bin/python3 ./parse_jsp_0.py \
  '<%=CommonConst.HEADER_LINK_REGIST %>'
expression_text Tree('tag_text', [Token('__ANON_8', 'CommonConst.HEADER_LINK_REGIST ')])

$ /usr/local/python3/bin/python3 ./parse_jsp_0.py '<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>'
attr_name page
attr_name language
attr_2_quoted_val java
attr_name contentType
attr_2_quoted_val text/html; charset=UTF-8
attr_name pageEncoding
attr_2_quoted_val UTF-8

$ /usr/local/python3/bin/python3 ./parse_jsp_0.py \
  '<%@taglib prefix="c" uri="http://java.sun.com/jsp/jstl/core"%>'
attr_name taglib
attr_name prefix
attr_2_quoted_val c
attr_name uri
attr_2_quoted_val http://java.sun.com/jsp/jstl/core
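As a cross-check for the directive results above, the attribute rules (attr_name and the three quoted and unquoted value forms) can be approximated with a single regular expression from the standard library. This is a different technique from the Lark grammar, offered only as a sketch: the name directive_attrs is hypothetical, and unlike the grammar the regex silently skips text it cannot match instead of failing.

```python
import re

# One attribute: a name, optionally followed by = and a value in one of
# three forms (double-quoted, single-quoted, or unquoted).
ATTR_RE = re.compile(
    r'''([^\s"'<>/=%]+)          # name, as in attr_name
        (?:\s*=\s*
           (?: "([^"%]+)"        # value, as in attr_2_quoted_val
             | '([^'%]+)'        # value, as in attr_1_quoted_val
             | ([^\s"'=<>%]+)    # value, as in attr_0_quoted_val
           )
        )?''',
    re.VERBOSE,
)


def directive_attrs(tag):
    """Return [(name, value_or_None), ...] for a "<%@ ... %>" directive."""
    if not (tag.startswith("<%@") and tag.endswith("%>")):
        raise ValueError(f"not a JSP directive: {tag!r}")
    body = tag[3:-2]
    attrs = []
    for match in ATTR_RE.finditer(body):
        name, double_quoted, single_quoted, bare = match.groups()
        # At most one of the three value groups is set; a bare name has none.
        if double_quoted is not None:
            value = double_quoted
        elif single_quoted is not None:
            value = single_quoted
        else:
            value = bare
        attrs.append((name, value))
    return attrs


if __name__ == "__main__":
    tag = '<%@taglib prefix="c" uri="http://java.sun.com/jsp/jstl/core"%>'
    for name, value in directive_attrs(tag):
        print(name, value)
```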