EntityIdentification Identify whether two sets of data are from the same entity. 识别两组数据是否来自同一主体
This is a MCP (Model Context Protocol) server. 这是一个支持MCP协议的服务器。
Data Comparison Tool
This tool provides a comprehensive way to compare two sets of data, evaluating both exact and semantic equality of their values. It leverages text normalization and a language model to determine if the data originates from the same entity.
Features
- Text Normalization: Converts text to lowercase, removes punctuation, and normalizes whitespace.
- Value Comparison: Compares values directly and semantically (ignoring order for lists).
- JSON Traversal: Iterates through each key in the JSON objects and compares corresponding values.
- Language Model Integration: Uses a generative language model to assess semantic similarity and provide a final judgment on whether the data comes from the same entity.
Installation
To use this tool, ensure you have the necessary dependencies installed. You can install them using pip:
bash pip install genai
Usage
Functions
- normalize_text(text):
* Normalizes the input text by converting it to lowercase, removing punctuation, and normalizing whitespace.
- compare_values(val1, val2):
* Compares two values both exactly and semantically.
* If the values are lists, it ignores the order of elements for semantic comparison.
- compare_json(json1, json2):
* Compares two JSON objects key by key.
* Uses `compare_values` to evaluate each key's values.
* Integrates a language model to assess semantic similarity and provides a final judgment.
Example
# Define your JSON objects
json1 = { "name": "John Doe", "address": "123 Main St, Anytown, USA", "hobbies": ["reading", "hiking", "coding"] }
json2 = { "name": "john doe", "address": "123 Main Street, Anytown, USA", "hobbies": ["coding", "hiking", "reading"] }
# Compare the JSON objects
comparison_results = compare_json(json1, json2)
# Generate final matching result
model1 = genai.GenerativeModel("gemini-2.0-flash-thinking-exp") result_matching = model1.generate_content("综合这些信息,你认为可以判断两个数据来自同一主体吗?"+json.dumps(comparison_results, ensure_ascii=False, indent=4)) print(result_matching.text) ```
## Contributing
Contributions are welcome! Please open an issue or submit a pull request.
## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
## Contact
If you have any questions or suggestions, please contact me:
* Email: [[email protected]](/cdn-cgi/l/email-protection)
* GitHub: [[[email protected]](/cdn-cgi/l/email-protection)](mailto:[[email protected]](/cdn-cgi/l/email-protection))。
Wechat 