Project Stage 3: Enhancing GCC Pass for Multiple Function Clone Handling: Progress Update
In my earlier work, I developed a basic GCC pass for identifying and comparing function clones. For this latest phase, I focused on improving the pass to handle programs containing multiple cloned functions. Additionally, I wanted to enhance code organization by consolidating data storage, replacing multiple parallel vectors and maps with a structured approach using a dedicated struct to store characteristics of clone variants.
Implementation Process
Phase 1: Code Revision for Multi-Function Support
I modified the pass to properly handle programs with multiple cloned functions in x86 server environments. The updated code includes all critical support files.
The revised implementation's core is a specialized pass that processes GIMPLE representation data. This pass systematically identifies resolver functions and creates appropriate tracking structures for each function being cloned. The implementation uses maps and vectors to store function characteristics, including basic block counts, statement counts, GIMPLE codes, operand counts, and variable relationships.
When processing functions, the pass first checks if the function is cloned, then either records its characteristics (if it's the first variant encountered) or compares it against previously recorded data to determine if it can be pruned.
/* Test Pass Kavya Shah, Seneca Polytechnic College Student ID :140055229 Modelled on tree-nrv.cc and tree-ctyler.cc by Chris Tyler, Seneca Polytechnic College, 2024-11 This file is part of GCC. GCC is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3, or (at your option) any later version. GCC is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with GCC; see the file COPYING3. If not see <http://www.gnu.org/licenses/>. */ #include <map> #include <vector> #include <string> #include <stdlib.h> #include "config.h" #include "system.h" #include "coretypes.h" #include "backend.h" #include "tree.h" #include "gimple.h" #include "tree-pass.h" #include "ssa.h" #include "tree-pretty-print.h" #include "gimple-iterator.h" #include "gimple-walk.h" #include "internal-fn.h" #include "gimple-pretty-print.h" #include "cgraph.h" #include "gimple-ssa.h" #include "attribs.h" #include "pretty-print.h" #include "tree-inline.h" #include "intl.h" #include "function.h" #include "basic-block.h" #include "gcc-plugin.h" namespace { const pass_data pass_data_kbshah6 = { GIMPLE_PASS, /* type */ "kbshah6", /* name */ OPTGROUP_NONE, /* optinfo_flags */ TV_NONE, /* tv_id */ PROP_cfg, /* properties_required */ 0, /* properties_provided */ 0, /* properties_destroyed */ 0, /* todo_flags_start */ 0, /* todo_flags_finish */ }; struct gimple_data{ int m_bb_count = 0; int m_stmt_count = 0; std::vector<enum gimple_code> m_gimple_code; std::vector<unsigned> m_operands_count; std::vector<enum tree_code> m_operandsType; std::map<int,int> m_variableMap; }; class pass_kbshah6 : public gimple_opt_pass { public: pass_kbshah6 (gcc::context *ctxt) : gimple_opt_pass (pass_data_kbshah6, ctxt) {} /* opt_pass methods: */ bool gate (function *) final override { return 1; } unsigned int execute (function *) final override; }; // class pass_kbshah6 int process_gimple_variables(std::map<tree, int>& refMap, int index, gimple* stmt, std::map<int, int>& resultMap) { std::vector<tree> vars; switch (gimple_code(stmt)) { case GIMPLE_ASSIGN: { tree lhs = gimple_assign_lhs(stmt); vars.push_back(lhs); int rhs_ops = gimple_num_ops(stmt) - 1; for (int i = 1; i <= rhs_ops; ++i) { tree op = gimple_op(stmt, i); vars.push_back(op); } break; } case GIMPLE_DEBUG_BIND: { tree var = gimple_debug_bind_get_var(stmt); tree val = gimple_debug_bind_get_value(stmt); if (var) vars.push_back(var); if (val) vars.push_back(val); break; } case GIMPLE_COND: { tree lhs = gimple_cond_lhs(stmt); tree rhs = gimple_cond_rhs(stmt); if (lhs) vars.push_back(lhs); if (rhs) vars.push_back(rhs); break; } default: break; } for (tree var : vars) { std::map<tree, int>::iterator it = refMap.find(var); if (it != refMap.end()) { resultMap[index] = it->second; } else { refMap[var] = index; resultMap[index] = index; } ++index; } return index; } unsigned int pass_kbshah6::execute(function *func) { static bool first = true; static std::map<std::string, gimple_data> function_data_map; struct cgraph_node *node; std::string base_function; if (!dump_file ||!func || !func ->cfg) return 0; if (first){ FOR_EACH_DEFINED_FUNCTION(node) { const char *fname = node->name(); if (!fname) continue; //ignore function with no name std::string name(fname); if ( name.find(".resolver") != std::string::npos) { std::string base_function_name; base_function_name = name.substr(0, name.find(".resolver")); function_data_map[base_function_name]= gimple_data(); fprintf(dump_file, "Found resolver of function: %s\n", base_function_name.c_str()); } } first = false; } node = cgraph_node::get(func->decl); if (!node){ fprintf(dump_file, "ERROR: cgraph_node::get(func->decl)\n"); return 0; } const char *fname = node->name(); std::string name(fname); if ( name.find(".resolver") !=std::string::npos) return 0; base_function = name.substr(0, name.find(".")); if (function_data_map.find(base_function) == function_data_map.end()){ return 0; //this function is not cloned. } gimple_data &reference = function_data_map[base_function]; gimple_data current; std::map<tree, int> refMap; basic_block bb; int index = 0; FOR_EACH_BB_FN(bb, func) { current.m_bb_count++; for (gimple_stmt_iterator gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi)){ current.m_stmt_count++; } } if (reference.m_bb_count == 0 && reference.m_stmt_count == 0){ fprintf(dump_file,"recording-------------\n"); FOR_EACH_BB_FN(bb,func){ for (gimple_stmt_iterator gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi)){ gimple *stmt = gsi_stmt(gsi); index = process_gimple_variables(refMap, index, stmt, current.m_variableMap); current.m_gimple_code.push_back(gimple_code(stmt)); current.m_operands_count.push_back(gimple_num_ops(stmt)); if (is_gimple_assign(stmt)) current.m_operandsType.push_back(gimple_assign_rhs_code(stmt)); } } function_data_map[base_function] = current; } else { fprintf(dump_file, "Comparing-------------------\n"); if (current.m_bb_count != reference.m_bb_count || current.m_stmt_count != reference.m_stmt_count){ fprintf(dump_file, " NOPRUNE: %s\n",name.c_str()); return 0; } std::vector<enum gimple_code>::iterator it = reference.m_gimple_code.begin(); std::vector<unsigned>::iterator count_it = reference.m_operands_count.begin(); std::vector<enum tree_code>::iterator optype_it = reference.m_operandsType.begin(); std::map<int,int> var2_variableMap; index = 0; FOR_EACH_BB_FN(bb, func){ for (gimple_stmt_iterator gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi)){ gimple *stmt = gsi_stmt(gsi); index = process_gimple_variables(refMap, index, stmt, var2_variableMap); if (gimple_code(stmt) != *it ||gimple_num_ops(stmt) != *count_it) { fprintf(dump_file, " NOPRUNE: %s\n", name.c_str()); return 0; } if( *it == GIMPLE_ASSIGN && *(optype_it++) != gimple_assign_rhs_code(stmt)) { fprintf(dump_file, " NOPRUNE: %s\n",name.c_str()); return 0; } ++it; ++count_it; } } std::map<int, int>::const_iterator it1 = reference.m_variableMap.begin(); std::map<int, int>::const_iterator it2 = var2_variableMap.begin(); while (it1 != reference.m_variableMap.end()) { if (it1->first != it2->first || it1->second != it2->second) { fprintf(dump_file, " NOPRUNE: %s\n",name.c_str()); return 0; } ++it1; ++it2; } fprintf(dump_file, " PRUNE: %s\n",name.c_str()); } return 0; } } // anon namespace gimple_opt_pass * make_pass_kbshah6 (gcc::context *ctxt) { return new pass_kbshah6 (ctxt); }
Phase 2: Testing with Multiple Function Types
To validate the implementation, I expanded my test cases to include a reliable PRUNE function called "print_median" alongside existing functions. This gave me test scenarios with multiple cloned functions—some always unable, others conditionally prunable, depending on context.
The testing confirmed that the pass correctly identified both PRUNE and NOPRUNE functions as expected. When running with one prunable and one non-prunable function, each was correctly labeled. Similarly, when testing with two prunable functions, both were properly marked as PRUNE.
CLONE_ATTRIBUTE void print_median(int16_t *buff, size_t samples) { printf("median: %d\n", buff[samples / 2]); }
Phase 3: Cross-Platform Validation
I replicated the test environment on an AArch64 server to verify cross-platform compatibility. The tests yielded identical results to the x86 implementation, confirming that the pass works consistently across architectures.
Reflections on Implementation
Limitations and Capabilities
The current implementation has several notable characteristics:
- Works exclusively with GIMPLE intermediate representation for C programs
- Analyzes but doesn't modify or optimize code
- Processes all functions regardless of whether they're called
- Compares clones based on basic block count, statement count, and statement types
- Handles assignment operations with special attention to right-hand side operands
- Limited to processing GIMPLE_ASSIGN, GIMPLE_COND, and GIMPLE_DEBUG statement types
- Marks identical clones as PRUNE and different ones as NOPRUNE
My testing covered various scenarios, including single PRUNE functions, single NOPRUNE functions, mixed PRUNE/NOPRUNE combinations, and multiple PRUNE functions across both x86 and AArch64 environments.
Learning Experience
Stage 3 allowed me to address the organizational issues I encountered in Stage 2, where time constraints had forced me to implement features incrementally. By redesigning the code to use structured data storage, I've improved both readability and maintainability.
This project phase also allowed me to develop a deeper understanding of test case creation for function cloning. The challenges I faced—including poor GCC pass documentation, lengthy compile times, complex testing procedures, and the mixed C/C++ codebase—have given me a newfound appreciation for the work GCC developers undertake to optimize compilation processes.
Comments
Post a Comment