We add a parseArgs interface to difftest for style and GatewayConfig.
Users can pass DifftestArgs from a top module that extends the App trait, and
it returns the filtered args for subsequent parsing.
Now we support using "--difftest-config ..." to specify GatewayConfig, where each
letter represents an optimization measure. Corresponding changes for NutShell/XiangShan will be added the next time difftest is bumped.
When running with --workload-list, simv_init is called multiple times in a single launch, and the pointer from the previous simv_init call needs to be destroyed before reinitializing.
This commit fixes an issue with incomplete memory deallocation when exiting the program, and releases the list file correctly when using the workload list.
We add SYNTHESIS macros around generated DPIC funcs, and TB_NO_DPIC macros around DPIC funcs in the fixed vsrc of TB.
With this change, difftest with a non-default GatewayConfig can run both with and without SYNTHESIS.
* Configurable batchByteLen:
For better performance without GFIFO. In Palladium, we can pass at most 4000 bytes in a Gfifo func (with isNonBlock enabled), and 8000 bytes in other funcs. With a configurable byteLen, speed increases from 70 K to 98 K without GFIFO.
* Interval step: pack data transmission and step control
Previously, when batch was not supported, we used multiple DPIC funcs to transmit DiffState. Since the DPIC order within one beat is uncertain, we used the step to mark that the transmission is finished.
With batch mode, data is transmitted in a single func, so we can also pack the step into it to reduce sync times. This halves the number of DPIC calls. Speed increases from 95 K to 98 K without GFIFO, and from 26 K to 27.5 K with GFIFO.
When IntervalStep is enabled, step in the TOP IO defaults to 0. We keep this unused signal so that the TOP IO interface stays fixed.
In addition, when this feature is enabled, the DUT only uses a single port to interface with REF. This helps us migrate Difftest to other platforms.
Support the --workload-list option for the VCS simulation framework. By adding a run arg like `--workload-list=<listfile_name>`, users can run several workloads in a single launch on VCS or Palladium. This feature greatly reduces the time required to run many short workloads. Each line of the list file should be `workload-name max-instr-limit`.
To support this feature, we add workload-switch logic. When a workload is done (or exceeds max-instr-limit), we raise the workload-switch signal according to simv_result. Then the reset signal is set high. Other logic relies only on the reset logic, so each workload runs independently.
Note: DUT memory currently has two implementation schemes: load memory into hardware memory at initialization, or load memory in software and then read/write it through DPIC. The first scheme cannot reload memory on a later reset, so we must ensure DPIC memory is used.
The whole process of handling workload-list is as follows:
1. At initialization, the user passes workload-list.
2. When `reset` is set high, simv_init is triggered. It gets the workload and max-instr from workload-list, just as if only one workload were run. Software and DUT memory (DPIC) are initialized here.
3. When `simv_nstep` ends successfully, `workload-switch` sets `reset` high. Software state should be freed by difftest_finish. Then go back to step 2.
4. When `simv_init` gets no more workloads from the list, the whole simulation ends.
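The list format and the "get next workload on each reset" step can be sketched as below. This is only an illustration, not the code in this commit: `Workload`, `parse_workload_list` and `WorkloadQueue` are hypothetical names, and the list is parsed from a string rather than a file to keep the example self-contained.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// One entry of the list file: "workload-name max-instr-limit".
struct Workload {
  std::string name;
  uint64_t max_instr;
};

// Parse the list text. Lines that do not hold both fields are skipped.
std::vector<Workload> parse_workload_list(const std::string &text) {
  std::vector<Workload> list;
  std::istringstream lines(text);
  std::string line;
  while (std::getline(lines, line)) {
    std::istringstream fields(line);
    Workload w;
    if (fields >> w.name >> w.max_instr) {
      list.push_back(w);
    }
  }
  return list;
}

// Hypothetical stand-in for the init side: each reset asks for the next
// workload, and a false return means the whole simulation should end.
class WorkloadQueue {
public:
  explicit WorkloadQueue(std::vector<Workload> list) : list_(std::move(list)) {}

  bool next(Workload &out) {
    if (pos_ >= list_.size()) {
      return false;
    }
    out = list_[pos_++];
    return true;
  }

private:
  std::vector<Workload> list_;
  size_t pos_ = 0;
};
```

Each successful `next` corresponds to one pass through step 2; the first failing call corresponds to step 4.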
* CI: add clang-format for GitHub Actions
We add clang-format to ensure a consistent C++ code style. When some code should not be formatted, use clang-format off/on to control it.
Our clang-format config is based on the LLVM style, which can be checked with clang-format -style=LLVM --dump-config.
All available options and their meanings can be found at https://clang.llvm.org/docs/ClangFormatStyleOptions.html
After this PR, GitHub Actions will check the format automatically. When code needs formatting, CI will fail, and users need to format it manually and commit again.
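For reference, the off/on markers are ordinary comments. A minimal example with a hypothetical hand-aligned table (`kLookup` and `lookup` are made up for illustration):

```cpp
#include <cassert>

// clang-format off
// Hand-aligned table: the formatter would otherwise reflow these rows.
static const int kLookup[8] = {
    1,   2,   4,   8,
   16,  32,  64, 128,
};
// clang-format on

// Code outside the guarded region is formatted as usual.
int lookup(int i) {
  assert(i >= 0 && i < 8);
  return kLookup[i];
}
```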
Previously we used a naive memcpy for NEMU memory snapshots. Now we implement a more efficient memory snapshot and recovery with a store log.
Call ref_store_log_reset and set_store_log, and we record the original data of every pmem_write for this execution. When the comparison between REF and DUT triggers an error, call ref_store_log_restore, and memory is recovered from the original data in reverse order. Note that we restore REF and Golden Memory, not DUT memory.
The ENABLE_STORE_LOG macro is added to the default config. This commit should be bumped together with ready-to-run (latest NEMU and SPIKE so).
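The store-log idea can be modelled in a few lines. This is an illustrative byte-granular model, not NEMU's implementation: `LoggedMemory`, `log_reset`, `set_logging` and `log_restore` are hypothetical names that only loosely mirror ref_store_log_reset, set_store_log and ref_store_log_restore.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class LoggedMemory {
public:
  explicit LoggedMemory(size_t size) : mem_(size, 0) {}

  void log_reset() { log_.clear(); }
  void set_logging(bool on) { logging_ = on; }

  void write(size_t addr, uint8_t data) {
    assert(addr < mem_.size());
    if (logging_) {
      log_.push_back({addr, mem_[addr]}); // remember the byte being overwritten
    }
    mem_[addr] = data;
  }

  uint8_t read(size_t addr) const { return mem_[addr]; }

  // Undo the logged writes newest-first, so that for an address written
  // several times the oldest saved value is the one left in memory.
  void log_restore() {
    for (auto it = log_.rbegin(); it != log_.rend(); ++it) {
      mem_[it->addr] = it->old_data;
    }
    log_.clear();
  }

private:
  struct Entry {
    size_t addr;
    uint8_t old_data;
  };
  std::vector<uint8_t> mem_;
  std::vector<Entry> log_;
  bool logging_ = false;
};
```

Restoring in reverse order matters when the same address is written more than once: replaying the log forwards would leave an intermediate value behind. The log also grows only with the number of writes, whereas a memcpy snapshot copies the whole memory.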
We add perfcnt to evaluate the effects of the different Gateway optimizations. Now we count DPIC calls, bytes, run time and failed cycles. Speed and gates are reported by Palladium.
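A counter of this kind can be as small as the sketch below. `DpicPerf` and its members are hypothetical and only cover calls and bytes; the actual perfcnt layout is not shown here.

```cpp
#include <cstdint>

// Hypothetical per-function counter, updated once per DPIC call.
struct DpicPerf {
  uint64_t calls = 0;
  uint64_t bytes = 0;

  void record(uint64_t nbytes) {
    calls++;
    bytes += nbytes;
  }

  // Average payload per call; 0 when nothing has been recorded yet.
  double bytes_per_call() const {
    return calls ? static_cast<double>(bytes) / static_cast<double>(calls) : 0.0;
  }
};
```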
Support displaying trap_PC and IPC for VCS
CI used to have a spelling mistake of diffStateSelect, which caused
sed to work incorrectly. Now we rename it to hasDutZone directly, meaning we
have different zones in DUT_buffer, and each zone may have 1 or batchSize
spaces.
We clarify the inheritance and overriding of DPIC. DPIC/DPICBatch should override
modPorts according to the input format. CommonPorts (clock, enable) are
filtered out of DPICargs, and the other ports are used in DPICAssigns.
Now we release the related Palladium scripts, and users should set PLDM_HOST
for internal information.
We use a wildcard for board occupation in compilerOptions.qel, so it can
automatically choose an appropriate board count. Users can also set it
manually like {boards 0+1.1+1.2}, which means occupying 1 board and 2 extra
domains. It can be estimated from the gate count in xe.msg.
* Batch: pack DPIC to reduce sync times
We pack the DPIC of DifftestBundles in hardware and unpack it in software. Tens of DPIC calls are thus reduced to one, which reduces sync times on platforms such as Palladium.
It speeds up full-difftest of XiangShan from 0.23 MHz to 0.29 MHz with GlobalEnable, Squash and Gfifo (non-block) on Palladium. Without Gfifo, it speeds up from 0.025 MHz to 0.11 MHz.
Difftest is delayed so that DifftestBundles can collect valid bundles over bundleNum cycles, which delays the comparison result.
BatchInterval and BatchFinish are appended to infoVec to separate the DPIC data packed for different cycles. Other elements contain BundleType and BundleLen to help unpack DataVec.
Now we set the maximum packed data bytes to 3900 and the maximum packed info bytes to 90, because Palladium requires the total bytes of gfifo params to be less than 4000.
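The software-side unpacking can be pictured roughly as follows. The encoding here is an assumption made for illustration: the marker values `kBatchInterval`/`kBatchFinish`, the one-byte type and length fields, and the `BatchInfo`/`Bundle` structs are hypothetical and do not reflect the generated layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed marker values; the real encoding may differ.
constexpr uint8_t kBatchInterval = 0xFE; // separates two cycles
constexpr uint8_t kBatchFinish = 0xFF;   // end of the packed batch

struct BatchInfo {
  uint8_t type; // bundle type, or one of the markers above
  uint8_t len;  // payload length in bytes (unused for markers)
};

struct Bundle {
  uint8_t type;
  size_t cycle; // cycle index within this batch
  std::vector<uint8_t> payload;
};

// Walk the info entries and slice the data bytes accordingly.
std::vector<Bundle> unpack_batch(const std::vector<BatchInfo> &info,
                                 const std::vector<uint8_t> &data) {
  std::vector<Bundle> out;
  size_t offset = 0;
  size_t cycle = 0;
  for (const BatchInfo &i : info) {
    if (i.type == kBatchFinish) {
      break;
    }
    if (i.type == kBatchInterval) {
      cycle++;
      continue;
    }
    assert(offset + i.len <= data.size());
    out.push_back({i.type, cycle,
                   std::vector<uint8_t>(data.begin() + offset,
                                        data.begin() + offset + i.len)});
    offset += i.len;
  }
  return out;
}
```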
We add the --max-num-width verilator arg to increase the bit width limit. See the related issue verilator/verilator#2082
To handle misalignment of struct members, we add the packed attribute to struct declarations so that all members are byte-aligned, which allows using memcpy for DPIC.
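A small illustration of why the packed attribute helps. `DemoBundle` is a hypothetical struct, `__attribute__((packed))` is the GCC/Clang spelling, and the byte order in the usage note assumes a little-endian host.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Without the attribute, padding after `valid` would make this 16 bytes
// on common 64-bit ABIs, and a raw byte stream would no longer line up.
struct __attribute__((packed)) DemoBundle {
  uint8_t valid;
  uint64_t pc;
  uint32_t instr;
};
static_assert(sizeof(DemoBundle) == 13, "packed layout has no padding");

// Copy a raw byte stream straight into the struct.
DemoBundle decode(const uint8_t *raw) {
  assert(raw != nullptr);
  DemoBundle b;
  std::memcpy(&b, raw, sizeof(b));
  return b;
}
```

With a 13-byte input holding `valid`, then `pc`, then `instr` back to back, `decode` recovers each field without any per-member shifting.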
Note: When running full-difftest in XiangShan with Batch mode, each batch may contain the DPICs of only one cycle, because the maximum bytes of a single cycle can be large. So diffStateSelect should also be enabled for VCS and Palladium.
We reshape GatewaySink to have the same control interface and different IO interfaces corresponding to DifftestBundle (single-pack) and batchIO (Batch). DPIC for the different IO interfaces should extend DPICBase for general methods.
For simv, we delay step to ensure DPIC_step comes after DPIC_transfer.
For emu, when the enable signal is high in the current cycle, it is read
by software immediately, but the DPIC is called in the next cycle because Verilog
uses the previous cycle's signals as the always block condition.
This problem is not exposed when the step size is fixed, so emu used to step
before DPIC. Now we move the delay logic to Gateway, so it can be shared
by both emu and simv.
DPI-C functions are using the diffstate_buffer to check whether
the corresponding data structures have been initialized. However,
we are not resetting it to nullptr after free. If the simulation
environment is reset manually and restarted, then this pointer
will be a dangling pointer and corrupt the next DPI-C calls.
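The pattern of the fix, shown on a hypothetical stand-in buffer (`demo_buffer`, `demo_init`, `demo_finish` and `demo_ready` are made-up names, not the difftest functions):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Stand-in for the buffer whose pointer doubles as the "initialized" flag.
static int *demo_buffer = nullptr;

void demo_init(size_t n) {
  assert(n > 0);
  if (demo_buffer == nullptr) {
    demo_buffer = static_cast<int *>(std::calloc(n, sizeof(int)));
  }
}

void demo_finish() {
  std::free(demo_buffer);
  // Without this reset, demo_ready() would keep returning true after the
  // free and demo_init() would skip the reallocation on restart.
  demo_buffer = nullptr;
}

bool demo_ready() { return demo_buffer != nullptr; }
```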
For Param configuration in fixed code such as vcs/top.v, we include
DifftestMacros.v, which contains the macros corresponding to Param.
However, in generated RTL, we can add the macros directly during code generation.
Moreover, when pure Verilog is generated, DifftestMacros may not be created,
which causes a syntax error.
* We separate the mixed gateway logic into PreProcess, Squash, Batch,
GatewaySink and ZoneControl
* We add zone-related logic. DUT writes to different zones of
State_Buffer at different clocks, and REF reads a different
zone at each nstep. This helps with async read and write. It is also used
when the step at the next cycle may lag behind the transfer.
* Each zone's length is batchSize when isBatch; the default is 1.
* Transfer modules such as DPIC should declare GatewayBundle for the same IO
interface. Batch has a specific interface and declares
GatewayBatchBundle.
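The zone idea resembles a multi-buffered state buffer. A rough software-side sketch, where `ZonedBuffer` and its methods are hypothetical and zone switching is driven explicitly by the caller:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Buffer split into `zones` regions of `zone_len` slots each
// (zone_len would be batchSize in batch mode, 1 otherwise).
template <typename T>
class ZonedBuffer {
public:
  ZonedBuffer(size_t zones, size_t zone_len)
      : zone_len_(zone_len), zones_(zones), data_(zones * zone_len) {}

  // Writer side: slot `index` of the zone currently being filled.
  T &write_slot(size_t index) {
    assert(index < zone_len_);
    return data_[wr_zone_ * zone_len_ + index];
  }
  void next_write_zone() { wr_zone_ = (wr_zone_ + 1) % zones_; }

  // Reader side: slot `index` of the zone currently being consumed.
  const T &read_slot(size_t index) const {
    assert(index < zone_len_);
    return data_[rd_zone_ * zone_len_ + index];
  }
  void next_read_zone() { rd_zone_ = (rd_zone_ + 1) % zones_; }

private:
  size_t zone_len_;
  size_t zones_;
  std::vector<T> data_;
  size_t wr_zone_ = 0;
  size_t rd_zone_ = 0;
};
```

Because the writer moves to the next zone after each transfer, a reader that lags by one step still sees the data of the earlier transfer intact.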
* coverage: fix the regex for module definition
It now allows `module module_name (` as well.
We are not providing general-purpose Verilog parsing scripts. However,
we need to at least support Chisel-generated sources using DiffTest.
* Disable Verilator coverage in LogPerfControl
We wrap the gateway return values in a case class for better code
readability and code structure.
The results of the gateway consist of:
* C++ macros
* Verilog macros
* Extra difftest bundles
* DiffTest step
* Add apply, toSeq, and names methods to ArchInt/ArchFp/CSR State
* Add === and =/= to ArchInt/ArchFp/CSR State
* Add the default numPhyRegs value to InstrCommit
Since https://github.com/chipsalliance/chisel/pull/3753, the users
are able to check whether a Data is visible in the current context.
To avoid a large number of duplicated LogPerfControl modules, we
allow it to be reused when some previous instance is still visible
to the current context.
Call `LogPerfControl.reuse(DataMirror.isVisible)` instead of pure
`apply()` to allow the reuse.
However, this is still a temporary workaround. In the future, we
need some clean methods to extract the simulation environment and
access these signals during simulation, possibly via XMR.
We add macros to turn off DPIC on MemRWHelper and SimJTAG without
SYNTHESIS. That means we can reduce the number of synchronizations if
needed. The new macros are defined in palladium.mk by default.
To add a built-in declaration of a system task for Palladium, we need
to add $.
We declare fwrite in tb_top as Gfifo to reduce synchronizations and
speed up simulation.
Before this commit, we fetched simv_result every 5000 cycles. Only
when the result was non-zero did it take effect and stop the simulation.
Now we use a DPIC function to return simv_result when it is
non-zero, so only one DPIC function call is used to stop the simulation.
This reduces the cost of fetching and returns the result faster.
Currently all macros in Gateway can be decided by GatewayConfig, so
we move the macros inside Config for better comprehension.
Difftest-related macros in Verilog can be configured in the generated
DifftestMacros.v, which should be included by all files that use those macros.
Some redundant passing of config and of the return value of collect is removed.
Some macros are renamed to a consistent format.
Makefile and mk are changed to include generated/DifftestMacros.v
We remove RANDOMIZE_GARBAGE_ASSIGN and RANDOMIZE_INVALID_ASSIGN
because they cause many DPIC functions to call $random, which slows
down the simulation.
Move some display and delay declarations inside the module, because some
modules may sometimes not be used by the DUT, and declaring them on the
command line would cause conflicts.