Using a PyTorch neural network as the custom “black box” likelihood function

Hi, as suggested by the topic title, I am using a neural network model as a custom likelihood function. My data comes from FEM simulations, and an ANN is trained on the simulation data as a surrogate model. In the PyMC model, the surrogate model is wrapped in a custom likelihood function, following the “Using a “black box” likelihood function” example in the Gallery.

I think the logic is fine, because the log probability is calculated by the following code:

# create our Op
loglike_op = LogLike()
test_out = loglike_op(awid_true, bdep_true, cf_true, cr_true, eta_true, data)
print(test_out.eval())

However, I suspect that the random variables generated by pm.Uniform() cannot be fed into the neural network model due to data type mismatches.
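
One way to verify this would be to print what the Op receives at runtime, for example by temporarily adding a print inside the perform method (a hypothetical debugging aid, not part of the model):

# hypothetical debugging aid inside LogLike.perform, before calling my_loglike
for name, value in zip(["awid", "bdep", "cf", "cr", "eta", "data"], inputs):
    print(name, value.shape, value.dtype)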

My full code is as follows:

import numpy as np
import torch
import torch.nn as nn
import pymc as pm
import pytensor.tensor as pt
from pytensor.graph import Apply, Op

### define the neural network structure 
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(5, 50)
        self.fc2 = nn.Linear(50, 100)
        self.fc3 = nn.Linear(100, 50)
        self.fc4 = nn.Linear(50, 3)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = torch.relu(self.fc3(x))
        x = self.fc4(x)
        return x
mynet = Net()
## load the trained model parameters 
mynet.load_state_dict(torch.load('./trained_model.pth', weights_only=True))
mynet.eval()

def my_loglike(awid, bdep, cf, cr, eta, data):
    # stack the five parameters into a single input vector for the network
    inputs = np.asarray([awid, bdep, cf, cr, eta], dtype=np.float32)
    inputs_tensor = torch.from_numpy(inputs).float()

    # the network expects a batch dimension, i.e. shape (1, 5)
    if inputs_tensor.ndim == 1:
        inputs_tensor = inputs_tensor.unsqueeze(0)

    model_output = mynet(inputs_tensor)
    model_output = model_output.detach().numpy().squeeze()

    residuals = np.abs((data - model_output) / data)
    return np.log(residuals + 1.0)

# define a pytensor Op for our likelihood function
class LogLike(Op):
    def make_node(self, awid, bdep, cf, cr, eta, data) -> Apply:
        awid = pt.as_tensor(awid)
        bdep = pt.as_tensor(bdep)
        cf = pt.as_tensor(cf)
        cr = pt.as_tensor(cr)
        eta = pt.as_tensor(eta)
        data = pt.as_tensor(data)

        inputs = [awid, bdep, cf, cr, eta, data]
        outputs = [data.type()]

        # Apply is an object that combines inputs, outputs and an Op (self)
        return Apply(self, inputs, outputs)

    def perform(self, node: Apply, inputs: list[np.ndarray], outputs: list[list[None]]) -> None:
        awid, bdep, cf, cr, eta, data = inputs  # this will contain my variables

        # call our numpy log-likelihood function
        loglike_eval = my_loglike(awid, bdep, cf, cr, eta, data)
        outputs[0][0] = np.asarray(loglike_eval)

## data 
data = np.asarray([7.82e-02, 9.96e-02, 8.69e-02])
awid_true = 0.000000
bdep_true = 0.750000
cf_true = 0.000000
cr_true = 0.750000
eta_true = 0.666250
# create our Op
loglike_op = LogLike()
test_out = loglike_op(awid_true, bdep_true, cf_true, cr_true, eta_true, data)
print(test_out.eval())

def custom_dist_loglike(data, awid, bdep, cf, cr, eta):
    # the observed data is always passed as the first input by CustomDist
    return loglike_op(awid, bdep, cf, cr, eta, data)

# use PyMC to sample from the log-likelihood
with pm.Model() as no_grad_model:
    # uniform priors 
    awid = pm.Uniform("awid", lower=0.0, upper=1.0, initval=0.5)
    bdep = pm.Uniform("bdep", lower=0.0, upper=1.0, initval=0.5)
    cf = pm.Uniform("cf", lower=0.0, upper=1.0, initval=0.5)
    cr = pm.Uniform("cr", lower=0.0, upper=1.0, initval=0.5)
    eta = pm.Uniform("eta", lower=0.0, upper=1.0, initval=0.5)

    # use a CustomDist with a custom logp function
    likelihood = pm.CustomDist(
        "likelihood", awid, bdep, cf, cr, eta, observed=data, logp=custom_dist_loglike
    )

ip = no_grad_model.initial_point()
print(no_grad_model.compile_logp(vars=[likelihood], sum=False)(ip))

A snippet of the error from the terminal is:

  File "c:\Users\GTJCY\Documents\AITEST\V8\pymy_ML.py", line 22, in forward
    x = torch.relu(self.fc1(x))
                   ~~~~~~~~^^^
  File "C:\Users\GTJCY\.conda\envs\pymc_env\Lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "C:\Users\GTJCY\.conda\envs\pymc_env\Lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\GTJCY\.conda\envs\pymc_env\Lib\site-packages\torch\nn\modules\linear.py", line 125, in forward
    return F.linear(input, self.weight, self.bias)
           ~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: mat1 and mat2 shapes cannot be multiplied (5x1 and 5x50)

RuntimeError: mat1 and mat2 shapes cannot be multiplied (5x1 and 5x50)
Apply node that caused the error: LogLike(Sigmoid.0, Sigmoid.0, Sigmoid.0, Sigmoid.0, Sigmoid.0, likelihood{[0.0782 0. ... 96 0.0869]})
Toposort index: 10
Inputs types: [TensorType(float64, shape=(1,)), TensorType(float64, shape=(1,)), TensorType(float64, shape=(1,)), TensorType(float64, shape=(1,)), TensorType(float64, shape=(1,)), TensorType(float64, shape=(3,))]
Inputs shapes: [(1,), (1,), (1,), (1,), (1,), (3,)]
Inputs strides: [(8,), (8,), (8,), (8,), (8,), (8,)]
Inputs values: [array([0.5]), array([0.5]), array([0.5]), array([0.5]), array([0.5]), array([0.0782, 0.0996, 0.0869])]
Outputs clients: [[output[0](likelihood_logprob)]]

Thanks.

Ok, I think I found the problem here: it is the data type difference. I notice in other posts that PyTensor is suggested for building the neural network, but I can’t find any relevant examples. Could you give a hint?
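
For reference, a minimal sketch of what rebuilding the same network with PyTensor ops might look like, assuming the trained weights are exported from the torch model as NumPy arrays (pytensor_net is a hypothetical helper, not from any of the linked posts):

# hypothetical sketch: rebuild the trained MLP with PyTensor ops so PyMC
# can differentiate through it (enabling gradient-based samplers like NUTS)
weights = [(layer.weight.detach().numpy(), layer.bias.detach().numpy())
           for layer in (mynet.fc1, mynet.fc2, mynet.fc3, mynet.fc4)]

def pytensor_net(x):
    # x is a symbolic vector of length 5; torch's Linear computes W @ x + b
    for i, (w, b) in enumerate(weights):
        x = pt.dot(w, x) + b
        if i < len(weights) - 1:
            x = pt.maximum(x, 0.0)  # ReLU on all but the last layer
    return x

# usage inside a model: out = pytensor_net(pt.stack([awid, bdep, cf, cr, eta]))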

Your torch function doesn’t seem to like the fact that PyMC expands the scalar arguments to be 1d (shape=(1,) instead of shape=()). Try squeezing them in the perform method before passing them to your torch function.
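
Something along these lines (a minimal sketch, untested):

def perform(self, node, inputs, outputs):
    # PyMC may pass the scalar parameters as shape-(1,) arrays;
    # squeeze them back to scalars before calling the torch model
    awid, bdep, cf, cr, eta, data = [
        np.squeeze(v) if v.shape == (1,) else v for v in inputs
    ]
    loglike_eval = my_loglike(awid, bdep, cf, cr, eta, data)
    outputs[0][0] = np.asarray(loglike_eval)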


It works. Thank you.
BTW, why is the PyTorch model compatible here but not in other posts, e.g., How to combine pymc with neural network - #2 by jessegrabowski?

The working code is as follows:

        processed_params = [param.squeeze() if param.shape == (1,) else param
                            for param in inputs]
        awid, bdep, cf, cr, eta, data = processed_params

You wrapped your NN in an Op, so it seems totally consistent with what I suggested in the post you linked. Am I missing something?